Commit Graph

576 Commits

Author SHA1 Message Date
Eric Dumazet 92f1655aa2 net: fix __dst_negative_advice() race
__dst_negative_advice() does not enforce proper RCU rules when
sk->dst_cache must be cleared, leading to possible UAF.

RCU rules are that we must first clear sk->sk_dst_cache,
then call dst_release(old_dst).

Note that sk_dst_reset(sk) is implementing this protocol correctly,
while __dst_negative_advice() uses the wrong order.

Given that ip6_negative_advice() has special logic
against RTF_CACHE, this means each of the three ->negative_advice()
existing methods must perform the sk_dst_reset() themselves.

Note the check against NULL dst is centralized in
__dst_negative_advice(), there is no need to duplicate
it in various callbacks.

Many thanks to Clement Lecigne for tracking this issue.

This old bug became visible after the blamed commit, using UDP sockets.

Fixes: a87cb3e48e ("net: Facility to report route quality of connected sockets")
Reported-by: Clement Lecigne <clecigne@google.com>
Diagnosed-by: Clement Lecigne <clecigne@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 17:34:49 -07:00
Jakub Kicinski e7073830cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
  35d92abfba ("net: hns3: fix kernel crash when devlink reload during initialization")
  2a1a1a7b5f ("net: hns3: add command queue trace for hns3")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-09 10:01:01 -07:00
Jakub Kicinski 179a6f5df8 ipsec-next-2024-05-03
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmY0mEIACgkQrB3Eaf9P
 W7fU7g//bQyydwei/4Vo+cNCPp82k8wL/qhDY3IjN10PfJOSNmeCAcSgkuuHTRSx
 g/hoxZEVzLrQT5bt+Sb38JxADFiL787GjdGEUy1gzF7CnDKcGT5KnydYNjDqDVGt
 nOv9kAGfWIkMKdNqrhHifddPMWd+ZqvpUcFz5olvqIE2mpNgMwy2i3NID9bNAV31
 v5AEvNINa1LKOhX9cEka8iPQXwp+I6yTLqyOd4VciOuFr8dPg0FQqFaYR+OtMsV0
 kIxdGTVfmRWaNgq/Tsg4z/2rXEwmEjTWzAhNVGu8o8L3JozXOvbjIrDG7Ws6qB3V
 XTFl8ueRMk0UCTlY/QAfip5H7IlAo+H0FUBC45FNP1UhHeWisXT4D5rqAEqQTlZR
 bddtuueLZyKclFpXRNi+/8vdDrXhhEzeNINkc52Ef33rUTtZJR8bXrEUKzaYCIuF
 ldub0PA3+e5wvwIxq5/Chc/+MIaIHnXBMUmbCJSPnMrupBQtO+i6arPQcbtaBAgS
 YyVGTRk9YN0UAjSriIuiViLlgUCMsvsWgfSz9rd0PE54MFBrvcLPeCtPxKZ+sTVG
 Y2iSZ8d3ThvsMiQVNU8gj3SlTY1oTvuaijDDGjnR0nWkxV9LMJHCPKfIzsbOKLJe
 +ee5hKP4TOFygnV58BkqdGK/LavNpouTIbrM43hgmJ0IX9kSt4o=
 =QiGZ
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2024-05-03

1) Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support.
   This was defined by an early version of an IETF draft
   that did not make it to a standard.

2) Introduce direction attribute for xfrm states.
   xfrm states have a direction, a stsate can be used
   either for input or output packet processing.
   Add a direction to xfrm states to make it clear
   for what a xfrm state is used.

* tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: Restrict SA direction attribute to specific netlink message types
  xfrm: Add dir validation to "in" data path lookup
  xfrm: Add dir validation to "out" data path lookup
  xfrm: Add Direction to the SA in or out
  udpencap: Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support
====================

Link: https://lore.kernel.org/r/20240503082732.2835810-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-06 19:14:56 -07:00
Antony Antony 601a0867f8 xfrm: Add dir validation to "out" data path lookup
Introduces validation for the x->dir attribute within the XFRM output
data lookup path. If the configured direction does not match the expected
direction, output, increment the XfrmOutStateDirError counter and drop
the packet to ensure data integrity and correct flow handling.

grep -vw 0 /proc/net/xfrm_stat
XfrmOutPolError         	1
XfrmOutStateDirError    	1

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-05-01 10:05:52 +02:00
Eric Dumazet e8dfd42c17 ipv6: introduce dst_rt6_info() helper
Instead of (struct rt6_info *)dst casts, we can use :

 #define dst_rt6_info(_ptr) \
         container_of_const(_ptr, struct rt6_info, dst)

Some places needed missing const qualifiers :

ip6_confirm_neigh(), ipv6_anycast_destination(),
ipv6_unicast_destination(), has_gateway()

v2: added missing parts (David Ahern)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-29 13:32:01 +01:00
Antony Antony 8b06a24bb6 xfrm: fix possible derferencing in error path
Fix derferencing pointer when xfrm_policy_lookup_bytype returns an
 error.

Fixes: 63b21caba1 ("xfrm: introduce forwarding of ICMP Error messages")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/kernel-janitors/f6ef0d0d-96de-4e01-9dc3-c1b3a6338653@moroto.mountain/
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-04-11 10:23:36 +02:00
David S. Miller 3dbf6d67f2 ipsec-next-2024-03-06
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmXoQdQACgkQrB3Eaf9P
 W7dnTQ//RnTEaOPgTsHzhSwVOfWhsWkHx2xqUAlPNY8W2jrzxGgAIknPzobivvRJ
 U2bYPXDocDHUJAHIELUlu+lzATEz8baBN5zK5a+pPx5hXJlf5UI95linNZ5rEIiV
 RoxLpicnJqtWn1oMZ8d7Y0CknsLR/f4ruiVApzoifk1JaXC/zX8FcqqKsSPwVlqA
 GKy4+f71rNrIE9fbBAqDpmt6RuyRp/5yXPHLBoZlEXfYrYU1JOG8b/HLtGMD0SzV
 yHbDcgRPtbkWgAwNO/zxSDKa+PZr7NbVgakDzyHK+TltpU+6cOsajCaSXHWwsTBB
 +AebDschYY1H49oQe4bwLbNdGY+4lFvXxtk02sa8eM5a104MWxxTEB1QGAEri6gQ
 biAh3xTTbDpls26qkm97iZ6LlDE6pVIzF744buOYedvR8gjjoLt1z1PId05wMYGB
 A/4P6WkM8I1CZL++ODVfT8qR2N6lwFAQ6AM/eqHLvc6QpZ5Hm3lQAdLz1tK6QlCP
 MIV9uuNz8dFPrX1QifmLGojjdedB+4ASglxffOaoqRpHnMgHgzWTOux8tSFpuJGu
 mIYO/Dv5sHMdH8Jm+xXX1549bRzR+KGuqjXPxOSiO1jbOb5VC5ZDd3LVWb7fpDid
 K4eaU4Bo4R3eiCo1Bapt/1jKV1YFuyBKqTvObCDslVuN3Fu9d7I=
 =e4aa
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
1) Introduce forwarding of ICMP Error messages. That is specified
   in RFC 4301 but was never implemented. From Antony Antony.

2) Use KMEM_CACHE instead of kmem_cache_create in xfrm6_tunnel_init()
   and xfrm_policy_init(). From Kunwu Chan.

3) Do not allocate stats in the xfrm interface driver, this can be done
   on net core now. From Breno Leitao.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:56:05 +00:00
Mike Yu d4872d70fc xfrm: fix xfrm child route lookup for packet offload
In current code, xfrm_bundle_create() always uses the matched
SA's family type to look up a xfrm child route for the skb.
The route returned by xfrm_dst_lookup() will eventually be
used in xfrm_output_resume() (skb_dst(skb)->ops->local_out()).

If packet offload is used, the above behavior can lead to
calling ip_local_out() for an IPv6 packet or calling
ip6_local_out() for an IPv4 packet, which is likely to fail.

This change fixes the behavior by checking if the matched SA
has packet offload enabled. If not, keep the same behavior;
if yes, use the matched SP's family type for the lookup.

Test: verified IPv6-in-IPv4 packets on Android device with
      IPsec packet offload enabled
Signed-off-by: Mike Yu <yumike@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-03-05 10:48:18 +01:00
Kunwu Chan 1476de6d2b xfrm: Simplify the allocation of slab caches in xfrm_policy_init
commit 0a31bd5f2b ("KMEM_CACHE(): simplify slab cache creation")
introduces a new macro.
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-02-09 06:48:04 +01:00
Antony Antony 63b21caba1 xfrm: introduce forwarding of ICMP Error messages
This commit aligns with RFC 4301, Section 6, and addresses the
requirement to forward unauthenticated ICMP error messages that do not
match any xfrm policies. It utilizes the ICMP payload as an skb and
performs a reverse lookup. If a policy match is found, forward
the packet.

The ICMP payload typically contains a partial IP packet that is likely
responsible for the error message.

The following error types will be forwarded:
- IPv4 ICMP error types: ICMP_DEST_UNREACH & ICMP_TIME_EXCEEDED
- IPv6 ICMPv6 error types: ICMPV6_DEST_UNREACH, ICMPV6_PKT_TOOBIG,
			   ICMPV6_TIME_EXCEED

To implement this feature, a reverse lookup has been added to the xfrm
forward path, making use of the ICMP payload as the skb.

To enable this functionality from user space, the XFRM_POLICY_ICMP flag
should be added to the outgoing and forward policies, and the
XFRM_STATE_ICMP flag should be set on incoming states.

e.g.
ip xfrm policy add flag icmp tmpl

ip xfrm policy
src 192.0.2.0/24 dst 192.0.1.0/25
	dir out priority 2084302 ptype main flag icmp

ip xfrm state add ...flag icmp

ip xfrm state
root@west:~#ip x s
src 192.1.2.23 dst 192.1.2.45
	proto esp spi 0xa7b76872 reqid 16389 mode tunnel
	replay-window 32 flag icmp af-unspec

Changes since v5:
- fix return values bool->int, feedback from Steffen

Changes since v4:
- split the series to only ICMP erorr forwarding

Changes since v3: no code chage
 - add missing white spaces detected by checkpatch.pl

Changes since v2: reviewed by Steffen Klassert
 - user consume_skb instead of kfree_skb for the inner skb
 - fixed newskb leaks in error paths
 - free the newskb once inner flow is decoded with change due to
   commit 7a0207094f ("xfrm: policy: replace session decode with flow dissector")
 - if xfrm_decode_session_reverse() on inner payload fails ignore.
   do not increment error counter

Changes since v1:
- Move IPv6 variable declaration inside IS_ENABLED(CONFIG_IPV6)

Changes since RFC:
- Fix calculation of ICMPv6 header length

Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-01-25 08:41:44 +01:00
Guillaume Nault 1982a2a02c xfrm: Clear low order bits of ->flowi4_tos in decode_session4().
Commit 23e7b1bfed ("xfrm: Don't accidentally set RTO_ONLINK in
decode_session4()") fixed a problem where decode_session4() could
erroneously set the RTO_ONLINK flag for IPv4 route lookups. This
problem was reintroduced when decode_session4() was modified to
use the flow dissector.

Fix this by clearing again the two low order bits of ->flowi4_tos.
Found by code inspection, compile tested only.

Fixes: 7a0207094f ("xfrm: policy: replace session decode with flow dissector")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-01-17 08:18:53 +01:00
Daniel Xu 8f0ec8c681 bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
This commit adds an unstable kfunc helper to access internal xfrm_state
associated with an SA. This is intended to be used for the upcoming
IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other
words: for custom software RSS.

That being said, the function that this kfunc wraps is fairly generic
and used for a lot of xfrm tasks. I'm sure people will find uses
elsewhere over time.

This commit also adds a corresponding bpf_xdp_xfrm_state_release() kfunc
to release the refcnt acquired by bpf_xdp_get_xfrm_state(). The verifier
will require that all acquired xfrm_state's are released.

Co-developed-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/a29699c42f5fad456b875c98dd11c6afc3ffb707.1702593901.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-14 17:12:49 -08:00
Jakub Kicinski e0f9f0e073 ipsec-next-2023-10-28
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmU8x1wACgkQrB3Eaf9P
 W7eP7A//QaUgDJXR5ZrkU9SPp5GXufFew7Zq7F4ODQNR+2l/llNkvfVyp0W0va4D
 Keu1pEGTLfis2pc3nR1zezTJU+ZTk0Y+AshqasvYE5uSCcn8BNLaaV+4AmVEHtp5
 i3C+guevUnQ66IXhVu/sdPNduDUhMj/tSr3y9pT30oSINT4nsYY2Z8VZezrzdWrr
 j+rRuqtrg0avuEkbbuXT5zS396ngEsrHd6RKQVN/eST0U787Xb2D+8hobvW6I9At
 tWTa8la/qSTP/ez/m7Z/c6YDFECzRUPrY7S9dW7e3DOpJmAE48RZj/odE9wDZJKW
 pu8EaCGGasDnYA3jWD8L65+/Mold9PbUc1m6HBGHYdCFR6UKpD7Eahl6kZ+p/9Wn
 TjaVFqBKZkB38H5Opedr8V8vQUR/3dQ8MON+PscnG1SI1ZRGlZLc3+gS3c1zOIVC
 iJG2GUaKTySmBEOL6YZbAzKA9nD4JdWMObZvxM8CwQ0q5AWR2Xzv187IVLF+NWMx
 aEyyrIpy7BBAEoWjcXmuBceCcdUdsDmiIWplSUbEdTaSJfjFvxezK0GhiPCibUr+
 FHvexPF1kuQAaNo+VNVLL5170RmxD1PoIcH3YIy1MLfv7sk+P3nd5EDmXAolrpXh
 hFg247odTiRehOF/y3VEUnHTHErakM39jFzkwf43elVlwlnMWmY=
 =slHB
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2023-10-28

1) Remove unused function declarations of xfrm4_extract_input and
   xfrm6_extract_input. From Yue Haibing.

2) Annotate struct xfrm_sec_ctx with __counted_by.
   From Kees Cook.

3) Support GRO decapsulation for ESP in UDP encapsulation.
   From Antony Antony et all.

4) Replace the xfrm session decode with flow dissector.
   From Florian Westphal.

5) Fix a use after free in __xfrm6_udp_encap_rcv.

6) Fix the layer 4 flowi decoding.
   From Florian Westphal.

* tag 'ipsec-next-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: policy: fix layer 4 flowi decoding
  xfrm Fix use after free in __xfrm6_udp_encap_rcv.
  xfrm: policy: replace session decode with flow dissector
  xfrm: move mark and oif flowi decode into common code
  xfrm: pass struct net to xfrm_decode_session wrappers
  xfrm: Support GRO for IPv6 ESP in UDP encapsulation
  xfrm: Support GRO for IPv4 ESP in UDP encapsulation
  xfrm: Use the XFRM_GRO to indicate a GRO call on input
  xfrm: Annotate struct xfrm_sec_ctx with __counted_by
  xfrm: Remove unused function declarations
====================

Link: https://lore.kernel.org/r/20231028084328.3119236-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-30 14:36:57 -07:00
Florian Westphal eefed7662f xfrm: policy: fix layer 4 flowi decoding
The commit shipped with two bugs:
 fl4->fl4_icmp_type = flkeys->icmp.type;
 fl4->fl4_icmp_type = flkeys->icmp.code;
               ~~~~ should have been "code".

But the more severe bug is that I got fooled by flowi member defines:
fl4_icmp_type, fl4_gre_key and fl4_dport share the same union/address.

Fix typo and make gre/icmp key setting depend on the l4 protocol.

Fixes: 7a0207094f ("xfrm: policy: replace session decode with flow dissector")
Reported-and-tested-by: Antony Antony <antony@phenome.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-10-27 10:12:09 +02:00
Jakub Kicinski 041c3466f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

net/mac80211/key.c
  02e0e426a2 ("wifi: mac80211: fix error path key leak")
  2a8b665e6b ("wifi: mac80211: remove key_mtx")
  7d6904bf26 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/ti/Kconfig
  a602ee3176 ("net: ethernet: ti: Fix mixed module-builtin object")
  98bdeae950 ("net: cpmac: remove driver to prepare for platform removal")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 13:29:01 -07:00
Eric Dumazet de5724ca38 xfrm: fix a data-race in xfrm_lookup_with_ifid()
syzbot complains about a race in xfrm_lookup_with_ifid() [1]

When preparing commit 0a9e5794b2 ("xfrm: annotate data-race
around use_time") I thought xfrm_lookup_with_ifid() was modifying
a still private structure.

[1]
BUG: KCSAN: data-race in xfrm_lookup_with_ifid / xfrm_lookup_with_ifid

write to 0xffff88813ea41108 of 8 bytes by task 8150 on cpu 1:
xfrm_lookup_with_ifid+0xce7/0x12d0 net/xfrm/xfrm_policy.c:3218
xfrm_lookup net/xfrm/xfrm_policy.c:3270 [inline]
xfrm_lookup_route+0x3b/0x100 net/xfrm/xfrm_policy.c:3281
ip6_dst_lookup_flow+0x98/0xc0 net/ipv6/ip6_output.c:1246
send6+0x241/0x3c0 drivers/net/wireguard/socket.c:139
wg_socket_send_skb_to_peer+0xbd/0x130 drivers/net/wireguard/socket.c:178
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

write to 0xffff88813ea41108 of 8 bytes by task 15867 on cpu 0:
xfrm_lookup_with_ifid+0xce7/0x12d0 net/xfrm/xfrm_policy.c:3218
xfrm_lookup net/xfrm/xfrm_policy.c:3270 [inline]
xfrm_lookup_route+0x3b/0x100 net/xfrm/xfrm_policy.c:3281
ip6_dst_lookup_flow+0x98/0xc0 net/ipv6/ip6_output.c:1246
send6+0x241/0x3c0 drivers/net/wireguard/socket.c:139
wg_socket_send_skb_to_peer+0xbd/0x130 drivers/net/wireguard/socket.c:178
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

value changed: 0x00000000651cd9d1 -> 0x00000000651cd9d2

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 15867 Comm: kworker/u4:58 Not tainted 6.6.0-rc4-syzkaller-00016-g5e62ed3b1c8a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
Workqueue: wg-kex-wg2 wg_packet_handshake_send_worker

Fixes: 0a9e5794b2 ("xfrm: annotate data-race around use_time")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-10-13 07:57:27 +02:00
Florian Westphal 7a0207094f xfrm: policy: replace session decode with flow dissector
xfrm needs to populate ipv4/v6 flow struct for route lookup.
In the past there were several bugs in this code:

1. callers that forget to reload header pointers after
   xfrm_decode_session() (it may pull headers).
2. bugs in decoding where accesses past skb->data occurred.

Meanwhile network core gained a packet dissector as well.
This switches xfrm to the flow dissector.

Changes since RFC:
Drop ipv6 mobiliy header support, AFAIU noone uses this.

Drop extraction of flowlabel, replaced code doesn't set it either.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/netdev/20230908120628.26164-3-fw@strlen.de/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-10-06 08:35:12 +02:00
Florian Westphal 45f87dd6b3 xfrm: move mark and oif flowi decode into common code
flowi4_oif/flowi6_oif and mark are aliased to flowi_common field, i.e.
all can be used interchangeably.

Instead of duplicating place this in common code.
No functional changes intended.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-10-06 08:33:47 +02:00
Florian Westphal 2b1dc6285c xfrm: pass struct net to xfrm_decode_session wrappers
Preparation patch, extra arg is not used.
No functional changes intended.

This is needed to replace the xfrm session decode functions with
the flow dissector.

skb_flow_dissect() cannot be used as-is, because it attempts to deduce the
'struct net' to use for bpf program fetch from skb->sk or skb->dev, but
xfrm code path can see skbs that have neither sk or dev filled in.

So either flow dissector needs to try harder, e.g. by also trying
skb->dst->dev, or we have to pass the struct net explicitly.

Passing the struct net doesn't look too bad to me, most places
already have it available or can derive it from the output device.

Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/netdev/202309271628.27fd2187-oliver.sang@intel.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-10-06 08:31:53 +02:00
Eric Dumazet 3e4bc23926 xfrm: fix a data-race in xfrm_gen_index()
xfrm_gen_index() mutual exclusion uses net->xfrm.xfrm_policy_lock.

This means we must use a per-netns idx_generator variable,
instead of a static one.
Alternative would be to use an atomic variable.

syzbot reported:

BUG: KCSAN: data-race in xfrm_sk_policy_insert / xfrm_sk_policy_insert

write to 0xffffffff87005938 of 4 bytes by task 29466 on cpu 0:
xfrm_gen_index net/xfrm/xfrm_policy.c:1385 [inline]
xfrm_sk_policy_insert+0x262/0x640 net/xfrm/xfrm_policy.c:2347
xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639
do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943
ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012
rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054
sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697
__sys_setsockopt+0x1c9/0x230 net/socket.c:2263
__do_sys_setsockopt net/socket.c:2274 [inline]
__se_sys_setsockopt net/socket.c:2271 [inline]
__x64_sys_setsockopt+0x66/0x80 net/socket.c:2271
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffffffff87005938 of 4 bytes by task 29460 on cpu 1:
xfrm_sk_policy_insert+0x13e/0x640
xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639
do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943
ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012
rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054
sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697
__sys_setsockopt+0x1c9/0x230 net/socket.c:2263
__do_sys_setsockopt net/socket.c:2274 [inline]
__se_sys_setsockopt net/socket.c:2271 [inline]
__x64_sys_setsockopt+0x66/0x80 net/socket.c:2271
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00006ad8 -> 0x00006b18

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 29460 Comm: syz-executor.1 Not tainted 6.5.0-rc5-syzkaller-00243-g9106536c1aa3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023

Fixes: 1121994c80 ("netns xfrm: policy insertion in netns")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-09-13 09:20:28 +02:00
Zhengchao Shao 762c8dc7f2 net: dst: remove unnecessary input parameter in dst_alloc and dst_init
Since commit 1202cdd66531("Remove DECnet support from kernel") has been
merged, all callers pass in the initial_ref value of 1 when they call
dst_alloc(). Therefore, remove initial_ref when the dst_alloc() is
declared and replace initial_ref with 1 in dst_alloc().
Also when all callers call dst_init(), the value of initial_ref is 1.
Therefore, remove the input parameter initial_ref of the dst_init() and
replace initial_ref with the value 1 in dst_init.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20230911125045.346390-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 11:42:25 +02:00
Dong Chenchen 6d41d4fe28 net: xfrm: skip policies marked as dead while reinserting policies
BUG: KASAN: slab-use-after-free in xfrm_policy_inexact_list_reinsert+0xb6/0x430
Read of size 1 at addr ffff8881051f3bf8 by task ip/668

CPU: 2 PID: 668 Comm: ip Not tainted 6.5.0-rc5-00182-g25aa0bebba72-dirty #64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x72/0xa0
 print_report+0xd0/0x620
 kasan_report+0xb6/0xf0
 xfrm_policy_inexact_list_reinsert+0xb6/0x430
 xfrm_policy_inexact_insert_node.constprop.0+0x537/0x800
 xfrm_policy_inexact_alloc_chain+0x23f/0x320
 xfrm_policy_inexact_insert+0x6b/0x590
 xfrm_policy_insert+0x3b1/0x480
 xfrm_add_policy+0x23c/0x3c0
 xfrm_user_rcv_msg+0x2d0/0x510
 netlink_rcv_skb+0x10d/0x2d0
 xfrm_netlink_rcv+0x49/0x60
 netlink_unicast+0x3fe/0x540
 netlink_sendmsg+0x528/0x970
 sock_sendmsg+0x14a/0x160
 ____sys_sendmsg+0x4fc/0x580
 ___sys_sendmsg+0xef/0x160
 __sys_sendmsg+0xf7/0x1b0
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x73/0xdd

The root cause is:

cpu 0			cpu1
xfrm_dump_policy
xfrm_policy_walk
list_move_tail
			xfrm_add_policy
			... ...
			xfrm_policy_inexact_list_reinsert
			list_for_each_entry_reverse
				if (!policy->bydst_reinsert)
				//read non-existent policy
xfrm_dump_policy_done
xfrm_policy_walk_done
list_del(&walk->walk.all);

If dump_one_policy() returns err (triggered by netlink socket),
xfrm_policy_walk() will move walk initialized by socket to list
net->xfrm.policy_all. so this socket becomes visible in the global
policy list. The head *walk can be traversed when users add policies
with different prefixlen and trigger xfrm_policy node merge.

The issue can also be triggered by policy list traversal while rehashing
and flushing policies.

It can be fixed by skip such "policies" with walk.dead set to 1.

Fixes: 9cf545ebd5 ("xfrm: policy: store inexact policies in a tree ordered by destination address")
Fixes: 12a169e7d8 ("ipsec: Put dumpers on the dump list")
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-09-05 11:38:14 +02:00
Eric Dumazet 3c5b4d69c3 net: annotate data-races around sk->sk_mark
sk->sk_mark is often read while another thread could change the value.

Fixes: 4a19ec5800 ("[NET]: Introducing socket mark socket option.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Leon Romanovsky bf06fcf4be xfrm: add missed call to delete offloaded policies
Offloaded policies are deleted through two flows: netdev is going
down and policy flush.

In both cases, the code lacks relevant call to delete offloaded policy.

Fixes: 919e43fad5 ("xfrm: add an interface to offload policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-06-07 09:58:48 +02:00
Benedict Wong 1f8b6df6a9 xfrm: Treat already-verified secpath entries as optional
This change allows inbound traffic through nested IPsec tunnels to
successfully match policies and templates, while retaining the secpath
stack trace as necessary for netfilter policies.

Specifically, this patch marks secpath entries that have already matched
against a relevant policy as having been verified, allowing it to be
treated as optional and skipped after a tunnel decapsulation (during
which the src/dst/proto/etc may have changed, and the correct policy
chain no long be resolvable).

This approach is taken as opposed to the iteration in b0355dbbf1,
where the secpath was cleared, since that breaks subsequent validations
that rely on the existence of the secpath entries (netfilter policies, or
transport-in-tunnel mode, where policies remain resolvable).

Fixes: b0355dbbf1 ("Fix XFRM-I support for nested ESP tunnels")
Test: Tested against Android Kernel Unit Tests
Test: Tested against Android CTS
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-05-21 09:21:37 +02:00
Benedict Wong 8680407b6f xfrm: Check if_id in inbound policy/secpath match
This change ensures that if configured in the policy, the if_id set in
the policy and secpath states match during the inbound policy check.
Without this, there is potential for ambiguity where entries in the
secpath differing by only the if_id could be mismatched.

Notably, this is checked in the outbound direction when resolving
templates to SAs, but not on the inbound path when matching SAs and
policies.

Test: Tested against Android kernel unit tests & CTS
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-05-10 07:56:05 +02:00
Martin Willi 5fc46f9421 Revert "Fix XFRM-I support for nested ESP tunnels"
This reverts commit b0355dbbf1.

The reverted commit clears the secpath on packets received via xfrm interfaces
to support nested IPsec tunnels. This breaks Netfilter policy matching using
xt_policy in the FORWARD chain, as the secpath is missing during forwarding.
Additionally, Benedict Wong reports that it breaks Transport-in-Tunnel mode.

Fix this regression by reverting the commit until we have a better approach
for nested IPsec tunnels.

Fixes: b0355dbbf1 ("Fix XFRM-I support for nested ESP tunnels")
Link: https://lore.kernel.org/netdev/20230412085615.124791-1-martin@strongswan.org/
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-04-25 09:50:34 +02:00
Sabrina Dubroca 430cac4874 xfrm: don't check the default policy if the policy allows the packet
The current code doesn't let a simple "allow" policy counteract a
default policy blocking all incoming packets:

    ip x p setdefault in block
    ip x p a src 192.168.2.1/32 dst 192.168.2.2/32 dir in action allow

At this stage, we have an allow policy (with or without transforms)
for this packet. It doesn't matter what the default policy says, since
the policy we looked up lets the packet through. The case of a
blocking policy is already handled separately, so we can remove this
check.

Fixes: 2d151d3907 ("xfrm: Add possibility to set the default to block if we have no policy")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-04-06 12:04:31 +02:00
Eric Dumazet 0a9e5794b2 xfrm: annotate data-race around use_time
KCSAN reported multiple cpus can update use_time
at the same time.

Adds READ_ONCE()/WRITE_ONCE() annotations.

Note that 32bit arches are not fully protected,
but they will probably no longer be supported/used in 2106.

BUG: KCSAN: data-race in __xfrm_policy_check / __xfrm_policy_check

write to 0xffff88813e7ec108 of 8 bytes by interrupt on cpu 0:
__xfrm_policy_check+0x6ae/0x17f0 net/xfrm/xfrm_policy.c:3664
__xfrm_policy_check2 include/net/xfrm.h:1174 [inline]
xfrm_policy_check include/net/xfrm.h:1179 [inline]
xfrm6_policy_check+0x2e9/0x320 include/net/xfrm.h:1189
udpv6_queue_rcv_one_skb+0x48/0xa30 net/ipv6/udp.c:703
udpv6_queue_rcv_skb+0x2d6/0x310 net/ipv6/udp.c:792
udp6_unicast_rcv_skb+0x16b/0x190 net/ipv6/udp.c:935
__udp6_lib_rcv+0x84b/0x9b0 net/ipv6/udp.c:1020
udpv6_rcv+0x4b/0x50 net/ipv6/udp.c:1133
ip6_protocol_deliver_rcu+0x99e/0x1020 net/ipv6/ip6_input.c:439
ip6_input_finish net/ipv6/ip6_input.c:484 [inline]
NF_HOOK include/linux/netfilter.h:302 [inline]
ip6_input+0xca/0x180 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:454 [inline]
ip6_rcv_finish+0x1e9/0x2d0 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:302 [inline]
ipv6_rcv+0x85/0x140 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core net/core/dev.c:5482 [inline]
__netif_receive_skb+0x8b/0x1b0 net/core/dev.c:5596
process_backlog+0x23f/0x3b0 net/core/dev.c:5924
__napi_poll+0x65/0x390 net/core/dev.c:6485
napi_poll net/core/dev.c:6552 [inline]
net_rx_action+0x37e/0x730 net/core/dev.c:6663
__do_softirq+0xf2/0x2c7 kernel/softirq.c:571
do_softirq+0xb1/0xf0 kernel/softirq.c:472
__local_bh_enable_ip+0x6f/0x80 kernel/softirq.c:396
__raw_read_unlock_bh include/linux/rwlock_api_smp.h:257 [inline]
_raw_read_unlock_bh+0x17/0x20 kernel/locking/spinlock.c:284
wg_socket_send_skb_to_peer+0x107/0x120 drivers/net/wireguard/socket.c:184
wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline]
wg_packet_tx_worker+0x142/0x360 drivers/net/wireguard/send.c:276
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

write to 0xffff88813e7ec108 of 8 bytes by interrupt on cpu 1:
__xfrm_policy_check+0x6ae/0x17f0 net/xfrm/xfrm_policy.c:3664
__xfrm_policy_check2 include/net/xfrm.h:1174 [inline]
xfrm_policy_check include/net/xfrm.h:1179 [inline]
xfrm6_policy_check+0x2e9/0x320 include/net/xfrm.h:1189
udpv6_queue_rcv_one_skb+0x48/0xa30 net/ipv6/udp.c:703
udpv6_queue_rcv_skb+0x2d6/0x310 net/ipv6/udp.c:792
udp6_unicast_rcv_skb+0x16b/0x190 net/ipv6/udp.c:935
__udp6_lib_rcv+0x84b/0x9b0 net/ipv6/udp.c:1020
udpv6_rcv+0x4b/0x50 net/ipv6/udp.c:1133
ip6_protocol_deliver_rcu+0x99e/0x1020 net/ipv6/ip6_input.c:439
ip6_input_finish net/ipv6/ip6_input.c:484 [inline]
NF_HOOK include/linux/netfilter.h:302 [inline]
ip6_input+0xca/0x180 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:454 [inline]
ip6_rcv_finish+0x1e9/0x2d0 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:302 [inline]
ipv6_rcv+0x85/0x140 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core net/core/dev.c:5482 [inline]
__netif_receive_skb+0x8b/0x1b0 net/core/dev.c:5596
process_backlog+0x23f/0x3b0 net/core/dev.c:5924
__napi_poll+0x65/0x390 net/core/dev.c:6485
napi_poll net/core/dev.c:6552 [inline]
net_rx_action+0x37e/0x730 net/core/dev.c:6663
__do_softirq+0xf2/0x2c7 kernel/softirq.c:571
do_softirq+0xb1/0xf0 kernel/softirq.c:472
__local_bh_enable_ip+0x6f/0x80 kernel/softirq.c:396
__raw_read_unlock_bh include/linux/rwlock_api_smp.h:257 [inline]
_raw_read_unlock_bh+0x17/0x20 kernel/locking/spinlock.c:284
wg_socket_send_skb_to_peer+0x107/0x120 drivers/net/wireguard/socket.c:184
wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline]
wg_packet_tx_worker+0x142/0x360 drivers/net/wireguard/send.c:276
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

value changed: 0x0000000063c62d6f -> 0x0000000063c62d70

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 4185 Comm: kworker/1:2 Tainted: G W 6.2.0-rc4-syzkaller-00009-gd532dd102151-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: wg-crypt-wg0 wg_packet_tx_worker

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-01-27 10:21:09 +01:00
Benedict Wong b0355dbbf1 Fix XFRM-I support for nested ESP tunnels
This change adds support for nested IPsec tunnels by ensuring that
XFRM-I verifies existing policies before decapsulating a subsequent
policies. Addtionally, this clears the secpath entries after policies
are verified, ensuring that previous tunnels with no-longer-valid
do not pollute subsequent policy checks.

This is necessary especially for nested tunnels, as the IP addresses,
protocol and ports may all change, thus not matching the previous
policies. In order to ensure that packets match the relevant inbound
templates, the xfrm_policy_check should be done before handing off to
the inner XFRM protocol to decrypt and decapsulate.

Notably, raw ESP/AH packets did not perform policy checks inherently,
whereas all other encapsulated packets (UDP, TCP encapsulated) do policy
checks after calling xfrm_input handling in the respective encapsulation
layer.

Test: Verified with additional Android Kernel Unit tests
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-01-09 07:11:05 +01:00
Leon Romanovsky 3c611d40c6 xfrm: speed-up lookup of HW policies
Devices that implement IPsec packet offload mode should offload SA and
policies too. In RX path, it causes to the situation that HW will always
have higher priority over any SW policies.

It means that we don't need to perform any search of inexact policies
and/or priority checks if HW policy was discovered. In such situation,
the HW will catch the packets anyway and HW can still implement inexact
lookups.

In case specific policy is not found, we will continue with packet lookup and
check for existence of HW policies in inexact list.

HW policies are added to the head of SPD to ensure fast lookup, as XFRM
iterates over all policies in the loop.

The same solution of adding HW SAs at the begging of the list is applied
to SA database too. However, we don't need to change lookups as they are
sorted by insertion order and not priority.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05 10:37:33 +01:00
Leon Romanovsky 919e43fad5 xfrm: add an interface to offload policy
Extend netlink interface to add and delete XFRM policy from the device.
This functionality is a first step to implement packet IPsec offload solution.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05 10:33:13 +01:00
Sabrina Dubroca bd12240337 xfrm: add extack to xfrm_do_migrate
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-25 10:11:41 +01:00
Sabrina Dubroca f157c416c5 xfrm: a few coding style clean ups
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-25 10:11:39 +01:00
Leon Romanovsky cc2bbbfd9a xfrm: Remove not-used total variable
Total variable is not used in xfrm_byidx_resize() and can
be safely removed.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-10-25 10:49:28 +02:00
David S. Miller 42e8e6d906 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Refactor selftests to use an array of structs in xfrm_fill_key().
   From Gautam Menghani.

2) Drop an unused argument from xfrm_policy_match.
   From Hongbin Wang.

3) Support collect metadata mode for xfrm interfaces.
   From Eyal Birger.

4) Add netlink extack support to xfrm.
   From Sabrina Dubroca.

Please note, there is a merge conflict in:

include/net/dst_metadata.h

between commit:

0a28bfd497 ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support")

from the net-next tree and commit:

5182a5d48c ("net: allow storing xfrm interface metadata in metadata_dst")

from the ipsec-next tree.

Can be solved as done in linux-next.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03 07:52:13 +01:00
Eyal Birger abc340b38b xfrm: interface: support collect metadata mode
This commit adds support for 'collect_md' mode on xfrm interfaces.

Each net can have one collect_md device, created by providing the
IFLA_XFRM_COLLECT_METADATA flag at creation. This device cannot be
altered and has no if_id or link device attributes.

On transmit to this device, the if_id is fetched from the attached dst
metadata on the skb. If exists, the link property is also fetched from
the metadata. The dst metadata type used is METADATA_XFRM which holds
these properties.

On the receive side, xfrmi_rcv_cb() populates a dst metadata for each
packet received and attaches it to the skb. The if_id used in this case is
fetched from the xfrm state, and the link is fetched from the incoming
device. This information can later be used by upper layers such as tc,
ebpf, and ip rules.

Because the skb is scrubed in xfrmi_rcv_cb(), the attachment of the dst
metadata is postponed until after scrubing. Similarly, xfrm_input() is
adapted to avoid dropping metadata dsts by only dropping 'valid'
(skb_valid_dst(skb) == true) dsts.

Policy matching on packets arriving from collect_md xfrmi devices is
done by using the xfrm state existing in the skb's sec_path.
The xfrm_if_cb.decode_cb() interface implemented by xfrmi_decode_session()
is changed to keep the details of the if_id extraction tucked away
in xfrm_interface.c.

Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-29 10:41:28 +02:00
Hongbin Wang 0de1978852 xfrm: Drop unused argument
Drop unused argument from xfrm_policy_match,
__xfrm_policy_eval_candidates and xfrm_policy_eval_candidates.
No functional changes intended.

Signed-off-by: Hongbin Wang <wh_bin@126.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-23 08:12:30 +02:00
Nikolay Aleksandrov 17ecd4a4db xfrm: policy: fix metadata dst->dev xmit null pointer dereference
When we try to transmit an skb with metadata_dst attached (i.e. dst->dev
== NULL) through xfrm interface we can hit a null pointer dereference[1]
in xfrmi_xmit2() -> xfrm_lookup_with_ifid() due to the check for a
loopback skb device when there's no policy which dereferences dst->dev
unconditionally. Not having dst->dev can be interepreted as it not being
a loopback device, so just add a check for a null dst_orig->dev.

With this fix xfrm interface's Tx error counters go up as usual.

[1] net-next calltrace captured via netconsole:
  BUG: kernel NULL pointer dereference, address: 00000000000000c0
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP
  CPU: 1 PID: 7231 Comm: ping Kdump: loaded Not tainted 5.19.0+ #24
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014
  RIP: 0010:xfrm_lookup_with_ifid+0x5eb/0xa60
  Code: 8d 74 24 38 e8 26 a4 37 00 48 89 c1 e9 12 fc ff ff 49 63 ed 41 83 fd be 0f 85 be 01 00 00 41 be ff ff ff ff 45 31 ed 48 8b 03 <f6> 80 c0 00 00 00 08 75 0f 41 80 bc 24 19 0d 00 00 01 0f 84 1e 02
  RSP: 0018:ffffb0db82c679f0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffffd0db7fcad430 RCX: ffffb0db82c67a10
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb0db82c67a80
  RBP: ffffb0db82c67a80 R08: ffffb0db82c67a14 R09: 0000000000000000
  R10: 0000000000000000 R11: ffff8fa449667dc8 R12: ffffffff966db880
  R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000000
  FS:  00007ff35c83f000(0000) GS:ffff8fa478480000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000000c0 CR3: 000000001ebb7000 CR4: 0000000000350ee0
  Call Trace:
   <TASK>
   xfrmi_xmit+0xde/0x460
   ? tcf_bpf_act+0x13d/0x2a0
   dev_hard_start_xmit+0x72/0x1e0
   __dev_queue_xmit+0x251/0xd30
   ip_finish_output2+0x140/0x550
   ip_push_pending_frames+0x56/0x80
   raw_sendmsg+0x663/0x10a0
   ? try_charge_memcg+0x3fd/0x7a0
   ? __mod_memcg_lruvec_state+0x93/0x110
   ? sock_sendmsg+0x30/0x40
   sock_sendmsg+0x30/0x40
   __sys_sendto+0xeb/0x130
   ? handle_mm_fault+0xae/0x280
   ? do_user_addr_fault+0x1e7/0x680
   ? kvm_read_and_reset_apf_flags+0x3b/0x50
   __x64_sys_sendto+0x20/0x30
   do_syscall_64+0x34/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0
  RIP: 0033:0x7ff35cac1366
  Code: eb 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
  RSP: 002b:00007fff738e4028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
  RAX: ffffffffffffffda RBX: 00007fff738e57b0 RCX: 00007ff35cac1366
  RDX: 0000000000000040 RSI: 0000557164e4b450 RDI: 0000000000000003
  RBP: 0000557164e4b450 R08: 00007fff738e7a2c R09: 0000000000000010
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
  R13: 00007fff738e5770 R14: 00007fff738e4030 R15: 0000001d00000001
   </TASK>
  Modules linked in: netconsole veth br_netfilter bridge bonding virtio_net [last unloaded: netconsole]
  CR2: 00000000000000c0

CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
Fixes: 2d151d3907 ("xfrm: Add possibility to set the default to block if we have no policy")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-17 11:06:37 +02:00
Xin Xiong 9c9cb23e00 xfrm: fix refcount leak in __xfrm_policy_check()
The issue happens on an error path in __xfrm_policy_check(). When the
fetching process of the object `pols[1]` fails, the function simply
returns 0, forgetting to decrement the reference count of `pols[0]`,
which is incremented earlier by either xfrm_sk_policy_lookup() or
xfrm_policy_lookup(). This may result in memory leaks.

Fix it by decreasing the reference count of `pols[0]` in that path.

Fixes: 134b0fc544 ("IPsec: propagate security module errors up from flow_cache_lookup")
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-07-26 23:08:24 +02:00
Hangyu Hua f85daf0e72 xfrm: xfrm_policy: fix a possible double xfrm_pols_put() in xfrm_bundle_lookup()
xfrm_policy_lookup() will call xfrm_pol_hold_rcu() to get a refcount of
pols[0]. This refcount can be dropped in xfrm_expand_policies() when
xfrm_expand_policies() return error. pols[0]'s refcount is balanced in
here. But xfrm_bundle_lookup() will also call xfrm_pols_put() with
num_pols == 1 to drop this refcount when xfrm_expand_policies() return
error.

This patch also fix an illegal address access. pols[0] will save a error
point when xfrm_policy_lookup fails. This lead to xfrm_pols_put to resolve
an illegal address in xfrm_bundle_lookup's error path.

Fix these by setting num_pols = 0 in xfrm_expand_policies()'s error path.

Fixes: 80c802f307 ("xfrm: cache bundles instead of policies for outgoing flows")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-06-02 11:05:19 +02:00
Xin Long 4d33ab08c0 xfrm: set dst dev to blackhole_netdev instead of loopback_dev in ifdown
The global blackhole_netdev has replaced pernet loopback_dev to become the
one given to the object that holds an netdev when ifdown in many places of
ipv4 and ipv6 since commit 8d7017fd62 ("blackhole_netdev: use
blackhole_netdev to invalidate dst entries").

Especially after commit faab39f63c ("net: allow out-of-order netdev
unregistration"), it's no longer safe to use loopback_dev that may be
freed before other netdev.

This patch is to set dst dev to blackhole_netdev instead of loopback_dev
in ifdown.

v1->v2:
  - add Fixes tag as Eric suggested.

Fixes: faab39f63c ("net: allow out-of-order netdev unregistration")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/e8c87482998ca6fcdab214f5a9d582899ec0c648.1652665047.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-17 11:02:07 +02:00
David Ahern 748b82c23e xfrm: Pass flowi_oif or l3mdev as oif to xfrm_dst_lookup
The commit referenced in the Fixes tag no longer changes the
flow oif to the l3mdev ifindex. A xfrm use case was expecting
the flowi_oif to be the VRF if relevant and the change broke
that test. Update xfrm_bundle_create to pass oif if set and any
potential flowi_l3mdev if oif is not set.

Fixes: 40867d74c3 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-04-04 13:32:56 +02:00
David S. Miller 62f65554f5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/
ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2022-03-19

1) Delete duplicated functions that calls same xfrm_api_check.
   From Leon Romanovsky.

2) Align userland API of the default policy structure to the
   internal structures. From Nicolas Dichtel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-19 14:49:08 +00:00
Nicolas Dichtel b58b1f563a xfrm: rework default policy structure
This is a follow up of commit f8d858e607 ("xfrm: make user policy API
complete"). The goal is to align userland API to the internal structures.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by:  Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-18 07:23:12 +01:00
Yan Yan c1aca3080e xfrm: Check if_id in xfrm_migrate
This patch enables distinguishing SAs and SPs based on if_id during
the xfrm_migrate flow. This ensures support for xfrm interfaces
throughout the SA/SP lifecycle.

When there are multiple existing SPs with the same direction,
the same xfrm_selector and different endpoint addresses,
xfrm_migrate might fail with ENODATA.

Specifically, the code path for performing xfrm_migrate is:
  Stage 1: find policy to migrate with
    xfrm_migrate_policy_find(sel, dir, type, net)
  Stage 2: find and update state(s) with
    xfrm_migrate_state_find(mp, net)
  Stage 3: update endpoint address(es) of template(s) with
    xfrm_policy_migrate(pol, m, num_migrate)

Currently "Stage 1" always returns the first xfrm_policy that
matches, and "Stage 3" looks for the xfrm_tmpl that matches the
old endpoint address. Thus if there are multiple xfrm_policy
with same selector, direction, type and net, "Stage 1" might
rertun a wrong xfrm_policy and "Stage 3" will fail with ENODATA
because it cannot find a xfrm_tmpl with the matching endpoint
address.

The fix is to allow userspace to pass an if_id and add if_id
to the matching rule in Stage 1 and Stage 2 since if_id is a
unique ID for xfrm_policy and xfrm_state. For compatibility,
if_id will only be checked if the attribute is set.

Tested with additions to Android's kernel unit test suite:
https://android-review.googlesource.com/c/kernel/tests/+/1668886

Signed-off-by: Yan Yan <evitayan@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-01-26 07:44:01 +01:00
Guillaume Nault 23e7b1bfed xfrm: Don't accidentally set RTO_ONLINK in decode_session4()
Similar to commit 94e2238969 ("xfrm4: strip ECN bits from tos field"),
clear the ECN bits from iph->tos when setting ->flowi4_tos.
This ensures that the last bit of ->flowi4_tos is cleared, so
ip_route_output_key_hash() isn't going to restrict the scope of the
route lookup.

Use ~INET_ECN_MASK instead of IPTOS_RT_MASK, because we have no reason
to clear the high order bits.

Found by code inspection, compile tested only.

Fixes: 4da3089f2b ("[IPSEC]: Use TOS when doing tunnel lookups")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-11 20:36:08 -08:00
Jakub Kicinski 8aaaf2f3af Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in fixes directly in prep for the 5.17 merge window.
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-09 17:00:17 -08:00
Jean Sacren ac1077e928 net: xfrm: drop check of pols[0] for the second time
!pols[0] is checked earlier.  If we don't return, pols[0] is always
true.  We should drop the check of pols[0] for the second time and the
binary is also smaller.

Before:
   text	   data	    bss	    dec	    hex	filename
  48395	    957	    240	  49592	   c1b8	net/xfrm/xfrm_policy.o

After:
   text	   data	    bss	    dec	    hex	filename
  48379	    957	    240	  49576	   c1a8	net/xfrm/xfrm_policy.o

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-12-01 09:23:01 +01:00
Ghalem Boudour bcf141b2eb xfrm: fix policy lookup for ipv6 gre packets
On egress side, xfrm lookup is called from __gre6_xmit() with the
fl6_gre_key field not initialized leading to policies selectors check
failure. Consequently, gre packets are sent without encryption.

On ingress side, INET6_PROTO_NOPOLICY was set, thus packets were not
checked against xfrm policies. Like for egress side, fl6_gre_key should be
correctly set, this is now done in decode_session6().

Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Cc: stable@vger.kernel.org
Signed-off-by: Ghalem Boudour <ghalem.boudour@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-11-23 10:12:21 +01:00