Now that the shash algorithm type does not support nonzero alignmasks,
crypto_shash_alignmask() always returns 0 and will be removed. In
preparation for this, stop checking crypto_shash_alignmask() in
net/ceph/messenger_v2.c.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Select CRYPTO_AEAD so that crypto_has_aead is available.
Fixes: 1383e2ab102c ("ipsec: Stop using crypto_has_alg")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309202112.33V1Ezb1-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Stop using the obsolete crypto_has_alg helper that is type-agnostic.
Instead use the type-specific helpers such as the newly added
crypto_has_aead.
This means that changes in the underlying type/mask values won't
affect IPsec.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The header file crypto/algapi.h is for internal use only. Use the
header file crypto/utils.h instead.
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The header file crypto/algapi.h is for internal use only. Use the
header file crypto/utils.h instead.
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The header file crypto/algapi.h is for internal use only. Use the
header file crypto/utils.h instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The header file crypto/algapi.h is for internal use only. Use the
header file crypto/utils.h instead.
Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Current release - regressions:
- eth: stmmac: fix failure to probe without MAC interface specified
Current release - new code bugs:
- docs: netlink: fix missing classic_netlink doc reference
Previous releases - regressions:
- deal with integer overflows in kmalloc_reserve()
- use sk_forward_alloc_get() in sk_get_meminfo()
- bpf_sk_storage: fix the missing uncharge in sk_omem_alloc
- fib: avoid warn splat in flow dissector after packet mangling
- skb_segment: call zero copy functions before using skbuff frags
- eth: sfc: check for zero length in EF10 RX prefix
Previous releases - always broken:
- af_unix: fix msg_controllen test in scm_pidfd_recv() for
MSG_CMSG_COMPAT
- xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()
- netfilter:
- nft_exthdr: fix non-linear header modification
- xt_u32, xt_sctp: validate user space input
- nftables: exthdr: fix 4-byte stack OOB write
- nfnetlink_osf: avoid OOB read
- one more fix for the garbage collection work from last release
- igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU
- bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t
- handshake: fix null-deref in handshake_nl_done_doit()
- ip: ignore dst hint for multipath routes to ensure packets
are hashed across the nexthops
- phy: micrel:
- correct bit assignments for cable test errata
- disable EEE according to the KSZ9477 errata
Misc:
- docs/bpf: document compile-once-run-everywhere (CO-RE) relocations
- Revert "net: macsec: preserve ingress frame ordering", it appears
to have been developed against an older kernel, problem doesn't
exist upstream
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmT6R6wACgkQMUZtbf5S
IrsmTg//TgmRjxSZ0lrPQtJwZR/eN3ZR2oQG3rwnssCx+YgHEGGxQsfT4KHEMacR
ZgGDZVTpthUJkkACBPi8ZMoy++RdjEmlCcanfeDkGHoYGtiX1lhkofhLMn1KUHbI
rIbP9EdNKxQT0SsBlw/U28pD5jKyqOgL23QobEwmcjLTdMpamb+qIsD6/xNv9tEj
Tu4BdCIkhjxnBD622hsE3pFTG7oSn2WM6rf5NT1E43mJ3W8RrMcydSB27J7Oryo9
l3nYMAhz0vQINS2WQ9eCT1/7GI6gg1nDtxFtrnV7ASvxayRBPIUr4kg1vT+Tixsz
CZMnwVamEBIYl9agmj7vSji7d5nOUgXPhtWhwWUM2tRoGdeGw3vSi1pgDvRiUCHE
PJ4UHv7goa2AgnOlOQCFtRybAu+9nmSGm7V+GkeGLnH7xbFsEa5smQ/+FSPJs8Dn
Yf4q5QAhdN8tdnofRlrN/nCssoDF3cfmBsTJ7wo5h71gW+BWhsP58eDCJlXd/r8k
+Qnvoe2kw27ktFR1tjsUDZ0AcSmeVARNwmXCOBYZsG4tEek8pLyj008mDvJvdfyn
PGPn7Eo5DyaERlHVmPuebHXSyniDEPe2GLTmlHcGiRpGspoUHbB+HRiDAuRLMB9g
pkL8RHpNfppnuUXeUoNy3rgEkYwlpTjZX0QHC6N8NQ76ccB6CNM=
=YpmE
-----END PGP SIGNATURE-----
Merge tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking updates from Jakub Kicinski:
"Including fixes from netfilter and bpf.
Current release - regressions:
- eth: stmmac: fix failure to probe without MAC interface specified
Current release - new code bugs:
- docs: netlink: fix missing classic_netlink doc reference
Previous releases - regressions:
- deal with integer overflows in kmalloc_reserve()
- use sk_forward_alloc_get() in sk_get_meminfo()
- bpf_sk_storage: fix the missing uncharge in sk_omem_alloc
- fib: avoid warn splat in flow dissector after packet mangling
- skb_segment: call zero copy functions before using skbuff frags
- eth: sfc: check for zero length in EF10 RX prefix
Previous releases - always broken:
- af_unix: fix msg_controllen test in scm_pidfd_recv() for
MSG_CMSG_COMPAT
- xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()
- netfilter:
- nft_exthdr: fix non-linear header modification
- xt_u32, xt_sctp: validate user space input
- nftables: exthdr: fix 4-byte stack OOB write
- nfnetlink_osf: avoid OOB read
- one more fix for the garbage collection work from last release
- igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU
- bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t
- handshake: fix null-deref in handshake_nl_done_doit()
- ip: ignore dst hint for multipath routes to ensure packets are
hashed across the nexthops
- phy: micrel:
- correct bit assignments for cable test errata
- disable EEE according to the KSZ9477 errata
Misc:
- docs/bpf: document compile-once-run-everywhere (CO-RE) relocations
- Revert "net: macsec: preserve ingress frame ordering", it appears
to have been developed against an older kernel, problem doesn't
exist upstream"
* tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
Revert "net: team: do not use dynamic lockdep key"
net: hns3: remove GSO partial feature bit
net: hns3: fix the port information display when sfp is absent
net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
net: hns3: fix debugfs concurrency issue between kfree buffer and read
net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
net: hns3: Support query tx timeout threshold by debugfs
net: hns3: fix tx timeout issue
net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
netfilter: nf_tables: Unbreak audit log reset
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
netfilter: nfnetlink_osf: avoid OOB read
netfilter: nftables: exthdr: fix 4-byte stack OOB write
selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
s390/bpf: Pass through tail call counter in trampolines
...
-----BEGIN PGP SIGNATURE-----
iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmT4pk0NHGZ3QHN0cmxl
bi5kZQAKCRBwkajZrV/2AFILEADUXdj4PWEZv0/1avh+cT14q4gM0KkNi0eWDePA
kRRTvuVZd6CkAxMNW9EAa6koJdK7k7dq2Jax9MYgzvvkFBO5PH5HHvwQRjs+Ch4W
WCckV83J0W2eTcPqSxTM6wbSL8p1a0rJSHVSG7N431dPQUJoMnmd4zW5q1P9qTkv
gqE4iQoK2jGL01GOyuQLnh6fEH2TUviguXZPKxVymoy/pbBiDA1mAMQplyAJisAz
c1lAb+/oW58xYpxTOCtlU3kqB4tPonUow8x8KkYrlZlrA8rxgr2WkVeg82eVXP9u
X8DM/nSzxCY7aN3TVrS2m8dS1cK1Gj841atuPA2N9ot/R954h7VjxhvhW1IEpmY9
P6EYvRrLtCZrif4cmbRY4n6UztFNo3FJeBgmM3pgiVQhCXm+1fpNnSTm3QvZY1YO
6o5rQn7F7IiIzkBeQ5sPihQPvLqn+J/61ZyQWAdP/TAG+roC0ARPeEFjyBoRHttI
uw4PcXuzrMjBzym816h8kAjJbVIygpZFIpA/QllN8d3rB96mWpFCILIBG9FnA91K
Ie6IyRGjDKfe5TuJplSIfCdBrvOPqUjc+l7XAPz9QXL8W3MQnYZshHRs5ViSOaca
iMhMScBYT31uBZ7KODYxgPcifr0EWSYIGceWnSCDOmRz+VJk1LSfvZTwvqOB+qZ3
x01bTw==
=Z3CA
-----END PGP SIGNATURE-----
Merge tag 'nf-23-09-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter updates for net
This PR contains nf_tables updates for your *net* tree.
This time almost all fixes are for old bugs:
First patch fixes a 4-byte stack OOB write, from myself.
This was broken ever since nftables was switches from 128 to 32bit
register addressing in v4.1.
2nd patch fixes an out-of-bounds read.
This has been broken ever since xt_osf got added in 2.6.31, the bug
was then just moved around during refactoring, from Wander Lairson Costa.
3rd patch adds a missing enum description, from Phil Sutter.
4th patch fixes a UaF inftables that occurs when userspace adds
elements with a timeout so small that expiration happens while the
transaction is still in progress. Fix from Pablo Neira Ayuso.
Patch 5 fixes a memory out of bounds access, this was
broken since v4.20. Patch from Kyle Zeng and Jozsef Kadlecsik.
Patch 6 fixes another bogus memory access when building audit
record. Bug added in the previous pull request, fix from Pablo.
netfilter pull request 2023-09-06
* tag 'nf-23-09-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Unbreak audit log reset
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
netfilter: nfnetlink_osf: avoid OOB read
netfilter: nftables: exthdr: fix 4-byte stack OOB write
====================
Link: https://lore.kernel.org/r/20230906162525.11079-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZPhEYwAKCRDbK58LschI
g6puAQCGLE9S2d5xw68Fq2KLrTjTXC5trB9N25NWvZWqgXHAeQEA3wXxTC8eF0dG
J24RwZC2vOC4hF3aNGeI0fv/6j7SKA0=
=0IU4
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-09-06
We've added 9 non-merge commits during the last 6 day(s) which contain
a total of 12 files changed, 189 insertions(+), 44 deletions(-).
The main changes are:
1) Fix bpf_sk_storage to address an invalid wait context lockdep
report and another one to address missing omem uncharge,
from Martin KaFai Lau.
2) Two BPF recursion detection related fixes,
from Sebastian Andrzej Siewior.
3) Fix tailcall limit enforcement in trampolines for s390 JIT,
from Ilya Leoshkevich.
4) Fix a sockmap refcount race where skbs in sk_psock_backlog can
be referenced after user space side has already skb_consumed them,
from John Fastabend.
5) Fix BPF CI flake/race wrt sockmap vsock write test where
the transport endpoint is not connected, from Xu Kuohai.
6) Follow-up doc fix to address a cross-link warning,
from Eduard Zingerman.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
s390/bpf: Pass through tail call counter in trampolines
bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check.
bpf: Invoke __bpf_prog_exit_sleepable_recur() on recursion in kern_sys_bpf().
bpf, sockmap: Fix skb refcnt race after locking changes
docs/bpf: Fix "file doesn't exist" warnings in {llvm_reloc,btf}.rst
selftests/bpf: Fix a CI failure caused by vsock write
====================
Link: https://lore.kernel.org/r/20230906095117.16941-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
fscrypt support to CephFS! The list of things which don't work with
encryption should be fairly short, mostly around the edges: fallocate
(not supported well in CephFS to begin with), copy_file_range (requires
re-encryption), non-default striping patterns.
This was a multi-year effort principally by Jeff Layton with assistance
from Xiubo Li, Luís Henriques and others, including several dependant
changes in the MDS, netfs helper library and fscrypt framework itself.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmT4pl4THGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi5kzB/4sMgzZyUa3T1vA/G2pPvEkyy1qDxsW
y+o4dDMWA9twcrBVpNuGd54wbXpmO/LAekHEdorjayH+f0zf10MsnP1ePz9WB3NG
jr7RRujb+Gpd2OFYJXGSEbd3faTg8M2kpGCCrVe7SFNoyu8z9NwFItwWMog5aBjX
ODGQrq+kA4ARA6xIqwzF5gP0zr+baT9rWhQdm7Xo9itWdosnbyDLJx1dpEfLuqBX
te3SmifDzedn3Gw73hdNo/+ybw0kHARoK+RmXCTsoDDQw+JsoO9KxZF5Q8QcDELq
2woPNp0Hl+Dm4MkzGnPxv56Qj8ZDViS59syXC0CfGRmu4nzF1Rw+0qn5
=/WlE
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"Mixed with some fixes and cleanups, this brings in reasonably complete
fscrypt support to CephFS! The list of things which don't work with
encryption should be fairly short, mostly around the edges: fallocate
(not supported well in CephFS to begin with), copy_file_range
(requires re-encryption), non-default striping patterns.
This was a multi-year effort principally by Jeff Layton with
assistance from Xiubo Li, Luís Henriques and others, including several
dependant changes in the MDS, netfs helper library and fscrypt
framework itself"
* tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client: (53 commits)
ceph: make num_fwd and num_retry to __u32
ceph: make members in struct ceph_mds_request_args_ext a union
rbd: use list_for_each_entry() helper
libceph: do not include crypto/algapi.h
ceph: switch ceph_lookup/atomic_open() to use new fscrypt helper
ceph: fix updating i_truncate_pagecache_size for fscrypt
ceph: wait for OSD requests' callbacks to finish when unmounting
ceph: drop messages from MDS when unmounting
ceph: update documentation regarding snapshot naming limitations
ceph: prevent snapshot creation in encrypted locked directories
ceph: add support for encrypted snapshot names
ceph: invalidate pages when doing direct/sync writes
ceph: plumb in decryption during reads
ceph: add encryption support to writepage and writepages
ceph: add read/modify/write to ceph_sync_write
ceph: align data in pages in ceph_sync_write
ceph: don't use special DIO path for encrypted inodes
ceph: add truncate size handling support for fscrypt
ceph: add object version support for sync read
libceph: allow ceph_osdc_new_request to accept a multi-op read
...
The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can
lead to the use of wrong `CIDR_POS(c)` for calculating array offsets,
which can lead to integer underflow. As a result, it leads to slab
out-of-bound access.
This patch adds back the IP_SET_HASH_WITH_NET0 macro to
ip_set_hash_netportnet to address the issue.
Fixes: 886503f34d ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net")
Suggested-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
New elements in this transaction might expired before such transaction
ends. Skip sync GC for such elements otherwise commit path might walk
over an already released object. Once transaction is finished, async GC
will collect such expired element.
Fixes: f6c383b8c3 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
The opt_num field is controlled by user mode and is not currently
validated inside the kernel. An attacker can take advantage of this to
trigger an OOB read and potentially leak information.
BUG: KASAN: slab-out-of-bounds in nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
Read of size 2 at addr ffff88804bc64272 by task poc/6431
CPU: 1 PID: 6431 Comm: poc Not tainted 6.0.0-rc4 #1
Call Trace:
nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
nf_osf_find+0x186/0x2f0 net/netfilter/nfnetlink_osf.c:281
nft_osf_eval+0x37f/0x590 net/netfilter/nft_osf.c:47
expr_call_ops_eval net/netfilter/nf_tables_core.c:214
nft_do_chain+0x2b0/0x1490 net/netfilter/nf_tables_core.c:264
nft_do_chain_ipv4+0x17c/0x1f0 net/netfilter/nft_chain_filter.c:23
[..]
Also add validation to genre, subtype and version fields.
Fixes: 11eeef41d5 ("netfilter: passive OS fingerprint xtables match")
Reported-by: Lucas Leong <wmliang@infosec.exchange>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
If priv->len is a multiple of 4, then dst[len / 4] can write past
the destination array which leads to stack corruption.
This construct is necessary to clean the remainder of the register
in case ->len is NOT a multiple of the register size, so make it
conditional just like nft_payload.c does.
The bug was added in 4.1 cycle and then copied/inherited when
tcp/sctp and ip option support was added.
Bug reported by Zero Day Initiative project (ZDI-CAN-21950,
ZDI-CAN-21951, ZDI-CAN-21961).
Fixes: 49499c3e6e ("netfilter: nf_tables: switch registers to 32 bit addressing")
Fixes: 935b7f6430 ("netfilter: nft_exthdr: add TCP option matching")
Fixes: 133dc203d7 ("netfilter: nft_exthdr: Support SCTP chunks")
Fixes: dbb5281a1f ("netfilter: nf_tables: add support for matching IPv4 options")
Signed-off-by: Florian Westphal <fw@strlen.de>
__skb_get_hash_symmetric() was added to compute a symmetric hash over
the protocol, addresses and transport ports, by commit eb70db8756
("packet: Use symmetric hash for PACKET_FANOUT_HASH."). It uses
flow_keys_dissector_symmetric_keys as the flow_dissector to incorporate
IPv4 addresses, IPv6 addresses and ports. However, it should not specify
the flag as FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, which stops further
dissection when an IPv6 flow label is encountered, making transport
ports not being incorporated in such case.
As a consequence, the symmetric hash is based on 5-tuple for IPv4 but
3-tuple for IPv6 when flow label is present. It caused a few problems,
e.g. when nft symhash and openvswitch l4_sym rely on the symmetric hash
to perform load balancing as different L4 flows between two given IPv6
addresses would always get the same symmetric hash, leading to uneven
traffic distribution.
Removing the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL makes sure the
symmetric hash is based on 5-tuple for both IPv4 and IPv6 consistently.
Fixes: eb70db8756 ("packet: Use symmetric hash for PACKET_FANOUT_HASH.")
Reported-by: Lars Ekman <uablrek@gmail.com>
Closes: https://github.com/antrea-io/antrea/issues/5457
Signed-off-by: Quan Tian <qtian@vmware.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a follow up of commit 915d975b2f ("net: deal with integer
overflows in kmalloc_reserve()") based on David Laight feedback.
Back in 2010, I failed to realize malicious users could set dev->mtu
to arbitrary values. This mtu has been since limited to 0x7fffffff but
regardless of how big dev->mtu is, it makes no sense for igmpv3_newpack()
to allocate more than IP_MAX_MTU and risk various skb fields overflows.
Fixes: 57e1ab6ead ("igmp: refine skb allocations")
Link: https://lore.kernel.org/netdev/d273628df80f45428e739274ab9ecb72@AcuMS.aculab.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Cc: Kyle Zeng <zengyhkyle@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
kcm_exit_net() should call mutex_destroy() on knet->mutex. This is especially
needed if CONFIG_DEBUG_MUTEXES is enabled.
Fixes: ab7ac4eb98 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://lore.kernel.org/r/20230902170708.1727999-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
As with sk->sk_shutdown shown in the previous patch, sk->sk_err can be
read locklessly by unix_dgram_sendmsg().
Let's use READ_ONCE() for sk_err as well.
Note that the writer side is marked by commit cc04410af7 ("af_unix:
annotate lockless accesses to sk->sk_err").
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race where skb's from the sk_psock_backlog can be referenced
after userspace side has already skb_consumed() the sk_buff and its refcnt
dropped to zer0 causing use after free.
The flow is the following:
while ((skb = skb_peek(&psock->ingress_skb))
sk_psock_handle_Skb(psock, skb, ..., ingress)
if (!ingress) ...
sk_psock_skb_ingress
sk_psock_skb_ingress_enqueue(skb)
msg->skb = skb
sk_psock_queue_msg(psock, msg)
skb_dequeue(&psock->ingress_skb)
The sk_psock_queue_msg() puts the msg on the ingress_msg queue. This is
what the application reads when recvmsg() is called. An application can
read this anytime after the msg is placed on the queue. The recvmsg hook
will also read msg->skb and then after user space reads the msg will call
consume_skb(skb) on it effectively free'ing it.
But, the race is in above where backlog queue still has a reference to
the skb and calls skb_dequeue(). If the skb_dequeue happens after the
user reads and free's the skb we have a use after free.
The !ingress case does not suffer from this problem because it uses
sendmsg_*(sk, msg) which does not pass the sk_buff further down the
stack.
The following splat was observed with 'test_progs -t sockmap_listen':
[ 1022.710250][ T2556] general protection fault, ...
[...]
[ 1022.712830][ T2556] Workqueue: events sk_psock_backlog
[ 1022.713262][ T2556] RIP: 0010:skb_dequeue+0x4c/0x80
[ 1022.713653][ T2556] Code: ...
[...]
[ 1022.720699][ T2556] Call Trace:
[ 1022.720984][ T2556] <TASK>
[ 1022.721254][ T2556] ? die_addr+0x32/0x80^M
[ 1022.721589][ T2556] ? exc_general_protection+0x25a/0x4b0
[ 1022.722026][ T2556] ? asm_exc_general_protection+0x22/0x30
[ 1022.722489][ T2556] ? skb_dequeue+0x4c/0x80
[ 1022.722854][ T2556] sk_psock_backlog+0x27a/0x300
[ 1022.723243][ T2556] process_one_work+0x2a7/0x5b0
[ 1022.723633][ T2556] worker_thread+0x4f/0x3a0
[ 1022.723998][ T2556] ? __pfx_worker_thread+0x10/0x10
[ 1022.724386][ T2556] kthread+0xfd/0x130
[ 1022.724709][ T2556] ? __pfx_kthread+0x10/0x10
[ 1022.725066][ T2556] ret_from_fork+0x2d/0x50
[ 1022.725409][ T2556] ? __pfx_kthread+0x10/0x10
[ 1022.725799][ T2556] ret_from_fork_asm+0x1b/0x30
[ 1022.726201][ T2556] </TASK>
To fix we add an skb_get() before passing the skb to be enqueued in the
engress queue. This bumps the skb->users refcnt so that consume_skb()
and kfree_skb will not immediately free the sk_buff. With this we can
be sure the skb is still around when we do the dequeue. Then we just
need to decrement the refcnt or free the skb in the backlog case which
we do by calling kfree_skb() on the ingress case as well as the sendmsg
case.
Before locking change from fixes tag we had the sock locked so we
couldn't race with user and there was no issue here.
Fixes: 799aa7f98d ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Xu Kuohai <xukuohai@huawei.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230901202137.214666-1-john.fastabend@gmail.com
The existing code incorrectly casted a negative value (the result of a
subtraction) to an unsigned value without checking. For example, if
/proc/sys/net/ipv6/conf/*/temp_prefered_lft was set to 1, the preferred
lifetime would jump to 4 billion seconds. On my machine and network the
shortest lifetime that avoided underflow was 3 seconds.
Fixes: 76506a986d ("IPv6: fix DESYNC_FACTOR")
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Blamed commit changed:
ptr = kmalloc(size);
if (ptr)
size = ksize(ptr);
to:
size = kmalloc_size_roundup(size);
ptr = kmalloc(size);
This allowed various crash as reported by syzbot [1]
and Kyle Zeng.
Problem is that if @size is bigger than 0x80000001,
kmalloc_size_roundup(size) returns 2^32.
kmalloc_reserve() uses a 32bit variable (obj_size),
so 2^32 is truncated to 0.
kmalloc(0) returns ZERO_SIZE_PTR which is not handled by
skb allocations.
Following trace can be triggered if a netdev->mtu is set
close to 0x7fffffff
We might in the future limit netdev->mtu to more sensible
limit (like KMALLOC_MAX_SIZE).
This patch is based on a syzbot report, and also a report
and tentative fix from Kyle Zeng.
[1]
BUG: KASAN: user-memory-access in __build_skb_around net/core/skbuff.c:294 [inline]
BUG: KASAN: user-memory-access in __alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527
Write of size 32 at addr 00000000fffffd10 by task syz-executor.4/22554
CPU: 1 PID: 22554 Comm: syz-executor.4 Not tainted 6.1.39-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
Call trace:
dump_backtrace+0x1c8/0x1f4 arch/arm64/kernel/stacktrace.c:279
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:286
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x120/0x1a0 lib/dump_stack.c:106
print_report+0xe4/0x4b4 mm/kasan/report.c:398
kasan_report+0x150/0x1ac mm/kasan/report.c:495
kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:189
memset+0x40/0x70 mm/kasan/shadow.c:44
__build_skb_around net/core/skbuff.c:294 [inline]
__alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527
alloc_skb include/linux/skbuff.h:1316 [inline]
igmpv3_newpack+0x104/0x1088 net/ipv4/igmp.c:359
add_grec+0x81c/0x1124 net/ipv4/igmp.c:534
igmpv3_send_cr net/ipv4/igmp.c:667 [inline]
igmp_ifc_timer_expire+0x1b0/0x1008 net/ipv4/igmp.c:810
call_timer_fn+0x1c0/0x9f0 kernel/time/timer.c:1474
expire_timers kernel/time/timer.c:1519 [inline]
__run_timers+0x54c/0x710 kernel/time/timer.c:1790
run_timer_softirq+0x28/0x4c kernel/time/timer.c:1803
_stext+0x380/0xfbc
____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:79
call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891
do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:84
invoke_softirq kernel/softirq.c:437 [inline]
__irq_exit_rcu+0x1c0/0x4cc kernel/softirq.c:683
irq_exit_rcu+0x14/0x78 kernel/softirq.c:695
el0_interrupt+0x7c/0x2e0 arch/arm64/kernel/entry-common.c:717
__el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:724
el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:729
el0t_64_irq+0x1a0/0x1a4 arch/arm64/kernel/entry.S:584
Fixes: 12d6c1d3a2 ("skbuff: Proactively round up to kmalloc bucket size")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the big set of tty and serial driver changes for 6.6-rc1.
Lots of cleanups in here this cycle, and some driver updates. Short
summary is:
- Jiri's continued work to make the tty code and apis be a bit more
sane with regards to modern kernel coding style and types
- cpm_uart driver updates
- n_gsm updates and fixes
- meson driver updates
- sc16is7xx driver updates
- 8250 driver updates for different hardware types
- qcom-geni driver fixes
- tegra serial driver change
- stm32 driver updates
- synclink_gt driver cleanups
- tty structure size reduction
All of these have been in linux-next this week with no reported issues.
The last bit of cleanups from Jiri and the tty structure size reduction
came in last week, a bit late but as they were just style changes and
size reductions, I figured they should get into this merge cycle so that
others can work on top of them with no merge conflicts.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPH+jA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykKyACgldt6QeenTN+6dXIHS/eQHtTKZwMAn3arSeXI
QrUUnLFjOWyoX87tbMBQ
=LVw0
-----END PGP SIGNATURE-----
Merge tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big set of tty and serial driver changes for 6.6-rc1.
Lots of cleanups in here this cycle, and some driver updates. Short
summary is:
- Jiri's continued work to make the tty code and apis be a bit more
sane with regards to modern kernel coding style and types
- cpm_uart driver updates
- n_gsm updates and fixes
- meson driver updates
- sc16is7xx driver updates
- 8250 driver updates for different hardware types
- qcom-geni driver fixes
- tegra serial driver change
- stm32 driver updates
- synclink_gt driver cleanups
- tty structure size reduction
All of these have been in linux-next this week with no reported
issues. The last bit of cleanups from Jiri and the tty structure size
reduction came in last week, a bit late but as they were just style
changes and size reductions, I figured they should get into this merge
cycle so that others can work on top of them with no merge conflicts"
* tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits)
tty: shrink the size of struct tty_struct by 40 bytes
tty: n_tty: deduplicate copy code in n_tty_receive_buf_real_raw()
tty: n_tty: extract ECHO_OP processing to a separate function
tty: n_tty: unify counts to size_t
tty: n_tty: use u8 for chars and flags
tty: n_tty: simplify chars_in_buffer()
tty: n_tty: remove unsigned char casts from character constants
tty: n_tty: move newline handling to a separate function
tty: n_tty: move canon handling to a separate function
tty: n_tty: use MASK() for masking out size bits
tty: n_tty: make n_tty_data::num_overrun unsigned
tty: n_tty: use time_is_before_jiffies() in n_tty_receive_overrun()
tty: n_tty: use 'num' for writes' counts
tty: n_tty: use output character directly
tty: n_tty: make flow of n_tty_receive_buf_common() a bool
Revert "tty: serial: meson: Add a earlycon for the T7 SoC"
Documentation: devices.txt: Fix minors for ttyCPM*
Documentation: devices.txt: Remove ttySIOC*
Documentation: devices.txt: Remove ttyIOC*
serial: 8250_bcm7271: improve bcm7271 8250 port
...
Route hints when the nexthop is part of a multipath group causes packets
in the same receive batch to be sent to the same nexthop irrespective of
the multipath hash of the packet. So, do not extract route hint for
packets whose destination is part of a multipath group.
A new SKB flag IP6SKB_MULTIPATH is introduced for this purpose, set the
flag when route is looked up in fib6_select_path() and use it in
ip6_can_use_hint() to check for the existence of the flag.
Fixes: 197dbf24e3 ("ipv6: introduce and uses route look hints for list input.")
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Route hints when the nexthop is part of a multipath group causes packets
in the same receive batch to be sent to the same nexthop irrespective of
the multipath hash of the packet. So, do not extract route hint for
packets whose destination is part of a multipath group.
A new SKB flag IPSKB_MULTIPATH is introduced for this purpose, set the
flag when route is looked up in ip_mkroute_input() and use it in
ip_extract_route_hint() to check for the existence of the flag.
Fixes: 02b2494161 ("ipv4: use dst hint for ipv4 list receive")
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit bf5c25d608 ("skbuff: in skb_segment, call zerocopy functions
once per nskb") added the call to zero copy functions in skb_segment().
The change introduced a bug in skb_segment() because skb_orphan_frags()
may possibly change the number of fragments or allocate new fragments
altogether leaving nrfrags and frag to point to the old values. This can
cause a panic with stacktrace like the one below.
[ 193.894380] BUG: kernel NULL pointer dereference, address: 00000000000000bc
[ 193.895273] CPU: 13 PID: 18164 Comm: vh-net-17428 Kdump: loaded Tainted: G O 5.15.123+ #26
[ 193.903919] RIP: 0010:skb_segment+0xb0e/0x12f0
[ 194.021892] Call Trace:
[ 194.027422] <TASK>
[ 194.072861] tcp_gso_segment+0x107/0x540
[ 194.082031] inet_gso_segment+0x15c/0x3d0
[ 194.090783] skb_mac_gso_segment+0x9f/0x110
[ 194.095016] __skb_gso_segment+0xc1/0x190
[ 194.103131] netem_enqueue+0x290/0xb10 [sch_netem]
[ 194.107071] dev_qdisc_enqueue+0x16/0x70
[ 194.110884] __dev_queue_xmit+0x63b/0xb30
[ 194.121670] bond_start_xmit+0x159/0x380 [bonding]
[ 194.128506] dev_hard_start_xmit+0xc3/0x1e0
[ 194.131787] __dev_queue_xmit+0x8a0/0xb30
[ 194.138225] macvlan_start_xmit+0x4f/0x100 [macvlan]
[ 194.141477] dev_hard_start_xmit+0xc3/0x1e0
[ 194.144622] sch_direct_xmit+0xe3/0x280
[ 194.147748] __dev_queue_xmit+0x54a/0xb30
[ 194.154131] tap_get_user+0x2a8/0x9c0 [tap]
[ 194.157358] tap_sendmsg+0x52/0x8e0 [tap]
[ 194.167049] handle_tx_zerocopy+0x14e/0x4c0 [vhost_net]
[ 194.173631] handle_tx+0xcd/0xe0 [vhost_net]
[ 194.176959] vhost_worker+0x76/0xb0 [vhost]
[ 194.183667] kthread+0x118/0x140
[ 194.190358] ret_from_fork+0x1f/0x30
[ 194.193670] </TASK>
In this case calling skb_orphan_frags() updated nr_frags leaving nrfrags
local variable in skb_segment() stale. This resulted in the code hitting
i >= nrfrags prematurely and trying to move to next frag_skb using
list_skb pointer, which was NULL, and caused kernel panic. Move the call
to zero copy functions before using frags and nr_frags.
Fixes: bf5c25d608 ("skbuff: in skb_segment, call zerocopy functions once per nskb")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reported-by: Amit Goyal <agoyal@purestorage.com>
Cc: stable@vger.kernel.org
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_bind_phc is read locklessly. Add corresponding annotations.
Fixes: d463126e23 ("net: sock: extend SO_TIMESTAMPING for PHC binding")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_tsflags can be read locklessly, add corresponding annotations.
Fixes: b9f40e21ef ("net-timestamp: move timestamp flags out of sk_flags")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
msk->rmem_fwd_alloc can be read locklessly.
Add mptcp_rmem_fwd_alloc_add(), similar to sk_forward_alloc_add(),
and appropriate READ_ONCE()/WRITE_ONCE() annotations.
Fixes: 6511882cdd ("mptcp: allocate fwd memory separately on the rx and tx path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every time sk->sk_forward_alloc is read locklessly,
add a READ_ONCE().
Add sk_forward_alloc_add() helper to centralize updates,
to reduce number of WRITE_ONCE().
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_sk_diag_fill() has been changed to use sk_forward_alloc_get(),
but sk_get_meminfo() was forgotten.
Fixes: 292e6077b0 ("net: introduce sk_forward_alloc_get()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZPD2qwAKCRDbK58LschI
gzy9APoCsV3B0rJCX2PnxoKmx7ZwAbEhWRHN3iDAGgEOwuAdLQEAi1Mafivr/4Rr
WLi6AQOy+Erv7dAQRq2KbR2yE8rkEgg=
=BJ9X
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-08-31
We've added 15 non-merge commits during the last 3 day(s) which contain
a total of 17 files changed, 468 insertions(+), 97 deletions(-).
The main changes are:
1) BPF selftest fixes: one flake and one related to clang18 testing,
from Yonghong Song.
2) Fix a d_path BPF selftest failure after fast-forward from Linus'
tree, from Jiri Olsa.
3) Fix a preempt_rt splat in sockmap when using raw_spin_lock_t,
from John Fastabend.
4) Fix a xsk_diag_fill use-after-free race during socket cleanup,
from Magnus Karlsson.
5) Fix xsk_build_skb to address a buggy dereference of an ERR_PTR(),
from Tirthendu Sarkar.
6) Fix a bpftool build warning when compiled with -Wtype-limits,
from Yafang Shao.
7) Several misc fixes and cleanups in standardization docs,
from David Vernet.
8) Fix BPF selftest install to consider no_alu32/cpuv4/bpf-gcc flavors,
from Björn Töpel.
9) Annotate a data race in bpf_long_memcpy for KCSAN, from Daniel Borkmann.
10) Extend documentation with a description for CO-RE relocations,
from Eduard Zingerman.
11) Fix several invalid escape sequence warnings in bpf_doc.py script,
from Vishal Chourasia.
12) Fix the instruction set doc wrt offset of BPF-to-BPF call,
from Will Hawkins.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Include build flavors for install target
bpf: Annotate bpf_long_memcpy with data_race
selftests/bpf: Fix d_path test
bpf, docs: Fix invalid escape sequence warnings in bpf_doc.py
xsk: Fix xsk_diag use-after-free error during socket cleanup
bpf, docs: s/eBPF/BPF in standards documents
bpf, docs: Add abi.rst document to standardization subdirectory
bpf, docs: Move linux-notes.rst to root bpf docs tree
bpf, sockmap: Fix preempt_rt splat when using raw_spin_lock_t
docs/bpf: Add description for CO-RE relocations
bpf, docs: Correct source of offset for program-local call
selftests/bpf: Fix flaky cgroup_iter_sleepable subtest
xsk: Fix xsk_build_skb() error: 'skb' dereferencing possible ERR_PTR()
bpftool: Fix build warnings with -Wtype-limits
bpf: Prevent inlining of bpf_fentry_test7()
====================
Link: https://lore.kernel.org/r/20230831210019.14417-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
New Features:
* Enable the NFS v4.2 READ_PLUS operation by default
Stable Fixes:
* NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
* NFS: Fix a potential data corruption
Bugfixes:
* Fix various READ_PLUS issues including:
* smatch warnings
* xdr size calculations
* scratch buffer handling
* 32bit / highmem xdr page handling
* Fix checkpatch errors in file.c
* Fix redundant readdir request after an EOF
* Fix handling of COPY ERR_OFFLOAD_NO_REQ
* Fix assignment of xprtdata.cred
Cleanups:
* Remove unused xprtrdma function declarations
* Clean up an integer overflow check to avoid a warning
* Clean up #includes in dns_resolve.c
* Clean up nfs4_get_device_info so we don't pass a NULL pointer to __free_page()
* Clean up sunrpc TCP socket timeout configuration
* Guard against READDIR loops when entry names are too long
* Use EXCHID4_FLAG_USE_PNFS_DS for DS servers
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmTwzwYACgkQ18tUv7Cl
QOtIhBAA+BOh7MB6yjlyctFxABJiXz2x2Dehxy7Ox15LfnyStqQAUEpk35CXWvjC
iNxpZJ486+WrzM76WGEaRbECK9nTQLK1yacR3V1zpnDwHWIJA6VHN6qU4JAfSMu7
XbhWkHWry6d7PXhvqHlaiYvPX2pF39wUzfH+vLlzS2QLIkpT6LnG0zVRJTQvLCmq
zE5xD+NCQ1Dpo9VnouuzW7VVfm532hI7GQNrpo0E0vWKgeQD+/fOpDu23MW8A1Ua
ZgVMAc7vScgDZH8/20Ze5PH4jAEB4gwEIzjreQlIXr7Tf+mE7qn435lgOuvdMQCW
eHhdNriZ2X6HMLhNFFpup8bkRKGCCTooTHC1W66n9CuxIAuVT5DNwBbakpagHSZf
J4ho81hEgBfc5zppISVjV6eFK4brM0rF9AliaIw8r/qGcMmO1CILi9tLGiheiJcT
LuId7U2sE/vfIa6SiBt7rx37/MkrgLlAgjpk4dCRJQW+gKVBi09sMGnDlgaRvCZz
T0WCsK4DgI9q2rScpwJYJbNWbC2Q8qUtYWW9LSRvwhbNdm/VbRnEHWA7eOwqqm8r
KkkF4chyoTJqpnF3SjxT/lyFk6GwsD+wXafOmEeuFA6Si3dHDU9i3aUf+cCXhwRI
uUzCUHYcCKnv4QVGPuAbIdxMgueNCuLoeWgTClVlqidv7GRyz7Y=
=rjmq
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Enable the NFS v4.2 READ_PLUS operation by default
Stable Fixes:
- NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
- NFS: Fix a potential data corruption
Bugfixes:
- Fix various READ_PLUS issues including:
- smatch warnings
- xdr size calculations
- scratch buffer handling
- 32bit / highmem xdr page handling
- Fix checkpatch errors in file.c
- Fix redundant readdir request after an EOF
- Fix handling of COPY ERR_OFFLOAD_NO_REQ
- Fix assignment of xprtdata.cred
Cleanups:
- Remove unused xprtrdma function declarations
- Clean up an integer overflow check to avoid a warning
- Clean up #includes in dns_resolve.c
- Clean up nfs4_get_device_info so we don't pass a NULL pointer
to __free_page()
- Clean up sunrpc TCP socket timeout configuration
- Guard against READDIR loops when entry names are too long
- Use EXCHID4_FLAG_USE_PNFS_DS for DS servers"
* tag 'nfs-for-6.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (22 commits)
pNFS: Fix assignment of xprtdata.cred
NFSv4.2: fix handling of COPY ERR_OFFLOAD_NO_REQ
NFS: Guard against READDIR loop when entry names exceed MAXNAMELEN
NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS server
NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver
SUNRPC: Don't override connect timeouts in rpc_clnt_add_xprt()
SUNRPC: Allow specification of TCP client connect timeout at setup
SUNRPC: Refactor and simplify connect timeout
SUNRPC: Set the TCP_SYNCNT to match the socket timeout
NFS: Fix a potential data corruption
nfs: fix redundant readdir request after get eof
nfs/blocklayout: Use the passed in gfp flags
filemap: Fix errors in file.c
NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
NFS: Move common includes outside ifdef
SUNRPC: clean up integer overflow check
xprtrdma: Remove unused function declaration rpcrdma_bc_post_recv()
NFS: Enable the READ_PLUS operation by default
SUNRPC: kmap() the xdr pages during decode
NFSv4.2: Rework scratch handling for READ_PLUS (again)
...
I'm thrilled to announce that the Linux in-kernel NFS server now
offers NFSv4 write delegations. A write delegation enables a client
to cache data and metadata for a single file more aggressively,
reducing network round trips and server workload. Many thanks to Dai
Ngo for contributing this facility, and to Jeff Layton and Neil
Brown for reviewing and testing it.
This release also sees the removal of all support for DES- and
triple-DES-based Kerberos encryption types in the kernel's SunRPC
implementation. These encryption types have been deprecated by the
Internet community for years and are considered insecure. This
change affects both the in-kernel NFS client and server.
The server's UDP and TCP socket transports have now fully adopted
David Howells' new bio_vec iterator so that no more than one
sendmsg() call is needed to transmit each RPC message. In
particular, this helps kTLS optimize record boundaries when sending
RPC-with-TLS replies, and it takes the server a baby step closer to
handling file I/O via folios.
We've begun work on overhauling the SunRPC thread scheduler to
remove a costly linked-list walk when looking for an idle RPC
service thread to wake. The pre-requisites are included in this
release. Thanks to Neil Brown for his ongoing work on this
improvement.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmTwoa0ACgkQM2qzM29m
f5cZvw/8CmFVNC27aMrJEhRRhwwrXbLzUkWh9GCYkG98PHYiLxLTvZ6qELXAax/a
UjSgIDSRcWl4z8M/tyBQtgsw7NADr+7XWqEoXR7HZ5pEEC/KNGM0oQWQ92ojjKYy
JmHdB02uaDJfcd9ioFTU13cw7q2BQfoe2xLI8yqis2vcVSu92AM7aIw+cvJIpwQB
inA3TIIsYTV/gPByXSfEtvmYACadoFiMvfvYwaWhjFS9MdSzFmcVG0Dp3EFIig29
odmWEofcz6uIvUWvUswWEGdoSu7uOKIztSuAI4PlTwaofUaSKG6e5kmtpr3cLERD
Uhg2lm5JgqkXBd7QHObNimJ4DtQzFwHmhA08qo8rd/zba75mn/Hr5IF0q3Rxs99J
SRYHcAeP8afKn5Ge0yzoTgCNcqhfz8KLRfoCQX49mljr+muNxld4nMklD2KdUwJi
XEB512/q3E4nUgopXZiSJYQYAq1CfdR5WpGipZ9X0XK9HZBDF/qhXGtk1YQuNWyj
ZxJS3bfBza4oVIvP5/ehjCIQwOvqkcrC5zZGDIgDvw9Q6L3L1wqmVntsdCLCLRcJ
jB4IOsj+DECfJ6w2vP2SZ3GeMtFnyuTQjhUTkjPuAKTBBiKo4Tj0o/agpfDYbWZy
1l3a2yH5jqJgkm4MaVh3YHRJGc0ub0ccpIrs3QQ4jvjMLQ/3Gcs=
=XGHs
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"I'm thrilled to announce that the Linux in-kernel NFS server now
offers NFSv4 write delegations. A write delegation enables a client to
cache data and metadata for a single file more aggressively, reducing
network round trips and server workload. Many thanks to Dai Ngo for
contributing this facility, and to Jeff Layton and Neil Brown for
reviewing and testing it.
This release also sees the removal of all support for DES- and
triple-DES-based Kerberos encryption types in the kernel's SunRPC
implementation. These encryption types have been deprecated by the
Internet community for years and are considered insecure. This change
affects both the in-kernel NFS client and server.
The server's UDP and TCP socket transports have now fully adopted
David Howells' new bio_vec iterator so that no more than one sendmsg()
call is needed to transmit each RPC message. In particular, this helps
kTLS optimize record boundaries when sending RPC-with-TLS replies, and
it takes the server a baby step closer to handling file I/O via
folios.
We've begun work on overhauling the SunRPC thread scheduler to remove
a costly linked-list walk when looking for an idle RPC service thread
to wake. The pre-requisites are included in this release. Thanks to
Neil Brown for his ongoing work on this improvement"
* tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits)
Documentation: Add missing documentation for EXPORT_OP flags
SUNRPC: Remove unused declaration rpc_modcount()
SUNRPC: Remove unused declarations
NFSD: da_addr_body field missing in some GETDEVICEINFO replies
SUNRPC: Remove return value of svc_pool_wake_idle_thread()
SUNRPC: make rqst_should_sleep() idempotent()
SUNRPC: Clean up svc_set_num_threads
SUNRPC: Count ingress RPC messages per svc_pool
SUNRPC: Deduplicate thread wake-up code
SUNRPC: Move trace_svc_xprt_enqueue
SUNRPC: Add enum svc_auth_status
SUNRPC: change svc_xprt::xpt_flags bits to enum
SUNRPC: change svc_rqst::rq_flags bits to enum
SUNRPC: change svc_pool::sp_flags bits to enum
SUNRPC: change cache_head.flags bits to enum
SUNRPC: remove timeout arg from svc_recv()
SUNRPC: change svc_recv() to return void.
SUNRPC: call svc_process() from svc_recv().
nfsd: separate nfsd_last_thread() from nfsd_put()
nfsd: Simplify code around svc_exit_thread() call in nfsd()
...
Fix a use-after-free error that is possible if the xsk_diag interface
is used after the socket has been unbound from the device. This can
happen either due to the socket being closed or the device
disappearing. In the early days of AF_XDP, the way we tested that a
socket was not bound to a device was to simply check if the netdevice
pointer in the xsk socket structure was NULL. Later, a better system
was introduced by having an explicit state variable in the xsk socket
struct. For example, the state of a socket that is on the way to being
closed and has been unbound from the device is XSK_UNBOUND.
The commit in the Fixes tag below deleted the old way of signalling
that a socket is unbound, setting dev to NULL. This in the belief that
all code using the old way had been exterminated. That was
unfortunately not true as the xsk diagnostics code was still using the
old way and thus does not work as intended when a socket is going
down. Fix this by introducing a test against the state variable. If
the socket is in the state XSK_UNBOUND, simply abort the diagnostic's
netlink operation.
Fixes: 18b1ab7aa7 ("xsk: Fix race at socket teardown")
Reported-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com
Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20230831100119.17408-1-magnus.karlsson@gmail.com
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmTv1cgACgkQ1V2XiooU
IOSkexAAr7A9dsFLmP5gESXxkMIeJ+ZdwPeKd+XYm1qe+3AFAPd6fo1UxNKIoCD2
aEff/MJy+SImc3khnjRJ1swXZNI1FFOQMNhywqVvWdpT/Z4SB+yt3bYIXM9CG7qQ
X1s5n8zlP1WntCk716LGdwEcaA+hcmMTgVPaVPPJOZpNBQsZaB9TjuPMcI05zTBz
xkNwRAZFxssCwS0/bDqHH8guC95AxDdzgZRjGL9y1786y10/qNet9/WWxcx+MwTp
K8xA3WUPWiBMcY1N1amYb44tzMLWaLedGGzcuDFMth8s+pyxGOJM/QsNUyNG0qPr
9I4bIZWjgsi6OAFXLJQacXH0hXohChIyXFTq3yq09M5AG5EKQi0I9Da1olOtYER7
6yvaEFQICyGWcY9eg1tlqr6ZioKx/3g9Xa54jPcldXl3U0+qhuUj6qIkXrGbnahy
yizTpozEmMxFevdMbjCEZ6dRWixjFmB66KeVLuoyBpZXHoXvyGbCKkz/GJ45bFW8
gVgSTLZYtQYYCWA2CIdr3ucXlzibWnhh6b+yB3IYncHUitXu029IA51eeOuuEL9N
XAgExjAUU5GwMo6iPFEnIsPM5scYNkwnt90SaW6DaPVqAahqyZ2e84IYzw7zod2v
z1IGGp+tjUXIsBMXzLX96JBXWlK6w5nMDnYag4UUQia3X3fUZmc=
=2mzW
-----END PGP SIGNATURE-----
Merge tag 'nf-23-08-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix mangling of TCP options with non-linear skbuff, from Xiao Liang.
2) OOB read in xt_sctp due to missing sanitization of array length field.
From Wander Lairson Costa.
3) OOB read in xt_u32 due to missing sanitization of array length field.
Also from Wander Lairson Costa.
All of them above, always broken for several releases.
4) Missing audit log for set element reset command, from Phil Sutter.
5) Missing audit log for rule reset command, also from Phil.
These audit log support are missing in 6.5.
* tag 'nf-23-08-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Audit log rule reset
netfilter: nf_tables: Audit log setelem reset
netfilter: xt_u32: validate user space input
netfilter: xt_sctp: validate the flag_info count
netfilter: nft_exthdr: Fix non-linear header modification
====================
Link: https://lore.kernel.org/r/20230830235935.465690-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Resetting rules' stateful data happens outside of the transaction logic,
so 'get' and 'dump' handlers have to emit audit log entries themselves.
Fixes: 8daa8fde3f ("netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since set element reset is not integrated into nf_tables' transaction
logic, an explicit log call is needed, similar to NFT_MSG_GETOBJ_RESET
handling.
For the sake of simplicity, catchall element reset will always generate
a dedicated log entry. This relieves nf_tables_dump_set() from having to
adjust the logged element count depending on whether a catchall element
was found or not.
Fixes: 079cd63321 ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The xt_u32 module doesn't validate the fields in the xt_u32 structure.
An attacker may take advantage of this to trigger an OOB read by setting
the size fields with a value beyond the arrays boundaries.
Add a checkentry function to validate the structure.
This was originally reported by the ZDI project (ZDI-CAN-18408).
Fixes: 1b50b8a371 ("[NETFILTER]: Add u32 match")
Cc: stable@vger.kernel.org
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
sctp_mt_check doesn't validate the flag_count field. An attacker can
take advantage of that to trigger a OOB read and leak memory
information.
Add the field validation in the checkentry function.
Fixes: 2e4e6a17af ("[NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables")
Cc: stable@vger.kernel.org
Reported-by: Lucas Leong <wmliang@infosec.exchange>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>