Commit graph

72539 commits

Author SHA1 Message Date
Eric Dumazet
ac3ad19584 net: fix __dev_kfree_skb_any() vs drop monitor
dev_kfree_skb() is aliased to consume_skb().

When a driver is dropping a packet by calling dev_kfree_skb_any()
we should propagate the drop reason instead of pretending
the packet was consumed.

Note: Now we have enum skb_drop_reason we could remove
enum skb_free_reason (for linux-6.4)

v2: added an unlikely(), suggested by Yunsheng Lin.

Fixes: e6247027e5 ("net: introduce dev_consume_skb_any()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-24 11:02:07 +00:00
Linus Torvalds
d2980d8d82 There is no particular theme here - mainly quick hits all over the tree.
Most notable is a set of zlib changes from Mikhail Zaslonko which enhances
 and fixes zlib's use of S390 hardware support: "lib/zlib: Set of s390
 DFLTCC related patches for kernel zlib".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/QC4QAKCRDdBJ7gKXxA
 jtKdAQCbDCBdY8H45d1fONzQW2UDqCPnOi77MpVUxGL33r+1SAEA807C7rvDEmlf
 yP1Ft+722fFU5jogVU8ZFh+vapv2/gI=
 =Q9YK
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "There is no particular theme here - mainly quick hits all over the
  tree.

  Most notable is a set of zlib changes from Mikhail Zaslonko which
  enhances and fixes zlib's use of S390 hardware support: 'lib/zlib: Set
  of s390 DFLTCC related patches for kernel zlib'"

* tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (55 commits)
  Update CREDITS file entry for Jesper Juhl
  sparc: allow PM configs for sparc32 COMPILE_TEST
  hung_task: print message when hung_task_warnings gets down to zero.
  arch/Kconfig: fix indentation
  scripts/tags.sh: fix the Kconfig tags generation when using latest ctags
  nilfs2: prevent WARNING in nilfs_dat_commit_end()
  lib/zlib: remove redundation assignement of avail_in dfltcc_gdht()
  lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default
  lib/zlib: DFLTCC always switch to software inflate for Z_PACKET_FLUSH option
  lib/zlib: DFLTCC support inflate with small window
  lib/zlib: Split deflate and inflate states for DFLTCC
  lib/zlib: DFLTCC not writing header bits when avail_out == 0
  lib/zlib: fix DFLTCC ignoring flush modes when avail_in == 0
  lib/zlib: fix DFLTCC not flushing EOBS when creating raw streams
  lib/zlib: implement switching between DFLTCC and software
  lib/zlib: adjust offset calculation for dfltcc_state
  nilfs2: replace WARN_ONs for invalid DAT metadata block requests
  scripts/spelling.txt: add "exsits" pattern and fix typo instances
  fs: gracefully handle ->get_block not mapping bh in __mpage_writepage
  cramfs: Kconfig: fix spelling & punctuation
  ...
2023-02-23 17:55:40 -08:00
Linus Torvalds
3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Eric Van Hensbergen
2a03472262
net/9p: Adjust maximum MSIZE to account for p9 header
Add maximum p9 header size to MSIZE to make sure we can
have page aligned data.

Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
2023-02-23 22:39:36 +00:00
Xin Long
68ba446395 sctp: add a refcnt in sctp_stream_priorities to avoid a nested loop
With this refcnt added in sctp_stream_priorities, we don't need to
traverse all streams to check if the prio is used by other streams
when freeing one stream's prio in sctp_sched_prio_free_sid(). This
can avoid a nested loop (up to 65535 * 65535), which may cause a
stuck as Ying reported:

    watchdog: BUG: soft lockup - CPU#23 stuck for 26s! [ksoftirqd/23:136]
    Call Trace:
     <TASK>
     sctp_sched_prio_free_sid+0xab/0x100 [sctp]
     sctp_stream_free_ext+0x64/0xa0 [sctp]
     sctp_stream_free+0x31/0x50 [sctp]
     sctp_association_free+0xa5/0x200 [sctp]

Note that it doesn't need to use refcount_t type for this counter,
as its accessing is always protected under the sock lock.

v1->v2:
 - add a check in sctp_sched_prio_set to avoid the possible prio_head
   refcnt overflow.

Fixes: 9ed7bfc795 ("sctp: fix memory leak in sctp_stream_outq_migrate()")
Reported-by: Ying Xu <yinxu@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/825eb0c905cb864991eba335f4a2b780e543f06b.1677085641.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-23 12:59:40 -08:00
Lu Wei
4cc59f3869 ipv6: Add lwtunnel encap size of all siblings in nexthop calculation
In function rt6_nlmsg_size(), the length of nexthop is calculated
by multipling the nexthop length of fib6_info and the number of
siblings. However if the fib6_info has no lwtunnel but the siblings
have lwtunnels, the nexthop length is less than it should be, and
it will trigger a warning in inet6_rt_notify() as follows:

WARNING: CPU: 0 PID: 6082 at net/ipv6/route.c:6180 inet6_rt_notify+0x120/0x130
......
Call Trace:
 <TASK>
 fib6_add_rt2node+0x685/0xa30
 fib6_add+0x96/0x1b0
 ip6_route_add+0x50/0xd0
 inet6_rtm_newroute+0x97/0xa0
 rtnetlink_rcv_msg+0x156/0x3d0
 netlink_rcv_skb+0x5a/0x110
 netlink_unicast+0x246/0x350
 netlink_sendmsg+0x250/0x4c0
 sock_sendmsg+0x66/0x70
 ___sys_sendmsg+0x7c/0xd0
 __sys_sendmsg+0x5d/0xb0
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

This bug can be reproduced by script:

ip -6 addr add 2002::2/64 dev ens2
ip -6 route add 100::/64 via 2002::1 dev ens2 metric 100

for i in 10 20 30 40 50 60 70;
do
	ip link add link ens2 name ipv_$i type ipvlan
	ip -6 addr add 2002::$i/64 dev ipv_$i
	ifconfig ipv_$i up
done

for i in 10 20 30 40 50 60;
do
	ip -6 route append 100::/64 encap ip6 dst 2002::$i via 2002::1
dev ipv_$i metric 100
done

ip -6 route append 100::/64 via 2002::1 dev ipv_70 metric 100

This patch fixes it by adding nexthop_len of every siblings using
rt6_nh_nlmsg_size().

Fixes: beb1afac51 ("net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute")
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230222083629.335683-2-luwei32@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-23 10:54:13 +01:00
Jakub Kicinski
fd2a55e74a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix broken listing of set elements when table has an owner.

2) Fix conntrack refcount leak in ctnetlink with related conntrack
   entries, from Hangyu Hua.

3) Fix use-after-free/double-free in ctnetlink conntrack insert path,
   from Florian Westphal.

4) Fix ip6t_rpfilter with VRF, from Phil Sutter.

5) Fix use-after-free in ebtables reported by syzbot, also from Florian.

6) Use skb->len in xt_length to deal with IPv6 jumbo packets,
   from Xin Long.

7) Fix NETLINK_LISTEN_ALL_NSID with ctnetlink, from Florian Westphal.

8) Fix memleak in {ip_,ip6_,arp_}tables in ENOMEM error case,
   from Pavel Tikhomirov.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: x_tables: fix percpu counter block leak on error path when creating new netns
  netfilter: ctnetlink: make event listener tracking global
  netfilter: xt_length: use skb len to match in length_mt6
  netfilter: ebtables: fix table blob use-after-free
  netfilter: ip6t_rpfilter: Fix regression with VRF interfaces
  netfilter: conntrack: fix rmmod double-free race
  netfilter: ctnetlink: fix possible refcount leak in ctnetlink_create_conntrack()
  netfilter: nf_tables: allow to fetch set elements when table has an owner
====================

Link: https://lore.kernel.org/r/20230222092137.88637-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-22 21:25:23 -08:00
Linus Torvalds
d8ca6dbb8d NFS Client Updates for Linux 6.3
New Features:
   * Convert the read and write paths to use folios
 
 Bugfixes and Cleanups:
   * Fix tracepoint state manager flag printing
   * Fix disabling swap files
   * Fix NFSv4 client identifier sysfs path in the documentation
   * Don't clear NFS_CAP_COPY if server returns NFS4ERR_OFFLOAD_DENIED
   * Treat GETDEVICEINFO errors as a layout failure
   * Replace kmap_atomic() calls with kmap_local_page()
   * Constify sunrpc sysfs kobj_type structures
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmP2lSQACgkQ18tUv7Cl
 QOuaPw//TNLuEMiiMtM6EXFPc2p2tNMfM+EeMjYKZkS5WnvszVOZRzQfXz4dvvif
 R2BIHT+C6AWkPOxnbuhIk86acy/wRuAPD9OMyp+3PNa4oEqjyIk5QaTvgYeeRBNe
 H7haAqCiEBx/g8F9DladWLAwpilTSPpupdmiXwTS7Q7wB6NkKqc1eAJ+BTuRYAsm
 19p2nGTzenWrut4HMOOmuAVrA2OcWtzgYzdJlY19lWRmzfCvX1gh3i2JxGutmnl3
 tduviVBdu5QGnTKHJiP853kzs3VwEHIFSO+5z137JWKTm3IgPxW/u3zf+ijmw22t
 vjbtFAhb4+u5juchvVKyIX4lClPSKZincQ6dn9soOm7qJcxYWNm40DzxAF21GUAq
 d4tp+zBoe9pfKxZrylp6YxIchLQYhU+dL0hhqbKzfOAFOg28k1JsJubl/wRvWKVe
 LGbvFOrUYo8arPOKfxBWMf4HCouu/vGq/qng+U/Ppjw3h8gD32aniyP3qN9Fm1JA
 rFAMKTnD4I3r3x1E4HX3icUMwil36zBr9cPglQBYD5DW54ConEW8e6hvQmjTdQHP
 EgN9+PSYzYXuKR/Fij7BYOymUD9grT1+5hVaOPzlM4SY3jMy2novcQGoGiLVOd0t
 PGQ5047qbESFD7VkW4GBAYftrSMLiU+l5KL3aN1JRtprxu6NaYA=
 =39ue
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.3-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:

   - Convert the read and write paths to use folios

  Bugfixes and Cleanups:

   - Fix tracepoint state manager flag printing

   - Fix disabling swap files

   - Fix NFSv4 client identifier sysfs path in the documentation

   - Don't clear NFS_CAP_COPY if server returns NFS4ERR_OFFLOAD_DENIED

   - Treat GETDEVICEINFO errors as a layout failure

   - Replace kmap_atomic() calls with kmap_local_page()

   - Constify sunrpc sysfs kobj_type structures"

* tag 'nfs-for-6.3-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (25 commits)
  fs/nfs: Replace kmap_atomic() with kmap_local_page() in dir.c
  pNFS/filelayout: treat GETDEVICEINFO errors as layout failure
  Documentation: Fix sysfs path for the NFSv4 client identifier
  nfs42: do not fail with EIO if ssc returns NFS4ERR_OFFLOAD_DENIED
  NFS: fix disabling of swap
  SUNRPC: make kobj_type structures constant
  nfs4trace: fix state manager flag printing
  NFS: Remove unnecessary check in nfs_read_folio()
  NFS: Improve tracing of nfs_wb_folio()
  NFS: Enable tracing of nfs_invalidate_folio() and nfs_launder_folio()
  NFS: fix up nfs_release_folio() to try to release the page
  NFS: Clean up O_DIRECT request allocation
  NFS: Fix up nfs_vm_page_mkwrite() for folios
  NFS: Convert nfs_write_begin/end to use folios
  NFS: Remove unused function nfs_wb_page()
  NFS: Convert buffered writes to use folios
  NFS: Convert the function nfs_wb_page() to use folios
  NFS: Convert buffered reads to use folios
  NFS: Add a helper nfs_wb_folio()
  NFS: Convert the remaining pagelist helper functions to support folios
  ...
2023-02-22 14:47:20 -08:00
Linus Torvalds
9fc2f99030 NFSD 6.3 Release Notes
Two significant security enhancements are part of this release:
 
 * NFSD's RPC header encoding and decoding, including RPCSEC GSS
   and gssproxy header parsing, has been overhauled to make it
   more memory-safe.
 
 * Support for Kerberos AES-SHA2-based encryption types has been
   added for both the NFS client and server. This provides a clean
   path for deprecating and removing insecure encryption types
   based on DES and SHA-1. AES-SHA2 is also FIPS-140 compliant, so
   that NFS with Kerberos may now be used on systems with fips
   enabled.
 
 In addition to these, NFSD is now able to handle crossing into an
 auto-mounted mount point on an exported NFS mount. A number of
 fixes have been made to NFSD's server-side copy implementation.
 
 RPC metrics have been converted to per-CPU variables. This helps
 reduce unnecessary cross-CPU and cross-node memory bus traffic,
 and significantly reduces noise when KCSAN is enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmPzgiYACgkQM2qzM29m
 f5dB2A//eqjpj+FgAN+UjygrwMC4ahAsPX3Sc3FG8/lTAiao3NFVFY2gxAiCPyVE
 CFk+tUyfL23oXvbyfIBe3LhxSBOf621xU6up2OzqAzJqh1Q9iUWB6as3I14to8ZU
 sWpxXo5ofwk1hzkbrvOAVkyfY0emwsr00iBeWMawkpBe8FZEQA31OYj3/xHr6bBI
 zEVlZPBZAZlp0DZ74tb+bBLs/EOnqKj+XLWcogCH13JB3sn2umF6cQNkYgsxvHGa
 TNQi4LEdzWZGme242LfBRiGGwm1xuVIjlAhYV/R1wIjaknE3QBzqfXc6lJx74WII
 HaqpRJGrKqdo7B+1gaXCl/AMS7YluED1CBrxuej0wBG7l2JEB7m2MFMQ4LTQjgsn
 nrr3P70DgbB4LuPCPyUS7dtsMmUXabIqP7niiCR4T1toH6lBmHAgEi4cFmkzg7Cd
 EoFzn888mtDpfx4fghcsRWS5oKXEzbPJfu5+IZOD63+UB+NGpi0Xo2s23sJPK8vz
 kqK/X63JYOUxWUvK0zkj/c/wW1cLqIaBwnSKbShou5/BL+cZVI+uJYrnEesgpoB2
 5fh/cZv3hdcoOPO7OfcjCLQYy4J6RCWajptnk/hcS3lMvBTBrnq697iAqCVURDKU
 Xfmlf7XbBwje+sk4eHgqVGEqqVjrEmoqbmA2OS44WSS5LDvxXdI=
 =ZG/7
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Two significant security enhancements are part of this release:

   - NFSD's RPC header encoding and decoding, including RPCSEC GSS and
     gssproxy header parsing, has been overhauled to make it more
     memory-safe.

   - Support for Kerberos AES-SHA2-based encryption types has been added
     for both the NFS client and server. This provides a clean path for
     deprecating and removing insecure encryption types based on DES and
     SHA-1. AES-SHA2 is also FIPS-140 compliant, so that NFS with
     Kerberos may now be used on systems with fips enabled.

  In addition to these, NFSD is now able to handle crossing into an
  auto-mounted mount point on an exported NFS mount. A number of fixes
  have been made to NFSD's server-side copy implementation.

  RPC metrics have been converted to per-CPU variables. This helps
  reduce unnecessary cross-CPU and cross-node memory bus traffic, and
  significantly reduces noise when KCSAN is enabled"

* tag 'nfsd-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (121 commits)
  NFSD: Clean up nfsd_symlink()
  NFSD: copy the whole verifier in nfsd_copy_write_verifier
  nfsd: don't fsync nfsd_files on last close
  SUNRPC: Fix occasional warning when destroying gss_krb5_enctypes
  nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open
  NFSD: fix problems with cleanup on errors in nfsd4_copy
  nfsd: fix race to check ls_layouts
  nfsd: don't hand out delegation on setuid files being opened for write
  SUNRPC: Remove ->xpo_secure_port()
  SUNRPC: Clean up the svc_xprt_flags() macro
  nfsd: remove fs/nfsd/fault_inject.c
  NFSD: fix leaked reference count of nfsd4_ssc_umount_item
  nfsd: clean up potential nfsd_file refcount leaks in COPY codepath
  nfsd: zero out pointers after putting nfsd_files on COPY setup error
  SUNRPC: Fix whitespace damage in svcauth_unix.c
  nfsd: eliminate __nfs4_get_fd
  nfsd: add some kerneldoc comments for stateid preprocessing functions
  nfsd: eliminate find_deleg_file_locked
  nfsd: don't take nfsd4_copy ref for OP_OFFLOAD_STATUS
  SUNRPC: Add encryption self-tests
  ...
2023-02-22 14:21:40 -08:00
Linus Torvalds
6c71297eaf for-linus-2023022201
-----BEGIN PGP SIGNATURE-----
 
 iQJSBAABCAA8FiEEoEVH9lhNrxiMPSyI7MXwXhnZSjYFAmP15aoeHGJlbmphbWlu
 LnRpc3NvaXJlc0ByZWRoYXQuY29tAAoJEOzF8F4Z2Uo2nwkP/Rcr5lZYKQ59ezoJ
 PwHx/0wO1Qpgd4fRwD5Mvxynmfoq20A6FZFpmtjUjNcP3dm3X2UJtXE3HkDECCFP
 5daqTFOswFiPFtcMlIl+SHxNgIIlTodbqx9MAFj/7n5aihB54JpTHWsgbENj35Y1
 RLHYDi0+wj69y9ctOkqKGWHp8Uf220RWrD7zZf7AJAc5cwot1kM00RSy9dSAJ0vB
 riZdCQqYwXbf4I1uFzthS6AdIIWcpmpZyYFnsF7F2xQADqCNXr1MTmG0uC+Ey/J/
 0PZXjkyMO/pNwxarRiXmBKnsJJlJajxRXGtpgKYZu8AnIaXuNu/z987Y1pZ/kYis
 OQSKib2bGWzT2MycqiuVVrOChIMvqmk2aPRvg73jojpSySYTKn1Jp/XHyXS8qkkQ
 HJ/u6VPpe42GdfOGM7V9ig+80z/5D5u1XJECoPpxyKHWQ8S7ZlczQjVT+a8nUuBV
 hPTiqAYE9NT6SQJ0b5z2uhBdGRzvAbCZzDCgvjE87zsRmLzFk/fzMdMPZrlADKMJ
 3qu7ey2GYOBNfnDcJmPu5HmK/A9BaPZiZAMakqjGpezZbe+LGBNskXTRwPpNC+Dh
 11pna0ns+vlxeT7nonO0JsYvsKWy0pPcBvhyUEHWsmHgyagRLIsvg01ezB/Ivu0O
 xYj3UPVEGTiwHBt9xnaZcIjRSPSZ
 =cBvC
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2023022201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID updates from Benjamin Tissoires:

 - HID-BPF infrastructure: this allows to start using HID-BPF. Note that
   the mechanism to ship HID-BPF program through the kernel tree is
   still not implemented yet (but is planned).

   This should be a no-op for 99% of users. Also we are gaining
   kselftests for the HID tree (Benjamin Tissoires)

 - Some UAF fixes in workers when using uhid (Pietro Borrello & Benjamin
   Tissoires)

 - Constify hid_ll_driver (Thomas Weißschuh)

 - Allow more custom IIO sensors through HID (Philipp Jungkamp)

 - Logitech HID++ fixes for scroll wheel, protocol and debug (Bastien
   Nocera)

 - Some new device support: Steam Deck (Vicki Pfau), UClogic (José
   Expósito), Logitech G923 Xbox Edition steering wheel (Walt Holman),
   EVision keyboards (Philippe Valembois)

 - other assorted code cleanups and fixes

* tag 'for-linus-2023022201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (99 commits)
  HID: mcp-2221: prevent UAF in delayed work
  hid: bigben_probe(): validate report count
  HID: asus: use spinlock to safely schedule workers
  HID: asus: use spinlock to protect concurrent accesses
  HID: bigben: use spinlock to safely schedule workers
  HID: bigben_worker() remove unneeded check on report_field
  HID: bigben: use spinlock to protect concurrent accesses
  HID: logitech-hidpp: Add myself to authors
  HID: logitech-hidpp: Retry commands when device is busy
  HID: logitech-hidpp: Add more debug statements
  HID: Add support for Logitech G923 Xbox Edition steering wheel
  HID: logitech-hidpp: Add Signature M650
  HID: logitech-hidpp: Remove HIDPP_QUIRK_NO_HIDINPUT quirk
  HID: logitech-hidpp: Don't restart communication if not necessary
  HID: logitech-hidpp: Add constants for HID++ 2.0 error codes
  Revert "HID: logitech-hidpp: add a module parameter to keep firmware gestures"
  HID: logitech-hidpp: Hard-code HID++ 1.0 fast scroll support
  HID: i2c-hid: goodix: Add mainboard-vddio-supply
  dt-bindings: HID: i2c-hid: goodix: Add mainboard-vddio-supply
  HID: i2c-hid: goodix: Stop tying the reset line to the regulator
  ...
2023-02-22 11:24:42 -08:00
Benjamin Tissoires
904e28c6de Merge branch 'for-6.3/hid-bpf' into for-linus
Initial support of HID-BPF (Benjamin Tissoires)

The history is a little long for this series, as it was intended to be
sent for v6.2. However some last minute issues forced us to postpone it
to v6.3.

Conflicts:
* drivers/hid/i2c-hid/Kconfig:
  commit bf7660dab3 ("HID: stop drivers from selecting CONFIG_HID")
  conflicts with commit 2afac81dd1 ("HID: fix I2C_HID not selected
  when I2C_HID_OF_ELAN is")
  the resolution is simple enough: just drop the "default" and "select"
  lines as the new commit from Arnd is doing
2023-02-22 10:44:31 +01:00
Benjamin Tissoires
06db2af35e Merge branch 'for-6.3/hid-core' into for-linus
- constify hid_ll_driver (Thomas Weißschuh)
- map standard Battery System Charging to upower (José Expósito)
- couple of assorted fixes and new handling of HID usages (Jingyuan
  Liang & Ronald Tschalär)
2023-02-22 10:27:57 +01:00
Pavel Tikhomirov
0af8c09c89 netfilter: x_tables: fix percpu counter block leak on error path when creating new netns
Here is the stack where we allocate percpu counter block:

  +-< __alloc_percpu
    +-< xt_percpu_counter_alloc
      +-< find_check_entry # {arp,ip,ip6}_tables.c
        +-< translate_table

And it can be leaked on this code path:

  +-> ip6t_register_table
    +-> translate_table # allocates percpu counter block
    +-> xt_register_table # fails

there is no freeing of the counter block on xt_register_table fail.
Note: xt_percpu_counter_free should be called to free it like we do in
do_replace through cleanup_entry helper (or in __ip6t_unregister_table).

Probability of hitting this error path is low AFAICS (xt_register_table
can only return ENOMEM here, as it is not replacing anything, as we are
creating new netns, and it is hard to imagine that all previous
allocations succeeded and after that one in xt_register_table failed).
But it's worth fixing even the rare leak.

Fixes: 71ae0dff02 ("netfilter: xtables: use percpu rule counters")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-02-22 10:11:27 +01:00
Linus Torvalds
5b7c4cabbb Networking changes for 6.3.
Core
 ----
 
  - Add dedicated kmem_cache for typical/small skb->head, avoid having
    to access struct page at kfree time, and improve memory use.
 
  - Introduce sysctl to set default RPS configuration for new netdevs.
 
  - Define Netlink protocol specification format which can be used
    to describe messages used by each family and auto-generate parsers.
    Add tools for generating kernel data structures and uAPI headers.
 
  - Expose all net/core sysctls inside netns.
 
  - Remove 4s sleep in netpoll if carrier is instantly detected on boot.
 
  - Add configurable limit of MDB entries per port, and port-vlan.
 
  - Continue populating drop reasons throughout the stack.
 
  - Retire a handful of legacy Qdiscs and classifiers.
 
 Protocols
 ---------
 
  - Support IPv4 big TCP (TSO frames larger than 64kB).
 
  - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
    on socket by socket basis.
 
  - Track and report in procfs number of MPTCP sockets used.
 
  - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP
    path manager.
 
  - IPv6: don't check net.ipv6.route.max_size and rely on garbage
    collection to free memory (similarly to IPv4).
 
  - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
 
  - ICMP: add per-rate limit counters.
 
  - Add support for user scanning requests in ieee802154.
 
  - Remove static WEP support.
 
  - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
    reporting.
 
  - WiFi 7 EHT channel puncturing support (client & AP).
 
 BPF
 ---
 
  - Add a rbtree data structure following the "next-gen data structure"
    precedent set by recently added linked list, that is, by using
    kfunc + kptr instead of adding a new BPF map type.
 
  - Expose XDP hints via kfuncs with initial support for RX hash and
    timestamp metadata.
 
  - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key
    to better support decap on GRE tunnel devices not operating
    in collect metadata.
 
  - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
 
  - Remove the need for trace_printk_lock for bpf_trace_printk
    and bpf_trace_vprintk helpers.
 
  - Extend libbpf's bpf_tracing.h support for tracing arguments of
    kprobes/uprobes and syscall as a special case.
 
  - Significantly reduce the search time for module symbols
    by livepatch and BPF.
 
  - Enable cpumasks to be used as kptrs, which is useful for tracing
    programs tracking which tasks end up running on which CPUs in
    different time intervals.
 
  - Add support for BPF trampoline on s390x and riscv64.
 
  - Add capability to export the XDP features supported by the NIC.
 
  - Add __bpf_kfunc tag for marking kernel functions as kfuncs.
 
  - Add cgroup.memory=nobpf kernel parameter option to disable BPF
    memory accounting for container environments.
 
 Netfilter
 ---------
 
  - Remove the CLUSTERIP target. It has been marked as obsolete
    for years, and we still have WARN splats wrt. races of
    the out-of-band /proc interface installed by this target.
 
  - Add 'destroy' commands to nf_tables. They are identical to
    the existing 'delete' commands, but do not return an error if
    the referenced object (set, chain, rule...) did not exist.
 
 Driver API
 ----------
 
  - Improve cpumask_local_spread() locality to help NICs set the right
    IRQ affinity on AMD platforms.
 
  - Separate C22 and C45 MDIO bus transactions more clearly.
 
  - Introduce new DCB table to control DSCP rewrite on egress.
 
  - Support configuration of Physical Layer Collision Avoidance (PLCA)
    Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
    shared medium Ethernet.
 
  - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
    preemption of low priority frames by high priority frames.
 
  - Add support for controlling MACSec offload using netlink SET.
 
  - Rework devlink instance refcounts to allow registration and
    de-registration under the instance lock. Split the code into multiple
    files, drop some of the unnecessarily granular locks and factor out
    common parts of netlink operation handling.
 
  - Add TX frame aggregation parameters (for USB drivers).
 
  - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
    messages with notifications for debug.
 
  - Allow offloading of UDP NEW connections via act_ct.
 
  - Add support for per action HW stats in TC.
 
  - Support hardware miss to TC action (continue processing in SW from
    a specific point in the action chain).
 
  - Warn if old Wireless Extension user space interface is used with
    modern cfg80211/mac80211 drivers. Do not support Wireless Extensions
    for Wi-Fi 7 devices at all. Everyone should switch to using nl80211
    interface instead.
 
  - Improve the CAN bit timing configuration. Use extack to return error
    messages directly to user space, update the SJW handling, including
    the definition of a new default value that will benefit CAN-FD
    controllers, by increasing their oscillator tolerance.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - nVidia BlueField-3 support (control traffic driver)
    - Ethernet support for imx93 SoCs
    - Motorcomm yt8531 gigabit Ethernet PHY
    - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
    - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
    - Amlogic gxl MDIO mux
 
  - WiFi:
    - RealTek RTL8188EU (rtl8xxxu)
    - Qualcomm Wi-Fi 7 devices (ath12k)
 
  - CAN:
    - Renesas R-Car V4H
 
 Drivers
 -------
 
  - Bluetooth:
    - Set Per Platform Antenna Gain (PPAG) for Intel controllers.
 
  - Ethernet NICs:
    - Intel (1G, igc):
      - support TSN / Qbv / packet scheduling features of i226 model
    - Intel (100G, ice):
      - use GNSS subsystem instead of TTY
      - multi-buffer XDP support
      - extend support for GPIO pins to E823 devices
    - nVidia/Mellanox:
      - update the shared buffer configuration on PFC commands
      - implement PTP adjphase function for HW offset control
      - TC support for Geneve and GRE with VF tunnel offload
      - more efficient crypto key management method
      - multi-port eswitch support
    - Netronome/Corigine:
      - add DCB IEEE support
      - support IPsec offloading for NFP3800
    - Freescale/NXP (enetc):
      - enetc: support XDP_REDIRECT for XDP non-linear buffers
      - enetc: improve reconfig, avoid link flap and waiting for idle
      - enetc: support MAC Merge layer
    - Other NICs:
      - sfc/ef100: add basic devlink support for ef100
      - ionic: rx_push mode operation (writing descriptors via MMIO)
      - bnxt: use the auxiliary bus abstraction for RDMA
      - r8169: disable ASPM and reset bus in case of tx timeout
      - cpsw: support QSGMII mode for J721e CPSW9G
      - cpts: support pulse-per-second output
      - ngbe: add an mdio bus driver
      - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
      - r8152: handle devices with FW with NCM support
      - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
      - virtio-net: support multi buffer XDP
      - virtio/vsock: replace virtio_vsock_pkt with sk_buff
      - tsnep: XDP support
 
  - Ethernet high-speed switches:
    - nVidia/Mellanox (mlxsw):
      - add support for latency TLV (in FW control messages)
    - Microchip (sparx5):
      - separate explicit and implicit traffic forwarding rules, make
        the implicit rules always active
      - add support for egress DSCP rewrite
      - IS0 VCAP support (Ingress Classification)
      - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.)
      - ES2 VCAP support (Egress Access Control)
      - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1)
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - add MAB (port auth) offload support
      - enable PTP receive for mv88e6390
    - NXP (ocelot):
      - support MAC Merge layer
      - support for the the vsc7512 internal copper phys
    - Microchip:
      - lan9303: convert to PHYLINK
      - lan966x: support TC flower filter statistics
      - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
      - lan937x: support Credit Based Shaper configuration
      - ksz9477: support Energy Efficient Ethernet
    - other:
      - qca8k: convert to regmap read/write API, use bulk operations
      - rswitch: Improve TX timestamp accuracy
 
  - Intel WiFi (iwlwifi):
    - EHT (Wi-Fi 7) rate reporting
    - STEP equalizer support: transfer some STEP (connection to radio
      on platforms with integrated wifi) related parameters from the
      BIOS to the firmware.
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - IPQ5018 support
    - Fine Timing Measurement (FTM) responder role support
    - channel 177 support
 
  - MediaTek WiFi (mt76):
    - per-PHY LED support
    - mt7996: EHT (Wi-Fi 7) support
    - Wireless Ethernet Dispatch (WED) reset support
    - switch to using page pool allocator
 
  - RealTek WiFi (rtw89):
    - support new version of Bluetooth co-existance
 
  - Mobile:
    - rmnet: support TX aggregation.
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmP1VIYACgkQMUZtbf5S
 IrvsChAApz0rNL/sPKxXTEfxZ1tN7D3sYxYKQPomxvl5BV+MvicrLddJy3KmzEFK
 nnJNO3nuRNuH422JQ/ylZ4mGX1opa6+5QJb0UINImXUI7Fm8HHBIuPGkv7d5CheZ
 7JexFqjPJXUy9nPyh1Rra+IA9AcRd2U7jeGEZR38wb99bHJQj5Bzdk20WArEB0el
 n44aqg49LXH71bSeXRz77x5SjkwVtYiccQxLcnmTbjLU2xVraLvI2J+wAhHnVXWW
 9lrU1+V4Ex2Xcd1xR0L0cHeK+meP1TrPRAeF+JDpVI3a/zJiE7cZjfHdG/jH5xWl
 leZJqghVozrZQNtewWWO7XhUFhMDgFu3W/1vNLjSHPZEqaz1JpM67J1+ql6s63l4
 LMWoXbcYZz+SL9ZRCoPkbGue/5fKSHv8/Jl9Sh58+eTS+c/zgN8uFGRNFXLX1+EP
 n8uvt985PxMd6x1+dHumhOUzxnY4Sfi1vjitSunTsNFQ3Cmp4SO0IfBVJWfLUCuC
 xz5hbJGJJbSpvUsO+HWyCg83E5OWghRE/Onpt2jsQSZCrO9HDg4FRTEf3WAMgaqc
 edb5KfbRZPTJQM08gWdluXzSk1nw3FNP2tXW4XlgUrEbjb+fOk0V9dQg2gyYTxQ1
 Nhvn8ZQPi6/GMMELHAIPGmmW1allyOGiAzGlQsv8EmL+OFM6WDI=
 =xXhC
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Add dedicated kmem_cache for typical/small skb->head, avoid having
     to access struct page at kfree time, and improve memory use.

   - Introduce sysctl to set default RPS configuration for new netdevs.

   - Define Netlink protocol specification format which can be used to
     describe messages used by each family and auto-generate parsers.
     Add tools for generating kernel data structures and uAPI headers.

   - Expose all net/core sysctls inside netns.

   - Remove 4s sleep in netpoll if carrier is instantly detected on
     boot.

   - Add configurable limit of MDB entries per port, and port-vlan.

   - Continue populating drop reasons throughout the stack.

   - Retire a handful of legacy Qdiscs and classifiers.

  Protocols:

   - Support IPv4 big TCP (TSO frames larger than 64kB).

   - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
     on socket by socket basis.

   - Track and report in procfs number of MPTCP sockets used.

   - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
     manager.

   - IPv6: don't check net.ipv6.route.max_size and rely on garbage
     collection to free memory (similarly to IPv4).

   - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).

   - ICMP: add per-rate limit counters.

   - Add support for user scanning requests in ieee802154.

   - Remove static WEP support.

   - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
     reporting.

   - WiFi 7 EHT channel puncturing support (client & AP).

  BPF:

   - Add a rbtree data structure following the "next-gen data structure"
     precedent set by recently added linked list, that is, by using
     kfunc + kptr instead of adding a new BPF map type.

   - Expose XDP hints via kfuncs with initial support for RX hash and
     timestamp metadata.

   - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
     better support decap on GRE tunnel devices not operating in collect
     metadata.

   - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.

   - Remove the need for trace_printk_lock for bpf_trace_printk and
     bpf_trace_vprintk helpers.

   - Extend libbpf's bpf_tracing.h support for tracing arguments of
     kprobes/uprobes and syscall as a special case.

   - Significantly reduce the search time for module symbols by
     livepatch and BPF.

   - Enable cpumasks to be used as kptrs, which is useful for tracing
     programs tracking which tasks end up running on which CPUs in
     different time intervals.

   - Add support for BPF trampoline on s390x and riscv64.

   - Add capability to export the XDP features supported by the NIC.

   - Add __bpf_kfunc tag for marking kernel functions as kfuncs.

   - Add cgroup.memory=nobpf kernel parameter option to disable BPF
     memory accounting for container environments.

  Netfilter:

   - Remove the CLUSTERIP target. It has been marked as obsolete for
     years, and we still have WARN splats wrt races of the out-of-band
     /proc interface installed by this target.

   - Add 'destroy' commands to nf_tables. They are identical to the
     existing 'delete' commands, but do not return an error if the
     referenced object (set, chain, rule...) did not exist.

  Driver API:

   - Improve cpumask_local_spread() locality to help NICs set the right
     IRQ affinity on AMD platforms.

   - Separate C22 and C45 MDIO bus transactions more clearly.

   - Introduce new DCB table to control DSCP rewrite on egress.

   - Support configuration of Physical Layer Collision Avoidance (PLCA)
     Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
     shared medium Ethernet.

   - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
     preemption of low priority frames by high priority frames.

   - Add support for controlling MACSec offload using netlink SET.

   - Rework devlink instance refcounts to allow registration and
     de-registration under the instance lock. Split the code into
     multiple files, drop some of the unnecessarily granular locks and
     factor out common parts of netlink operation handling.

   - Add TX frame aggregation parameters (for USB drivers).

   - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
     messages with notifications for debug.

   - Allow offloading of UDP NEW connections via act_ct.

   - Add support for per action HW stats in TC.

   - Support hardware miss to TC action (continue processing in SW from
     a specific point in the action chain).

   - Warn if old Wireless Extension user space interface is used with
     modern cfg80211/mac80211 drivers. Do not support Wireless
     Extensions for Wi-Fi 7 devices at all. Everyone should switch to
     using nl80211 interface instead.

   - Improve the CAN bit timing configuration. Use extack to return
     error messages directly to user space, update the SJW handling,
     including the definition of a new default value that will benefit
     CAN-FD controllers, by increasing their oscillator tolerance.

  New hardware / drivers:

   - Ethernet:
      - nVidia BlueField-3 support (control traffic driver)
      - Ethernet support for imx93 SoCs
      - Motorcomm yt8531 gigabit Ethernet PHY
      - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
      - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
      - Amlogic gxl MDIO mux

   - WiFi:
      - RealTek RTL8188EU (rtl8xxxu)
      - Qualcomm Wi-Fi 7 devices (ath12k)

   - CAN:
      - Renesas R-Car V4H

  Drivers:

   - Bluetooth:
      - Set Per Platform Antenna Gain (PPAG) for Intel controllers.

   - Ethernet NICs:
      - Intel (1G, igc):
         - support TSN / Qbv / packet scheduling features of i226 model
      - Intel (100G, ice):
         - use GNSS subsystem instead of TTY
         - multi-buffer XDP support
         - extend support for GPIO pins to E823 devices
      - nVidia/Mellanox:
         - update the shared buffer configuration on PFC commands
         - implement PTP adjphase function for HW offset control
         - TC support for Geneve and GRE with VF tunnel offload
         - more efficient crypto key management method
         - multi-port eswitch support
      - Netronome/Corigine:
         - add DCB IEEE support
         - support IPsec offloading for NFP3800
      - Freescale/NXP (enetc):
         - support XDP_REDIRECT for XDP non-linear buffers
         - improve reconfig, avoid link flap and waiting for idle
         - support MAC Merge layer
      - Other NICs:
         - sfc/ef100: add basic devlink support for ef100
         - ionic: rx_push mode operation (writing descriptors via MMIO)
         - bnxt: use the auxiliary bus abstraction for RDMA
         - r8169: disable ASPM and reset bus in case of tx timeout
         - cpsw: support QSGMII mode for J721e CPSW9G
         - cpts: support pulse-per-second output
         - ngbe: add an mdio bus driver
         - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
         - r8152: handle devices with FW with NCM support
         - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
         - virtio-net: support multi buffer XDP
         - virtio/vsock: replace virtio_vsock_pkt with sk_buff
         - tsnep: XDP support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add support for latency TLV (in FW control messages)
      - Microchip (sparx5):
         - separate explicit and implicit traffic forwarding rules, make
           the implicit rules always active
         - add support for egress DSCP rewrite
         - IS0 VCAP support (Ingress Classification)
         - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
           etc.)
         - ES2 VCAP support (Egress Access Control)
         - support for Per-Stream Filtering and Policing (802.1Q,
           8.6.5.1)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - add MAB (port auth) offload support
         - enable PTP receive for mv88e6390
      - NXP (ocelot):
         - support MAC Merge layer
         - support for the the vsc7512 internal copper phys
      - Microchip:
         - lan9303: convert to PHYLINK
         - lan966x: support TC flower filter statistics
         - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
         - lan937x: support Credit Based Shaper configuration
         - ksz9477: support Energy Efficient Ethernet
      - other:
         - qca8k: convert to regmap read/write API, use bulk operations
         - rswitch: Improve TX timestamp accuracy

   - Intel WiFi (iwlwifi):
      - EHT (Wi-Fi 7) rate reporting
      - STEP equalizer support: transfer some STEP (connection to radio
        on platforms with integrated wifi) related parameters from the
        BIOS to the firmware.

   - Qualcomm 802.11ax WiFi (ath11k):
      - IPQ5018 support
      - Fine Timing Measurement (FTM) responder role support
      - channel 177 support

   - MediaTek WiFi (mt76):
      - per-PHY LED support
      - mt7996: EHT (Wi-Fi 7) support
      - Wireless Ethernet Dispatch (WED) reset support
      - switch to using page pool allocator

   - RealTek WiFi (rtw89):
      - support new version of Bluetooth co-existance

   - Mobile:
      - rmnet: support TX aggregation"

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
2023-02-21 18:24:12 -08:00
Linus Torvalds
36289a03bc This update includes the following changes:
API:
 
 - Use kmap_local instead of kmap_atomic.
 - Change request callback to take void pointer.
 - Print FIPS status in /proc/crypto (when enabled).
 
 Algorithms:
 
 - Add rfc4106/gcm support on arm64.
 - Add ARIA AVX2/512 support on x86.
 
 Drivers:
 
 - Add TRNG driver for StarFive SoC.
 - Delete ux500/hash driver (subsumed by stm32/hash).
 - Add zlib support in qat.
 - Add RSA support in aspeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmPzAiwACgkQxycdCkmx
 i6et8xAAoO3w5MZFGXMzWsYhfSZFdceXBEQfDR7JOCdHxpMIQhw0FLlb0uttFk6m
 SeWrdP9wiifBDoCmw7qffFJml8ZftPL/XeXjob2d9v7jKbPyw3lDSIdsNfN/5EEL
 oIc9915zwrgawvahPAa+PQ4Ue03qRjUyOcV42dpd1W3NYhzDVHoK5OUU+mEFYDvx
 Sgw/YUugKf0VXkVDFzG5049+CPcheyRZqclAo9jyl2eZiXujgUyV33nxRCtqIA+t
 7jlHKwi+6QzFHY0CX5BvShR8xyEuH5MLoU3H/jYGXnRb3nEpRYAEO4VZchIHqF0F
 Y6pKIKc6Q8OyIVY8RsjQY3hioCqYnQFZ5Xtc1zGtOYEitVLbkmItMG0mVn0XOfyt
 gJDi6gkEw5uPUbEQdI4R1xEgJ8eCckMsOJ+uRxqTm+uLqNDxPbsB9bohKniMogXV
 lDlVXjU23AA9VeKtqU8FvWjfgqsN47X4aoq1j4/4aI7X9F7P9FOP21TZloP7+ssj
 PFrzNaRXUrMEsvyS1wqPegIh987lj6WkH4hyU0wjzaIq4IQELidHsSXFS12iWIPH
 kTEoC/trAVoYSr0zXKWUCs4h/x0FztVNbjs4KiDP2FLXX1RzeVZ0WlaXZhryHr+n
 1+8yCuS6tVofAbSX0wNkZdf0x5+3CIBw4kqSIvjKDPYYEfIDaT0=
 =dMYe
 -----END PGP SIGNATURE-----

Merge tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto update from Herbert Xu:
 "API:
   - Use kmap_local instead of kmap_atomic
   - Change request callback to take void pointer
   - Print FIPS status in /proc/crypto (when enabled)

  Algorithms:
   - Add rfc4106/gcm support on arm64
   - Add ARIA AVX2/512 support on x86

  Drivers:
   - Add TRNG driver for StarFive SoC
   - Delete ux500/hash driver (subsumed by stm32/hash)
   - Add zlib support in qat
   - Add RSA support in aspeed"

* tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (156 commits)
  crypto: x86/aria-avx - Do not use avx2 instructions
  crypto: aspeed - Fix modular aspeed-acry
  crypto: hisilicon/qm - fix coding style issues
  crypto: hisilicon/qm - update comments to match function
  crypto: hisilicon/qm - change function names
  crypto: hisilicon/qm - use min() instead of min_t()
  crypto: hisilicon/qm - remove some unused defines
  crypto: proc - Print fips status
  crypto: crypto4xx - Call dma_unmap_page when done
  crypto: octeontx2 - Fix objects shared between several modules
  crypto: nx - Fix sparse warnings
  crypto: ecc - Silence sparse warning
  tls: Pass rec instead of aead_req into tls_encrypt_done
  crypto: api - Remove completion function scaffolding
  tls: Remove completion function scaffolding
  tipc: Remove completion function scaffolding
  net: ipv6: Remove completion function scaffolding
  net: ipv4: Remove completion function scaffolding
  net: macsec: Remove completion function scaffolding
  dm: Remove completion function scaffolding
  ...
2023-02-21 18:10:50 -08:00
Linus Torvalds
b8878e5a5c hyperv-next for v6.3.
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmPzgDgTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXrc7CACfG4SSd8KkWU/y8Q66Irxdau0a3ETD
 KL4UNRKGIyKujufgFsme79O6xVSSsCNSay449wk20hqn8lnwbSRi9pUwmLn29hfd
 CMFleWIqgwGFfC1do5DRF1vrt1siuG/jVE07mWsEwuY2iHx/es+H7LiQKidhkndZ
 DhXRqoi7VYiJv5fRSumpkUJrMZiI96o9Mk09HUksdMwCn3+7RQEqHnlTH5KOozKF
 iMroDB72iNw5Na/USZwWL2EDRptENam3lFkPBeDPqNw0SbG4g65JGPR9DSa0Lkbq
 AGCJQkdU33mcYQG5MY7R4K1evufpOl/apqLW7h92j45Znr9ok6Vr2c1R
 =J1VT
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20230220' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - allow Linux to run as the nested root partition for Microsoft
   Hypervisor (Jinank Jain and Nuno Das Neves)

 - clean up the return type of callback functions (Dawei Li)

* tag 'hyperv-next-signed-20230220' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Fix hv_get/set_register for nested bringup
  Drivers: hv: Make remove callback of hyperv driver void returned
  Drivers: hv: Enable vmbus driver for nested root partition
  x86/hyperv: Add an interface to do nested hypercalls
  Drivers: hv: Setup synic registers in case of nested root partition
  x86/hyperv: Add support for detecting nested hypervisor
2023-02-21 16:59:23 -08:00
Florian Westphal
fdf6491193 netfilter: ctnetlink: make event listener tracking global
pernet tracking doesn't work correctly because other netns might have
set NETLINK_LISTEN_ALL_NSID on its event socket.

In this case its expected that events originating in other net
namespaces are also received.

Making pernet-tracking work while also honoring NETLINK_LISTEN_ALL_NSID
requires much more intrusive changes both in netlink and nfnetlink,
f.e. adding a 'setsockopt' callback that lets nfnetlink know that the
event socket entered (or left) ALL_NSID mode.

Move to global tracking instead: if there is an event socket anywhere
on the system, all net namespaces which have conntrack enabled and
use autobind mode will allocate the ecache extension.

netlink_has_listeners() returns false only if the given group has no
subscribers in any net namespace, the 'net' argument passed to
nfnetlink_has_listeners is only used to derive the protocol (nfnetlink),
it has no other effect.

For proper NETLINK_LISTEN_ALL_NSID-aware pernet tracking of event
listeners a new netlink_has_net_listeners() is also needed.

Fixes: 90d1daa458 ("netfilter: conntrack: add nf_conntrack_events autodetect mode")
Reported-by: Bryce Kahle <bryce.kahle@datadoghq.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-02-22 00:28:47 +01:00
Xin Long
05c07c0c6c netfilter: xt_length: use skb len to match in length_mt6
For IPv6 Jumbo packets, the ipv6_hdr(skb)->payload_len is always 0,
and its real payload_len ( > 65535) is saved in hbh exthdr. With 0
length for the jumbo packets, it may mismatch.

To fix this, we can just use skb->len instead of parsing exthdrs, as
the hbh exthdr parsing has been done before coming to length_mt6 in
ip6_rcv_core() and br_validate_ipv6() and also the packet has been
trimmed according to the correct IPv6 (ext)hdr length there, and skb
len is trustable in length_mt6().

Note that this patch is especially needed after the IPv6 BIG TCP was
supported in kernel, which is using IPv6 Jumbo packets. Besides, to
match the packets greater than 65535 more properly, a v1 revision of
xt_length may be needed to extend "min, max" to u32 in the future,
and for now the IPv6 Jumbo packets can be matched by:

  # ip6tables -m length ! --length 0:65535

Fixes: 7c4e983c4f ("net: allow gso_max_size to exceed 65536")
Fixes: 0fe79f28bf ("net: allow gro_max_size to exceed 65536")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-02-22 00:28:39 +01:00
Florian Westphal
e58a171d35 netfilter: ebtables: fix table blob use-after-free
We are not allowed to return an error at this point.
Looking at the code it looks like ret is always 0 at this
point, but its not.

t = find_table_lock(net, repl->name, &ret, &ebt_mutex);

... this can return a valid table, with ret != 0.

This bug causes update of table->private with the new
blob, but then frees the blob right away in the caller.

Syzbot report:

BUG: KASAN: vmalloc-out-of-bounds in __ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
Read of size 4 at addr ffffc90005425000 by task kworker/u4:4/74
Workqueue: netns cleanup_net
Call Trace:
 kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
 __ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
 ebt_unregister_table+0x35/0x40 net/bridge/netfilter/ebtables.c:1372
 ops_exit_list+0xb0/0x170 net/core/net_namespace.c:169
 cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613
...

ip(6)tables appears to be ok (ret should be 0 at this point) but make
this more obvious.

Fixes: c58dd2dd44 ("netfilter: Can't fail and free after table replacement")
Reported-by: syzbot+f61594de72d6705aea03@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-02-22 00:22:31 +01:00
Phil Sutter
efb056e5f1 netfilter: ip6t_rpfilter: Fix regression with VRF interfaces
When calling ip6_route_lookup() for the packet arriving on the VRF
interface, the result is always the real (slave) interface. Expect this
when validating the result.

Fixes: acc641ab95 ("netfilter: rpfilter/fib: Populate flowic_l3mdev field")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-02-22 00:22:20 +01:00
Florian Westphal
e6d57e9ff0 netfilter: conntrack: fix rmmod double-free race
nf_conntrack_hash_check_insert() callers free the ct entry directly, via
nf_conntrack_free.

This isn't safe anymore because
nf_conntrack_hash_check_insert() might place the entry into the conntrack
table and then delteted the entry again because it found that a conntrack
extension has been removed at the same time.

In this case, the just-added entry is removed again and an error is
returned to the caller.

Problem is that another cpu might have picked up this entry and
incremented its reference count.

This results in a use-after-free/double-free, once by the other cpu and
once by the caller of nf_conntrack_hash_check_insert().

Fix this by making nf_conntrack_hash_check_insert() not fail anymore
after the insertion, just like before the 'Fixes' commit.

This is safe because a racing nf_ct_iterate() has to wait for us
to release the conntrack hash spinlocks.

While at it, make the function return -EAGAIN in the rmmod (genid
changed) case, this makes nfnetlink replay the command (suggested
by Pablo Neira).

Fixes: c56716c69c ("netfilter: extensions: introduce extension genid count")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-02-22 00:19:38 +01:00
Hangyu Hua
ac4893980b netfilter: ctnetlink: fix possible refcount leak in ctnetlink_create_conntrack()
nf_ct_put() needs to be called to put the refcount got by
nf_conntrack_find_get() to avoid refcount leak when
nf_conntrack_hash_check_insert() fails.

Fixes: 7d367e0668 ("netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-02-22 00:06:19 +01:00
Linus Torvalds
4a7d37e824 hardening updates for v6.3-rc1
- Replace 0-length and 1-element arrays with flexible arrays in various
   subsystems (Paulo Miguel Almeida, Stephen Rothwell, Kees Cook)
 
 - randstruct: Disable Clang 15 support (Eric Biggers)
 
 - GCC plugins: Drop -std=gnu++11 flag (Sam James)
 
 - strpbrk(): Refactor to use strchr() (Andy Shevchenko)
 
 - LoadPin LSM: Allow root filesystem switching when non-enforcing
 
 - fortify: Use dynamic object size hints when available
 
 - ext4: Fix CFI function prototype mismatch
 
 - Nouveau: Fix DP buffer size arguments
 
 - hisilicon: Wipe entire crypto DMA pool on error
 
 - coda: Fully allocate sig_inputArgs
 
 - UBSAN: Improve arm64 trap code reporting
 
 - copy_struct_from_user(): Add minimum bounds check on kernel buffer size
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmPv1Y8WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJg5UD/9x3Lx0EG3iL4qPtjmohaXd899r
 AzP1ysoxYnmo/cY0//W3DPCJrUaVlTm7M2xXOpzi7YPVD8Jcofzy6Uxm9BiG/OJ9
 bla7uQixlDMA2MBmWzAXhM7337WgEtBcr6kbXk6rHFnzmk8CdAY3wjmLmiefxEWT
 gkdeJlbkBFynssSF2nejgCvr/ZyiWQr2V9hRdEavLQH/MDS785bmNwbLyUNqK+eo
 gOtuyjyV90t+cSIN0bF7gOCFGf1ivKA/+GNFrob0jY0Fy2kGx1I2wQMn9yzjzerC
 o6Majz9r+7Z7xIaz2Pm9nDaWyZDI05RfoRpQZ9dSEJ+zYgbFBFpDpJShcJvSpNa0
 POqeR400n/6VWBcbk7UU0s7VCVU13IsOFhBSVMQM5FfzIcUkj0/VBm0Jm0ODrpM9
 13/nKyAkvHkH0uSJbQjn79rXvEvqQyi5f28emm2CuhiHHUiDEUdsmMD7fE8UXo4r
 U8dgfwTOLLQBKmOQJcgiLo8iLDPhatZKYQAZ7LMY9kbHLsJlRVxfzY9PriNCuI5o
 XuMLJG33TrlUDfqQrKeSJ9srVRiiIBAzoWnIfIVE3Xb46LqFNXVRdJCt4A2678jn
 gYIzkQ2HbVe2chUhUyjsjGTjmmeX9qZG0UOlhRQ0RvWFxi390wwYqhkSaOEGtDGv
 QbVh0Lb86m3H/G+M9g==
 =XnVa
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "Beyond some specific LoadPin, UBSAN, and fortify features, there are
  other fixes scattered around in various subsystems where maintainers
  were okay with me carrying them in my tree or were non-responsive but
  the patches were reviewed by others:

   - Replace 0-length and 1-element arrays with flexible arrays in
     various subsystems (Paulo Miguel Almeida, Stephen Rothwell, Kees
     Cook)

   - randstruct: Disable Clang 15 support (Eric Biggers)

   - GCC plugins: Drop -std=gnu++11 flag (Sam James)

   - strpbrk(): Refactor to use strchr() (Andy Shevchenko)

   - LoadPin LSM: Allow root filesystem switching when non-enforcing

   - fortify: Use dynamic object size hints when available

   - ext4: Fix CFI function prototype mismatch

   - Nouveau: Fix DP buffer size arguments

   - hisilicon: Wipe entire crypto DMA pool on error

   - coda: Fully allocate sig_inputArgs

   - UBSAN: Improve arm64 trap code reporting

   - copy_struct_from_user(): Add minimum bounds check on kernel buffer
     size"

* tag 'hardening-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  randstruct: disable Clang 15 support
  uaccess: Add minimum bounds check on kernel buffer size
  arm64: Support Clang UBSAN trap codes for better reporting
  coda: Avoid partial allocation of sig_inputArgs
  gcc-plugins: drop -std=gnu++11 to fix GCC 13 build
  lib/string: Use strchr() in strpbrk()
  crypto: hisilicon: Wipe entire pool on error
  net/i40e: Replace 0-length array with flexible array
  io_uring: Replace 0-length array with flexible array
  ext4: Fix function prototype mismatch for ext4_feat_ktype
  i915/gvt: Replace one-element array with flexible-array member
  drm/nouveau/disp: Fix nvif_outp_acquire_dp() argument size
  LoadPin: Allow filesystem switch when not enforcing
  LoadPin: Move pin reporting cleanly out of locking
  LoadPin: Refactor sysctl initialization
  LoadPin: Refactor read-only check into a helper
  ARM: ixp4xx: Replace 0-length arrays with flexible arrays
  fortify: Use __builtin_dynamic_object_size() when available
  rxrpc: replace zero-lenth array with DECLARE_FLEX_ARRAY() helper
2023-02-21 11:07:23 -08:00
Jakub Kicinski
d1fabc68f8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Per-next-PR merge.

net/smc/af_smc.c
  b5dd4d6981 ("net/smc: llc_conf_mutex refactor, replace it with rw_semaphore")
  e40b801b36 ("net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link()")
https://lore.kernel.org/all/20230221124008.6303c330@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21 09:29:25 -08:00
Vladimir Oltean
a00da30c05 net: ethtool: fix __ethtool_dev_mm_supported() implementation
The MAC Merge layer is supported when ops->get_mm() returns 0.
The implementation was changed during review, and in this process, a bug
was introduced.

Link: https://lore.kernel.org/netdev/20230111161706.1465242-5-vladimir.oltean@nxp.com/
Fixes: 04692c9020 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Link: https://lore.kernel.org/all/20230220122343.1156614-2-vladimir.oltean@nxp.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21 09:05:01 -08:00
Bo Liu
7ec077744a ethtool: pse-pd: Fix double word in comments
Remove the repeated word "for" in comments.

Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20230221083036.2414-1-liubo03@inspur.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21 09:02:13 -08:00
Xuan Zhuo
951bce29c8 xsk: add linux/vmalloc.h to xsk.c
Fix the failure of the compilation under the sh4.

Because we introduced remap_vmalloc_range() earlier, this has caused
the compilation failure on the sh4 platform. So this introduction of the
header file of linux/vmalloc.h.

config: sh-allmodconfig (https://download.01.org/0day-ci/archive/20230221/202302210041.kpPQLlNQ-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=9f78bf330a66cd400b3e00f370f597e9fa939207
        git remote add net-next https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
        git fetch --no-tags net-next master
        git checkout 9f78bf330a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh SHELL=/bin/bash net/

Fixes: 9f78bf330a ("xsk: support use vaddr as ring")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302210041.kpPQLlNQ-lkp@intel.com/
Link: https://lore.kernel.org/r/20230221075140.46988-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21 09:00:09 -08:00
Paul Blakey
606c7c43d0 net/sched: flower: Support hardware miss to tc action
To support hardware miss to tc action in actions on the flower
classifier, implement the required getting of filter actions,
and setup filter exts (actions) miss by giving it the filter's
handle and actions.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:46:10 -08:00
Paul Blakey
08a0063df3 net/sched: flower: Move filter handle initialization earlier
To support miss to action during hardware offload the filter's
handle is needed when setting up the actions (tcf_exts_init()),
and before offloading.

Move filter handle initialization earlier.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:46:10 -08:00
Paul Blakey
80cd22c35c net/sched: cls_api: Support hardware miss to tc action
For drivers to support partial offload of a filter's action list,
add support for action miss to specify an action instance to
continue from in sw.

CT action in particular can't be fully offloaded, as new connections
need to be handled in software. This imposes other limitations on
the actions that can be offloaded together with the CT action, such
as packet modifications.

Assign each action on a filter's action list a unique miss_cookie
which drivers can then use to fill action_miss part of the tc skb
extension. On getting back this miss_cookie, find the action
instance with relevant cookie and continue classifying from there.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:46:10 -08:00
Paul Blakey
db4b49025c net/sched: Rename user cookie and act cookie
struct tc_action->act_cookie is a user defined cookie,
and the related struct flow_action_entry->act_cookie is
used as an handle similar to struct flow_cls_offload->cookie.

Rename tc_action->act_cookie to user_cookie, and
flow_action_entry->act_cookie to cookie so their names
would better fit their usage.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:46:10 -08:00
Jakub Kicinski
871489dd01 Merge tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
Stefan Schmidt says:

====================
pull-request: ieee802154-next 2023-02-20

Miquel Raynal build upon his earlier work and introduced two new
features into the ieee802154 stack. Beaconing to announce existing
PAN's and passive scanning to discover the beacons and associated
PAN's. The matching changes to the userspace configuration tool
have been posted as well and will be released together with the
kernel release.

Arnd Bergmann and Dmitry Torokhov worked on converting the
at86rf230 and cc2520 drivers away from the unused platform_data
usage and towards the new gpiod API. (I had to add a revert as
Dmitry found a regression on an already pushed tree on my side).

Changes since v1 (pull request 2023-02-02)
- Netlink API extack and NLA_POLICY* usage as suggested by Jakub
- Removed always true condition found by kernel test robot
- Simplify device removal with running background job for scanning
- Fix problems with beacon sending in some cases by using the MLME
  tx path

* tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
  ieee802154: Drop device trackers
  mac802154: Fix an always true condition
  mac802154: Send beacons using the MLME Tx path
  ieee802154: Change error code on monitor scan netlink request
  ieee802154: Convert scan error messages to extack
  ieee802154: Use netlink policies when relevant on scan parameters
  ieee802154: at86rf230: switch to using gpiod API
  ieee802154: at86rf230: drop support for platform data
  Revert "at86rf230: convert to gpio descriptors"
  cc2520: move to gpio descriptors
  mac802154: Avoid superfluous endianness handling
  at86rf230: convert to gpio descriptors
  mac802154: Handle basic beaconing
  ieee802154: Add support for user beaconing requests
  mac802154: Handle passive scanning
  mac802154: Add MLME Tx locked helpers
  mac802154: Prepare forcing specific symbol duration
  ieee802154: Introduce a helper to validate a channel
  ieee802154: Define a beacon frame header
  ieee802154: Add support for user scanning requests
====================

Link: https://lore.kernel.org/r/20230220213749.386451-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:40:52 -08:00
Kuniyuki Iwashima
be9832c2e9 net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
Commit 2c02d41d71 ("net/ulp: prevent ULP without clone op from entering
the LISTEN status") guarantees that all ULP listeners have clone() op, so
we no longer need to test it in inet_clone_ulp().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230217200920.85306-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:31:49 -08:00
Jakub Kicinski
ee8d72a157 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY+/uBgAKCRDbK58LschI
 g0ngAPwJHd1RicBuy2C4fLv0nGKZtmYZBAnTGlI2RisPxU6BRwEAwUDLHuc5K6nR
 j261okOxOy/MRxdN1NhmR6Qe7nMyQAk=
 =tYU+
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-02-17

We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).

The main changes are:

1) Add a rbtree data structure following the "next-gen data structure"
   precedent set by recently-added linked-list, that is, by using
   kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.

2) Add a new benchmark for hashmap lookups to BPF selftests,
   from Anton Protopopov.

3) Fix bpf_fib_lookup to only return valid neighbors and add an option
   to skip the neigh table lookup, from Martin KaFai Lau.

4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
   accouting for container environments, from Yafang Shao.

5) Batch of ice multi-buffer and driver performance fixes,
   from Alexander Lobakin.

6) Fix a bug in determining whether global subprog's argument is
   PTR_TO_CTX, which is based on type names which breaks kprobe progs,
   from Andrii Nakryiko.

7) Prep work for future -mcpu=v4 LLVM option which includes usage of
   BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
   from Eduard Zingerman.

8) More prep work for later building selftests with Memory Sanitizer
   in order to detect usages of undefined memory, from Ilya Leoshkevich.

9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
   dereference via sendmsg(), from Maciej Fijalkowski.

10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.

11) Fix BPF memory allocator in combination with BPF hashtab where it could
    corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.

12) Fix LoongArch BPF JIT to always use 4 instructions for function
    address so that instruction sequences don't change between passes,
    from Hengqi Chen.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
  selftests/bpf: Add bpf_fib_lookup test
  bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
  riscv, bpf: Add bpf trampoline support for RV64
  riscv, bpf: Add bpf_arch_text_poke support for RV64
  riscv, bpf: Factor out emit_call for kernel and bpf context
  riscv: Extend patch_text for multiple instructions
  Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
  selftests/bpf: Add global subprog context passing tests
  selftests/bpf: Convert test_global_funcs test to test_loader framework
  bpf: Fix global subprog context argument resolution logic
  LoongArch, bpf: Use 4 instructions for function address in JIT
  bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
  bpf: Disable bh in bpf_test_run for xdp and tc prog
  xsk: check IFF_UP earlier in Tx path
  Fix typos in selftest/bpf files
  selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
  ...
====================

Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:31:14 -08:00
Linus Torvalds
5b0ed59649 for-6.3/block-2023-02-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPvfncQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpob2EADXJxcr2jjYHm/7cjKkyuVX8fr80dNdMeuY
 JFdsjG1k6Uj73BVhQQWYTcs/PsrWBHWRsv6uz4WgOELj55eXmf5Q0kJszyUeJW33
 /DjqLvtoppVcYf80xE13wKvCfn73BjwQo6xkGM0qAYn15eaXiD/Ax3xC6eJlsBeK
 PEw7EJyhacbSxZa/1D2B6+mqII1jUQWProTCc3udZ4JHi3WvdWa3Rda0qCqHl4a1
 +K2aP2YTFIRPxBzfMNa/CafWVIFubTdht+4Ds6R60RImzB9e0VUBfcsiUyW5Zg7L
 Fwv7ptXuWrALwVNdW56Oz1QikBxn2pdRR2HMLwKJW1MD8kP9r8LMm2jV5Rhiwe0B
 OQsGRYkOzBvw+bxeP5fvk0iPGVMz6ActH4gkraA5QdLqayDaFYOadlhqz0uRo5SH
 Fb42Vl658K/MHDSIk8U58TNkmrsIJsBGohXI9DOGINPPvv3XOPi4Q1HmXkGRmii0
 y+lNU/QEGh7xXXew29SPP76uQpQaYfC7NxXCMw/OpOMwehzjsjshmM2lpxi8zsgt
 PJUmfHv5qxCplNmTJXmUpmX7sS7550HUdu9FJb13DM+gzKg8bk9jWVuLrzqrVlG5
 1hKWEl1+heg1heRfaIuJVLbPI0au6Sb4uqhih/PHyrP9TWIoAruDbDJM65GKTxyE
 2uEgcHzHQw==
 =poRc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe updates via Christoph:
      - Small improvements to the logging functionality (Amit Engel)
      - Authentication cleanups (Hannes Reinecke)
      - Cleanup and optimize the DMA mapping cod in the PCIe driver
        (Keith Busch)
      - Work around the command effects for Format NVM (Keith Busch)
      - Misc cleanups (Keith Busch, Christoph Hellwig)
      - Fix and cleanup freeing single sgl (Keith Busch)

 - MD updates via Song:
      - Fix a rare crash during the takeover process
      - Don't update recovery_cp when curr_resync is ACTIVE
      - Free writes_pending in md_stop
      - Change active_io to percpu

 - Updates to drbd, inching us closer to unifying the out-of-tree driver
   with the in-tree one (Andreas, Christoph, Lars, Robert)

 - BFQ update adding support for multi-actuator drives (Paolo, Federico,
   Davide)

 - Make brd compliant with REQ_NOWAIT (me)

 - Fix for IOPOLL and queue entering, fixing stalled IO waiting on
   timeouts (me)

 - Fix for REQ_NOWAIT with multiple bios (me)

 - Fix memory leak in blktrace cleanup (Greg)

 - Clean up sbitmap and fix a potential hang (Kemeng)

 - Clean up some bits in BFQ, and fix a bug in the request injection
   (Kemeng)

 - Clean up the request allocation and issue code, and fix some bugs
   related to that (Kemeng)

 - ublk updates and fixes:
      - Add support for unprivileged ublk (Ming)
      - Improve device deletion handling (Ming)
      - Misc (Liu, Ziyang)

 - s390 dasd fixes (Alexander, Qiheng)

 - Improve utility of request caching and fixes (Anuj, Xiao)

 - zoned cleanups (Pankaj)

 - More constification for kobjs (Thomas)

 - blk-iocost cleanups (Yu)

 - Remove bio splitting from drivers that don't need it (Christoph)

 - Switch blk-cgroups to use struct gendisk. Some of this is now
   incomplete as select late reverts were done. (Christoph)

 - Add bvec initialization helpers, and convert callers to use that
   rather than open-coding it (Christoph)

 - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
   Matthew, Ulf, Zhong)

* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
  brd: use radix_tree_maybe_preload instead of radix_tree_preload
  block: use proper return value from bio_failfast()
  block: bio-integrity: Copy flags when bio_integrity_payload is cloned
  block: Fix io statistics for cgroup in throttle path
  brd: mark as nowait compatible
  brd: check for REQ_NOWAIT and set correct page allocation mask
  brd: return 0/-error from brd_insert_page()
  block: sync mixed merged request's failfast with 1st bio's
  Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
  Revert "blk-cgroup: pass a gendisk to blkg_lookup"
  Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
  Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
  Revert "blk-cgroup: move the cgroup information to struct gendisk"
  nvme-pci: remove iod use_sgls
  nvme-pci: fix freeing single sgl
  block: ublk: check IO buffer based on flag need_get_data
  s390/dasd: Fix potential memleak in dasd_eckd_init()
  s390/dasd: sort out physical vs virtual pointers usage
  block: Remove the ALLOC_CACHE_SLACK constant
  block: make kobj_type structures constant
  ...
2023-02-20 14:27:21 -08:00
Linus Torvalds
05e6295f7b fs.idmapped.v6.3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY+5NlQAKCRCRxhvAZXjc
 orOaAP9i2h3OJy95nO2Fpde0Bt2UT+oulKCCcGlvXJ8/+TQpyQD/ZQq47gFQ0EAz
 Br5NxeyGeecAb0lHpFz+CpLGsxMrMwQ=
 =+BG5
 -----END PGP SIGNATURE-----

Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull vfs idmapping updates from Christian Brauner:

 - Last cycle we introduced the dedicated struct mnt_idmap type for
   mount idmapping and the required infrastucture in 256c8aed2b ("fs:
   introduce dedicated idmap type for mounts"). As promised in last
   cycle's pull request message this converts everything to rely on
   struct mnt_idmap.

   Currently we still pass around the plain namespace that was attached
   to a mount. This is in general pretty convenient but it makes it easy
   to conflate namespaces that are relevant on the filesystem with
   namespaces that are relevant on the mount level. Especially for
   non-vfs developers without detailed knowledge in this area this was a
   potential source for bugs.

   This finishes the conversion. Instead of passing the plain namespace
   around this updates all places that currently take a pointer to a
   mnt_userns with a pointer to struct mnt_idmap.

   Now that the conversion is done all helpers down to the really
   low-level helpers only accept a struct mnt_idmap argument instead of
   two namespace arguments.

   Conflating mount and other idmappings will now cause the compiler to
   complain loudly thus eliminating the possibility of any bugs. This
   makes it impossible for filesystem developers to mix up mount and
   filesystem idmappings as they are two distinct types and require
   distinct helpers that cannot be used interchangeably.

   Everything associated with struct mnt_idmap is moved into a single
   separate file. With that change no code can poke around in struct
   mnt_idmap. It can only be interacted with through dedicated helpers.
   That means all filesystems are and all of the vfs is completely
   oblivious to the actual implementation of idmappings.

   We are now also able to extend struct mnt_idmap as we see fit. For
   example, we can decouple it completely from namespaces for users that
   don't require or don't want to use them at all. We can also extend
   the concept of idmappings so we can cover filesystem specific
   requirements.

   In combination with the vfs{g,u}id_t work we finished in v6.2 this
   makes this feature substantially more robust and thus difficult to
   implement wrong by a given filesystem and also protects the vfs.

 - Enable idmapped mounts for tmpfs and fulfill a longstanding request.

   A long-standing request from users had been to make it possible to
   create idmapped mounts for tmpfs. For example, to share the host's
   tmpfs mount between multiple sandboxes. This is a prerequisite for
   some advanced Kubernetes cases. Systemd also has a range of use-cases
   to increase service isolation. And there are more users of this.

   However, with all of the other work going on this was way down on the
   priority list but luckily someone other than ourselves picked this
   up.

   As usual the patch is tiny as all the infrastructure work had been
   done multiple kernel releases ago. In addition to all the tests that
   we already have I requested that Rodrigo add a dedicated tmpfs
   testsuite for idmapped mounts to xfstests. It is to be included into
   xfstests during the v6.3 development cycle. This should add a slew of
   additional tests.

* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
  shmem: support idmapped mounts for tmpfs
  fs: move mnt_idmap
  fs: port vfs{g,u}id helpers to mnt_idmap
  fs: port fs{g,u}id helpers to mnt_idmap
  fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
  fs: port i_{g,u}id_{needs_}update() to mnt_idmap
  quota: port to mnt_idmap
  fs: port privilege checking helpers to mnt_idmap
  fs: port inode_owner_or_capable() to mnt_idmap
  fs: port inode_init_owner() to mnt_idmap
  fs: port acl to mnt_idmap
  fs: port xattr to mnt_idmap
  fs: port ->permission() to pass mnt_idmap
  fs: port ->fileattr_set() to pass mnt_idmap
  fs: port ->set_acl() to pass mnt_idmap
  fs: port ->get_acl() to pass mnt_idmap
  fs: port ->tmpfile() to pass mnt_idmap
  fs: port ->rename() to pass mnt_idmap
  fs: port ->mknod() to pass mnt_idmap
  fs: port ->mkdir() to pass mnt_idmap
  ...
2023-02-20 11:53:11 -08:00
Chuck Lever
2172e84ea0 SUNRPC: Fix occasional warning when destroying gss_krb5_enctypes
I'm guessing that the warning fired because there's some code path
that is called on module unload where the gss_krb5_enctypes file
was never set up.

name 'gss_krb5_enctypes'
WARNING: CPU: 0 PID: 6187 at fs/proc/generic.c:712 remove_proc_entry+0x38d/0x460 fs/proc/generic.c:712

destroy_krb5_enctypes_proc_entry net/sunrpc/auth_gss/svcauth_gss.c:1543 [inline]
gss_svc_shutdown_net+0x7d/0x2b0 net/sunrpc/auth_gss/svcauth_gss.c:2120
ops_exit_list+0xb0/0x170 net/core/net_namespace.c:169
setup_net+0x9bd/0xe60 net/core/net_namespace.c:356
copy_net_ns+0x320/0x6b0 net/core/net_namespace.c:483
create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
copy_namespaces+0x410/0x500 kernel/nsproxy.c:179
copy_process+0x311d/0x76b0 kernel/fork.c:2272
kernel_clone+0xeb/0x9a0 kernel/fork.c:2684
__do_sys_clone+0xba/0x100 kernel/fork.c:2825
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Reported-by: syzbot+04a8437497bcfb4afa95@syzkaller.appspotmail.com
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:57 -05:00
Chuck Lever
319951eba0 SUNRPC: Remove ->xpo_secure_port()
There's no need for the cost of this extra virtual function call
during every RPC transaction: the RQ_SECURE bit can be set properly
in ->xpo_recvfrom() instead.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:55 -05:00
Chuck Lever
ecfa398773 SUNRPC: Fix whitespace damage in svcauth_unix.c
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:52 -05:00
Chuck Lever
c4a9f0552c SUNRPC: Add encryption self-tests
With the KUnit infrastructure recently added, we are free to define
other unit tests particular to our implementation. As an example,
I've added a self-test that encrypts then decrypts a string, and
checks the result.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:51 -05:00
Chuck Lever
4d2d15c0f1 SUNRPC: Add RFC 8009 encryption KUnit tests
RFC 8009 provides sample encryption results. Add KUnit tests to
ensure our implementation derives the expected results for the
provided sample input.

I hate how large this test is, but using non-standard key usage
values means rfc8009_encrypt_case() can't simply reuse ->import_ctx
to allocate and key its ciphers; and the test provides its own
confounders, which means krb5_etm_encrypt() can't be used directly.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:51 -05:00
Chuck Lever
003caf4f8c SUNRPC: Add RFC 8009 checksum KUnit tests
RFC 8009 provides sample checksum results. Add KUnit tests to ensure
our implementation derives the expected results for the provided
sample input.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:51 -05:00
Chuck Lever
fcbad14b58 SUNRPC: Add KDF-HMAC-SHA2 Kunit tests
RFC 8009 provides sample key derivation results, so Kunit tests are
added to ensure our implementation derives the expected keys for the
provided sample input.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:51 -05:00
Chuck Lever
b958cff6b2 SUNRPC: Add encryption KUnit tests for the RFC 6803 encryption types
Add tests for the new-to-RPCSEC Camellia cipher.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:51 -05:00
Chuck Lever
02142b2ca8 SUNRPC: Add checksum KUnit tests for the RFC 6803 encryption types
Test the new-to-RPCSEC CMAC digest algorithm.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:50 -05:00
Chuck Lever
35f6e42e81 SUNRPC: Add KDF KUnit tests for the RFC 6803 encryption types
The Camellia enctypes use a new KDF, so add some tests to ensure it
is working properly.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:50 -05:00
Chuck Lever
e1a9a3849d SUNRPC: Add Kunit tests for RFC 3962-defined encryption/decryption
Add Kunit tests for ENCTYPE_AES128_CTS_HMAC_SHA1_96. The test
vectors come from RFC 3962 Appendix B.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:50 -05:00
Chuck Lever
6eb6b8a446 SUNRPC: Add KUnit tests RFC 3961 Key Derivation
RFC 3961 Appendix A provides tests for the KDF specified in that
document as well as other parts of Kerberos. The other three usage
scenarios in Section 10 are not implemented by the Linux kernel's
RPCSEC GSS Kerberos 5 mechanism, so tests are not added for those.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:50 -05:00
Chuck Lever
ddd8c1f975 SUNRPC: Export get_gss_krb5_enctype()
I plan to add KUnit tests that will need enctype profile
information. Export the enctype profile lookup function.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:48 -05:00
Chuck Lever
eebd8c2d19 SUNRPC: Add KUnit tests for rpcsec_krb5.ko
The Kerberos RFCs provide test vectors to verify the operation of
an implementation. Introduce a KUnit test framework to exercise the
Linux kernel's implementation of Kerberos.

Start with test cases for the RFC 3961-defined n-fold function. The
sample vectors for that are found in RFC 3961 Section 10.

Run the GSS Kerberos 5 mechanism's unit tests with this command:

$ ./tools/testing/kunit/kunit.py run \
	--kunitconfig ./net/sunrpc/.kunitconfig

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:46 -05:00
Chuck Lever
6e460c230d SUNRPC: Move remaining internal definitions to gss_krb5_internal.h
The goal is to leave only protocol-defined items in gss_krb5.h so
that it can be easily replaced by a generic header. Implementation
specific items are moved to the new internal header.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:46 -05:00
Chuck Lever
6e6d9eee0e SUNRPC: Advertise support for the Camellia encryption types
Add the RFC 6803 encryption types to the string of integers that is
reported to gssd during upcalls. This enables gssd to utilize keys
with these encryption types when support for them is built into the
kernel.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:44 -05:00
Chuck Lever
45b4ef46b5 SUNRPC: Add KDF_FEEDBACK_CMAC
The Camellia enctypes use the KDF_FEEDBACK_CMAC Key Derivation
Function defined in RFC 6803 Section 3.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever
3394682fba SUNRPC: Support the Camellia enctypes
RFC 6803 defines two encryption types that use Camellia ciphers (RFC
3713) and CMAC digests. Implement support for those in SunRPC's GSS
Kerberos 5 mechanism.

There has not been an explicit request to support these enctypes.
However, this new set of enctypes provides a good alternative to the
AES-SHA1 enctypes that are to be deprecated at some point.

As this implementation is still a "beta", the default is to not
build it automatically.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever
f26ec6b1b1 SUNRPC: Advertise support for RFC 8009 encryption types
Add the RFC 8009 encryption types to the string of integers that is
reported to gssd during upcalls. This enables gssd to utilize keys
with these encryption types when support for them is built into the
kernel.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=400
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever
0d5b5a0f32 SUNRPC: Add RFC 8009 encryption and decryption functions
RFC 8009 enctypes use different crypt formulae than previous
Kerberos 5 encryption types. Section 1 of RFC 8009 explains the
reason for this change:

> The new types conform to the framework specified in [RFC3961],
> but do not use the simplified profile, as the simplified profile
> is not compliant with modern cryptographic best practices such as
> calculating Message Authentication Codes (MACs) over ciphertext
> rather than plaintext.

Add new .encrypt and .decrypt functions to handle this variation.

The new approach described above is referred to as Encrypt-then-MAC
(or EtM). Hence the names of the new functions added here are
prefixed with "krb5_etm_".

A critical second difference with previous crypt formulae is that
the cipher state is included in the computed HMAC. Note however that
for RPCSEC, the initial cipher state is easy to compute on both
initiator and acceptor because it is always all zeroes.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever
ae2e4d2bae SUNRPC: Add KDF-HMAC-SHA2
The RFC 8009 encryption types use a different key derivation
function than the RFC 3962 encryption types. The new key derivation
function is defined in Section 3 of RFC 8009.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever
a40cf7530d SUNRPC: Add gk5e definitions for RFC 8009 encryption types
Fill in entries in the supported_gss_krb5_enctypes array for the
encryption types defined in RFC 8009. These new enctypes use the
SHA-256 and SHA-384 message digest algorithms (as defined in
FIPS-180) instead of the deprecated SHA-1 algorithm, and are thus
more secure.

Note that NIST has scheduled SHA-1 for deprecation:

https://www.nist.gov/news-events/news/2022/12/nist-retires-sha-1-cryptographic-algorithm

Thus these new encryption types are placed under a separate CONFIG
option to enable distributors to separately introduce support for
the AES-SHA2 enctypes and deprecate support for the current set of
AES-SHA1 encryption types as their user space allows.

As this implementation is still a "beta", the default is to not
build it automatically.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever
dfb632432a SUNRPC: Refactor CBC with CTS into helpers
Cryptosystem profile enctypes all use cipher block chaining
with ciphertext steal (CBC-with-CTS). However enctypes that are
currently supported in the Linux kernel SunRPC implementation
use only the encrypt-&-MAC approach. The RFC 8009 enctypes use
encrypt-then-MAC, which performs encryption and checksumming in
a different order.

Refactor to make it possible to share the CBC with CTS encryption
and decryption mechanisms between e&M and etM enctypes.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever
af664fc902 SUNRPC: Add new subkey length fields
The aes256-cts-hmac-sha384-192 enctype specifies the length of its
checksum and integrity subkeys as 192 bits, but the length of its
encryption subkey (Ke) as 256 bits. Add new fields to struct
gss_krb5_enctype that specify the key lengths individually, and
where needed, use the correct new field instead of ->keylength.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever
8b3a09f345 SUNRPC: Parametrize the key length passed to context_v2_alloc_cipher()
Although the Kerberos specs have always listed separate subkey
lengths, the Linux kernel's SunRPC GSS Kerberos enctype profiles
assume the base key and the derived keys have identical lengths.

The aes256-cts-hmac-sha384-192 enctype specifies the length of its
checksum and integrity subkeys as 192 bits, but the length of its
encryption subkey (Ke) as 256 bits.

To support that enctype, parametrize context_v2_alloc_cipher() so
that each of its call sites can pass in its desired key length. For
now it will be the same length as before (gk5e->keylength), but a
subsequent patch will change this.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever
ec4aaab39a SUNRPC: Clean up cipher set up for v1 encryption types
De-duplicate some common code.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever
2691a27d9b SUNRPC: Hoist KDF into struct gss_krb5_enctype
Each Kerberos enctype can have a different KDF. Refactor the key
derivation path to support different KDFs for the enctypes
introduced in subsequent patches.

In particular, expose the key derivation function in struct
gss_krb5_enctype instead of the enctype's preferred random-to-key
function. The latter is usually the identity function and is only
ever called during key derivation, so have each KDF call it
directly.

A couple of extra clean-ups:
- Deduplicate the set_cdata() helper
- Have ->derive_key return negative errnos, in accordance with usual
  kernel coding conventions

This patch is a little bigger than I'd like, but these are all
mechanical changes and they are all to the same areas of code. No
behavior change is intended.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever
ae6ad5d0b7 SUNRPC: Rename .encrypt_v2 and .decrypt_v2 methods
Clean up: there is now only one encrypt and only one decrypt method,
thus there is no longer a need for the v2-suffixed method names.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever
d50b8152c9 SUNRPC: Remove ->encrypt and ->decrypt methods from struct gss_krb5_enctype
Clean up: ->encrypt is set to only one value. Replace the two
remaining call sites with direct calls to krb5_encrypt().

There have never been any call sites for the ->decrypt() method.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:41 -05:00
Chuck Lever
dfe9a12345 SUNRPC: Enable rpcsec_gss_krb5.ko to be built without CRYPTO_DES
Because the DES block cipher has been deprecated by Internet
standard, highly secure configurations might require that DES
support be blacklisted or not installed. NFS Kerberos should still
be able to work correctly with only the AES-based enctypes in that
situation.

Also note that MIT Kerberos has begun a deprecation process for DES
encryption types. Their README for 1.19.3 states:

> Beginning with the krb5-1.19 release, a warning will be issued
> if initial credentials are acquired using the des3-cbc-sha1
> encryption type.  In future releases, this encryption type will
> be disabled by default and eventually removed.
>
> Beginning with the krb5-1.18 release, single-DES encryption
> types have been removed.

Aside from the CONFIG option name change, there are two important
policy changes:

1. The 'insecure enctype' group is now disabled by default.
   Distributors have to take action to enable support for deprecated
   enctypes. Implementation of these enctypes will be removed in a
   future kernel release.

2. des3-cbc-sha1 is now considered part of the 'insecure enctype'
   group, having been deprecated by RFC 8429, and is thus disabled
   by default

After this patch is applied, SunRPC support can be built with
Kerberos 5 support but without CRYPTO_DES enabled in the kernel.
And, when these enctypes are disabled, the Linux kernel's SunRPC
RPCSEC GSS implementation fully complies with BCP 179 / RFC 6649
and BCP 218 / RFC 8429.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:41 -05:00
Chuck Lever
17781b2ce4 SUNRPC: Replace KRB5_SUPPORTED_ENCTYPES macro
Now that all consumers of the KRB5_SUPPORTED_ENCTYPES macro are
within the SunRPC layer, the macro can be replaced with something
private and more flexible.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:39 -05:00
Chuck Lever
bdb12fb157 SUNRPC: Add /proc/net/rpc/gss_krb5_enctypes file
I would like to replace the KRB5_SUPPORTED_ENCTYPES macro so that
there is finer granularity about what enctype support is built in
to the kernel and then advertised by it.

The /proc/fs/nfsd/supported_krb5_enctypes file is a legacy API
that advertises supported enctypes to rpc.svcgssd (I think?). It
simply prints the value of the KRB5_SUPPORTED_ENCTYPES macro, so it
will need to be replaced with something that can instead display
exactly which enctypes are configured and built into the SunRPC
layer.

Completely decommissioning such APIs is hard. Instead, add a file
that is managed by SunRPC's GSS Kerberos mechanism, which is
authoritative about enctype support status. A subsequent patch will
replace /proc/fs/nfsd/supported_krb5_enctypes with a symlink to this
new file.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:37 -05:00
Chuck Lever
279a67cdd4 SUNRPC: Remove another switch on ctx->enctype
Replace another switch on encryption type so that it does not have
to be modified when adding or removing support for an enctype.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:35 -05:00
Chuck Lever
e01b2c79f4 SUNRPC: Refactor the GSS-API Per Message calls in the Kerberos mechanism
Replace a number of switches on encryption type so that all of them don't
have to be modified when adding or removing support for an enctype.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:35 -05:00
Chuck Lever
8270dbfceb SUNRPC: Obscure Kerberos integrity keys
There's no need to keep the integrity keys around if we instead
allocate and key a pair of ahashes and keep those. This not only
enables the subkeys to be destroyed immediately after deriving
them, but it makes the Kerberos integrity code path more efficient.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:35 -05:00
Chuck Lever
2dbe0cac3c SUNRPC: Obscure Kerberos signing keys
There's no need to keep the signing keys around if we instead allocate
and key an ahash and keep that. This not only enables the subkeys to
be destroyed immediately after deriving them, but it makes the
Kerberos signing code path more efficient.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:35 -05:00
Chuck Lever
9f0b49f933 SUNRPC: Obscure Kerberos encryption keys
The encryption subkeys are not used after the cipher transforms have
been allocated and keyed. There is no need to retain them in struct
krb5_ctx.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:35 -05:00
Chuck Lever
7989a4f4ab SUNRPC: Refactor set-up for aux_cipher
Hoist the name of the aux_cipher into struct gss_krb5_enctype to
prepare for obscuring the encryption keys just after they are
derived.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:34 -05:00
Chuck Lever
01c4e32632 SUNRPC: Obscure Kerberos session key
ctx->Ksess is never used after import has completed. Obscure it
immediately so it cannot be re-used or copied.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:34 -05:00
Chuck Lever
7f675ca775 SUNRPC: Improve Kerberos confounder generation
Other common Kerberos implementations use a fully random confounder
for encryption. The reason for this is explained in the new comment
added by this patch. The current get_random_bytes() implementation
does not exhaust system entropy.

Since confounder generation is part of Kerberos itself rather than
the GSS-API Kerberos mechanism, the function is renamed and moved.

Note that light top-down analysis shows that the SHA-1 transform
is by far the most CPU-intensive part of encryption. Thus we do not
expect this change to result in a significant performance impact.
However, eventually it might be necessary to generate an independent
stream of confounders for each Kerberos context to help improve I/O
parallelism.

Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:34 -05:00
Chuck Lever
4be416a5f2 SUNRPC: Remove .conflen field from struct gss_krb5_enctype
Now that arcfour-hmac is gone, the confounder length is again the
same as the cipher blocksize for every implemented enctype. The
gss_krb5_enctype::conflen field is no longer necessary.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:34 -05:00
Chuck Lever
f03640a1a9 SUNRPC: Remove .blocksize field from struct gss_krb5_enctype
It is not clear from documenting comments, specifications, or code
usage what value the gss_krb5_enctype.blocksize field is supposed
to store. The "encryption blocksize" depends only on the cipher
being used, so that value can be derived where it's needed instead
of stored as a constant.

RFC 3961 Section 5.2 says:

> cipher block size, c
>    This is the block size of the block cipher underlying the
>    encryption and decryption functions indicated above, used for key
>    derivation and for the size of the message confounder and initial
>    vector.  (If a block cipher is not in use, some comparable
>    parameter should be determined.)  It must be at least 5 octets.
>
>    This is not actually an independent parameter; rather, it is a
>    property of the functions E and D.  It is listed here to clarify
>    the distinction between it and the message block size, m.

In the Linux kernel's implemenation of the SunRPC RPCSEC GSS
Kerberos 5 mechanism, the cipher block size, which is dependent on
the encryption and decryption transforms, is used only in
krb5_derive_key(), so it is straightforward to replace it.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:34 -05:00
Chuck Lever
ccf08bed6e SUNRPC: Replace pool stats with per-CPU variables
Eliminate the use of bus-locked operations in svc_xprt_enqueue(),
which is a hot path. Replace them with per-cpu variables to reduce
cross-CPU memory bus traffic.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:32 -05:00
Chuck Lever
65ba3d2425 SUNRPC: Use per-CPU counters to tally server RPC counts
- Improves counting accuracy
 - Reduces cross-CPU memory traffic

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:32 -05:00
Chuck Lever
db1d61656c SUNRPC: Go back to using gsd->body_start
Now that svcauth_gss_prepare_to_wrap() no longer computes the
location of RPC header fields in the response buffer,
svcauth_gss_accept() can save the location of the databody
rather than the location of the verifier.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:31 -05:00
Chuck Lever
4bcf0343e8 SUNRPC: Set rq_accept_statp inside ->accept methods
To navigate around the space that svcauth_gss_accept() reserves
for the RPC payload body length and sequence number fields,
svcauth_gss_release() does a little dance with the reply's
accept_stat, moving the accept_stat value in the response buffer
down by two words.

Instead, let's have the ->accept() methods each set the proper
final location of the accept_stat to avoid having to move
things.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:31 -05:00
Chuck Lever
cee4db1945 SUNRPC: Refactor RPC server dispatch method
Currently, svcauth_gss_accept() pre-reserves response buffer space
for the RPC payload length and GSS sequence number before returning
to the dispatcher, which then adds the header's accept_stat field.

The problem is the accept_stat field is supposed to go before the
length and seq_num fields. So svcauth_gss_release() has to relocate
the accept_stat value (see svcauth_gss_prepare_to_wrap()).

To enable these fields to be added to the response buffer in the
correct (final) order, the pointer to the accept_stat has to be made
available to svcauth_gss_accept() so that it can set it before
reserving space for the length and seq_num fields.

As a first step, move the pointer to the location of the accept_stat
field into struct svc_rqst.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:31 -05:00
Chuck Lever
5f69d5f65a SUNRPC: Final clean-up of svc_process_common()
The @resv parameter is no longer used.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:31 -05:00
Chuck Lever
649a692e0f SUNRPC: Convert RPC Reply header encoding to use xdr_stream
The main part of RPC header encoding and the formation of error
responses are now done using the xdr_stream helpers. Bounds checking
before each XDR data item is encoded makes the server's encoding
path safer against accidental buffer overflows.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:30 -05:00
Chuck Lever
fcef2afffe SUNRPC: Hoist init_encode out of svc_authenticate()
Now that each ->accept method has been converted, the
svcxdr_init_encode() calls can be hoisted back up into the generic
RPC server code.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:29 -05:00
Chuck Lever
72a1e53a8b SUNRPC: Use xdr_stream for encoding GSS reply verifiers
Done as part of hardening the server-side RPC header encoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:29 -05:00
Chuck Lever
b2f42f1d99 SUNRPC: Use xdr_stream to encode replies in server-side GSS upcall helpers
This code constructs replies to the decorated NULL procedure calls
that establish GSS contexts. Convert this code path to use struct
xdr_stream to encode such responses.

Done as part of hardening the server-side RPC header encoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever
7bb0dfb223 SUNRPC: Convert unwrap data paths to use xdr_stream for replies
We're now moving svcxdr_init_encode() to /before/ the flavor's
->accept method has set rq_auth_slack. Add a helper that can
set rq_auth_slack /after/ svcxdr_init_encode() has been called.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever
df18f9ccb9 SUNRPC: Use xdr_stream to encode Reply verifier in svcauth_tls_accept()
Done as part of hardening the server-side RPC header encoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever
3b03f3c5d4 SUNRPC: Use xdr_stream to encode Reply verifier in svcauth_unix_accept()
Done as part of hardening the server-side RPC header encoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever
b2c88ca65a SUNRPC: Use xdr_stream to encode Reply verifier in svcauth_null_accept()
Done as part of hardening the server-side RPC header encoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever
faca897816 SUNRPC: Move svcxdr_init_encode() into ->accept methods
Refactor: So that the overhaul of each ->accept method can be done
in separate smaller patches, temporarily move the
svcxdr_init_encode() call into those methods.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever
8dd41d70f3 SUNRPC: Push svcxdr_init_encode() into svc_process_common()
Now that all vs_dispatch functions invoke svcxdr_init_encode(), it
is common code and can be pushed down into the generic RPC server.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:27 -05:00
Chuck Lever
7b402c8db6 SUNRPC: Add XDR encoding helper for opaque_auth
RFC 5531 defines an MSG_ACCEPTED Reply message like this:

	struct accepted_reply {
		opaque_auth verf;
		union switch (accept_stat stat) {
		case SUCCESS:
		   ...

In the current server code, struct opaque_auth encoding is open-
coded. Introduce a helper that encodes an opaque_auth data item
within the context of a xdr_stream.

Done as part of hardening the server-side RPC header decoding and
encoding paths.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:27 -05:00
Chuck Lever
6d037b15e4 SUNRPC: Remove the rpc_stat variable in svc_process_common()
There's no RPC header field called rpc_stat; more precisely, the
variable appears to be recording an accept_stat value. But it looks
like we don't need to preserve this value at all, actually, so
simply remove the variable.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:25 -05:00
Chuck Lever
99d074d6b1 SUNRPC: Check rq_auth_stat when preparing to wrap a response
Commit 5b304bc5bf ("[PATCH] knfsd: svcrpc: gss: fix failure on
SVC_DENIED in integrity case") added a check to prevent wrapping an
RPC response if reply_stat == MSG_DENIED, assuming that the only way
to get to svcauth_gss_release() with that reply_stat value was if
the reject_stat was AUTH_ERROR (reject_stat == MISMATCH is handled
earlier in svc_process_common()).

The code there is somewhat confusing. For one thing, rpc_success is
an accept_stat value, not a reply_stat value. The correct reply_stat
value to look for is RPC_MSG_DENIED. It happens to be the same value
as rpc_success, so it all works out, but it's not terribly readable.

Since commit 438623a06b ("SUNRPC: Add svc_rqst::rq_auth_stat"),
the actual auth_stat value is stored in the svc_rqst, so that value
is now available to svcauth_gss_prepare_to_wrap() to make its
decision to wrap, based on direct information about the
authentication status of the RPC caller.

No behavior change is intended, this simply replaces some old code
with something that should be more self-documenting.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:25 -05:00
Chuck Lever
eb1b780f2f SUNRPC: Convert svcauth_gss_wrap_priv() to use xdr_stream()
Actually xdr_stream does not add value here because of how
gss_wrap() works. This is just a clean-up patch.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:25 -05:00
Chuck Lever
a84cfbcd5a SUNRPC: Add @head and @tail variables in svcauth_gss_wrap_priv()
Simplify the references to the head and tail iovecs for readability.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:25 -05:00
Chuck Lever
ba8b13e5f4 SUNRPC: Record gss_wrap() errors in svcauth_gss_wrap_priv()
Match the error reporting in the other unwrap and wrap functions.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:25 -05:00