Commit graph

39752 commits

Author SHA1 Message Date
Geliang Tang
45bcc03465 selftests: mptcp: diag: return KSFT_FAIL not test_cnt
The test counter 'test_cnt' should not be returned in diag.sh, e.g. what
if only the 4th test fail? Will do 'exit 4' which is 'exit ${KSFT_SKIP}',
the whole test will be marked as skipped instead of 'failed'!

So we should do ret=${KSFT_FAIL} instead.

Fixes: df62f2ec3d ("selftests/mptcp: add diag interface tests")
Cc: stable@vger.kernel.org
Fixes: 42fb6cddec ("selftests: mptcp: more stable diag tests")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04 13:05:15 +00:00
Linus Torvalds
e4f7900095 powerpc fixes for 6.8 #5
- Fix IOMMU table initialisation when doing kdump over SR-IOV.
 
  - Fix incorrect RTAS function name for resetting TCE tables.
 
  - Fix fpu_signal selftest failures since a recent change.
 
 Thanks to: Gaurav Batra, Nathan Lynch.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmXkauwTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLTSD/4/boNe5MtrGgk6RtxnyWJv/p9KCMcC
 rRTa5pSR6HFVK3m89V17O0onL8aKyIljO5P3rDS8X4SUhr41Z9b9/FUoHv277E7a
 f7hgX4/901DYiMLJEt9jkfmM30IxTYkPmlft0Uus/NiesNdTcdQOO4UScSysnZac
 0HM1POp32KSC2HQRc/i+WIshRnaZcC+f0PPTU/qfS/u7/pwK4eekWdayLNvvvSvH
 TfjV5Hu4JVDF7hBLsoY4PdqVQGVD3t3d1D5+UrhHuYzPx+Afc8rIDfJx/+o339W6
 ZTXrRPOfiticfhHQvMX1AgsYr3/A0BbZj/wCvv+pPsCjZPfox9XsW6CgiMKUxVDf
 ifrBhvNx0fICMf+cEjH9q2dcGwMba7dZTX5HBlSXR4xLeNitUI8pt6bYCaqw7UjH
 ohwl9aAI7Rl1hcW6qBaviKaIDqhbmj3N4B4ZdgMAKj2gnovbF9gUSG3ARLwEjsqB
 qfd0c3x6UUThj4vYfGC/iI1z1LCXC8myqof6EGArLTc13R9vFv+ycx5FwERvAxtY
 ALNBh6LMIkKI5z8ZrGWULoXHoS2QrE1SXpQ5ooH2g9n+vLibd37JdJNcrEMf/TJZ
 PluhRCJcEWkw98aUSzIoRFIFpZ2wMg/uzg3KZePhRus3GhgttTqFTbrcKHhyONBO
 pq/EDB8FizZ9/A==
 =MSDF
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix IOMMU table initialisation when doing kdump over SR-IOV

 - Fix incorrect RTAS function name for resetting TCE tables

 - Fix fpu_signal selftest failures since a recent change

Thanks to Gaurav Batra and Nathan Lynch.

* tag 'powerpc-6.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix fpu_signal failures
  powerpc/rtas: use correct function name for resetting TCE tables
  powerpc/pseries/iommu: IOMMU table is not initialized for kdump over SR-IOV
2024-03-03 09:47:19 -08:00
Jakub Kicinski
4b2765ae41 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZeEKVAAKCRDbK58LschI
 g7oYAQD5Jlv4fIVTvxvfZrTTZ2tU+OsPa75mc8SDKwpash3YygEA8kvESy8+t6pg
 D6QmSf1DIZdFoSp/bV+pfkNWMeR8gwg=
 =mTAj
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-02-29

We've added 119 non-merge commits during the last 32 day(s) which contain
a total of 150 files changed, 3589 insertions(+), 995 deletions(-).

The main changes are:

1) Extend the BPF verifier to enable static subprog calls in spin lock
   critical sections, from Kumar Kartikeya Dwivedi.

2) Fix confusing and incorrect inference of PTR_TO_CTX argument type
   in BPF global subprogs, from Andrii Nakryiko.

3) Larger batch of riscv BPF JIT improvements and enabling inlining
   of the bpf_kptr_xchg() for RV64, from Pu Lehui.

4) Allow skeleton users to change the values of the fields in struct_ops
   maps at runtime, from Kui-Feng Lee.

5) Extend the verifier's capabilities of tracking scalars when they
   are spilled to stack, especially when the spill or fill is narrowing,
   from Maxim Mikityanskiy & Eduard Zingerman.

6) Various BPF selftest improvements to fix errors under gcc BPF backend,
   from Jose E. Marchesi.

7) Avoid module loading failure when the module trying to register
   a struct_ops has its BTF section stripped, from Geliang Tang.

8) Annotate all kfuncs in .BTF_ids section which eventually allows
   for automatic kfunc prototype generation from bpftool, from Daniel Xu.

9) Several updates to the instruction-set.rst IETF standardization
   document, from Dave Thaler.

10) Shrink the size of struct bpf_map resp. bpf_array,
    from Alexei Starovoitov.

11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer,
    from Benjamin Tissoires.

12) Fix bpftool to be more portable to musl libc by using POSIX's
    basename(), from Arnaldo Carvalho de Melo.

13) Add libbpf support to gcc in CORE macro definitions,
    from Cupertino Miranda.

14) Remove a duplicate type check in perf_event_bpf_event,
    from Florian Lehner.

15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them
    with notrace correctly, from Yonghong Song.

16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible
    array to fix build warnings, from Kees Cook.

17) Fix resolve_btfids cross-compilation to non host-native endianness,
    from Viktor Malik.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
  selftests/bpf: Test if shadow types work correctly.
  bpftool: Add an example for struct_ops map and shadow type.
  bpftool: Generated shadow variables for struct_ops maps.
  libbpf: Convert st_ops->data to shadow type.
  libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
  bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
  bpf, arm64: use bpf_prog_pack for memory management
  arm64: patching: implement text_poke API
  bpf, arm64: support exceptions
  arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT
  bpf: add is_async_callback_calling_insn() helper
  bpf: introduce in_sleepable() helper
  bpf: allow more maps in sleepable bpf programs
  selftests/bpf: Test case for lacking CFI stub functions.
  bpf: Check cfi_stubs before registering a struct_ops type.
  bpf: Clarify batch lookup/lookup_and_delete semantics
  bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
  bpf, docs: Fix typos in instruction-set.rst
  selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
  bpf: Shrink size of struct bpf_map/bpf_array.
  ...
====================

Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-02 20:50:59 -08:00
Michael Ellerman
380cb2f4df selftests/powerpc: Fix fpu_signal failures
My recent commit e5d00aaac6 ("selftests/powerpc: Check all FPRs in
fpu_preempt") inadvertently broke the fpu_signal test.

It needs to take into account that fpu_preempt now loads 32 FPRs, so
enlarge darray.

Also use the newly added randomise_darray() to properly randomise darray.

Finally the checking done in signal_fpu_sig() needs to skip checking
f30/f31, because they are used as scratch registers in check_all_fprs(),
called by preempt_fpu(), and so could hold other values when the signal
is taken.

Fixes: e5d00aaac6 ("selftests/powerpc: Check all FPRs in fpu_preempt")
Reported-by: Spoorthy <spoorthy@linux.ibm.com>
Depends-on: 2ba107f679 ("selftests/powerpc: Generate better bit patterns for FPU tests")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240301101035.1230024-1-mpe@ellerman.id.au
2024-03-01 22:15:30 +11:00
David Wei
8ee60f9c41 netdevsim: fix rtnetlink.sh selftest
I cleared IFF_NOARP flag from netdevsim dev->flags in order to support
skb forwarding. This breaks the rtnetlink.sh selftest
kci_test_ipsec_offload() test because ipsec does not connect to peers it
cannot transmit to.

Fix the issue by adding a neigh entry manually. ipsec_offload test now
successfully pass.

Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:43:10 +00:00
David Wei
dfb429ea4f netdevsim: add selftest for forwarding skb between connected ports
Connect two netdevsim ports in different namespaces together, then send
packets between them using socat.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:43:10 +00:00
Jakub Kicinski
c05bf0e933 selftests: ip_local_port_range: use XFAIL instead of SKIP
SCTP does not support IP_LOCAL_PORT_RANGE and we know it,
so use XFAIL instead of SKIP.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:29 +00:00
Jakub Kicinski
2709473c93 selftests: kselftest_harness: support using xfail
Currently some tests report skip for things they expect to fail
e.g. when given combination of parameters is known to be unsupported.
This is confusing because in an ideal test environment and fully
featured kernel no tests should be skipped.

Selftest summary line already includes xfail and xpass counters,
e.g.:

  Totals: pass:725 fail:0 xfail:0 xpass:0 skip:0 error:0

but there's no way to use it from within the harness.

Add a new per-fixture+variant combination list of test cases
we expect to fail.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:29 +00:00
Jakub Kicinski
378193eff3 selftests: kselftest_harness: let PASS / FAIL provide diagnostic
Switch to printing KTAP line for PASS / FAIL with ksft_test_result_code(),
this gives us the ability to report diagnostic messages.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:29 +00:00
Jakub Kicinski
42ab727eb9 selftests: kselftest_harness: separate diagnostic message with # in ksft_test_result_code()
According to the spec we should always print a # if we add
a diagnostic message. Having the caller pass in the new line
as part of diagnostic message makes handling this a bit
counter-intuitive, so append the new line in the helper.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:29 +00:00
Jakub Kicinski
732e203528 selftests: kselftest_harness: print test name for SKIP
Jakub points out that for parsers it's rather useful to always
have the test name on the result line. Currently if we SKIP
(or soon XFAIL or XPASS), we will print:

ok 17 # SKIP SCTP doesn't support IP_BIND_ADDRESS_NO_PORT

     ^
     no test name

Always print the test name.
KTAP format seems to allow or even call for it, per:
https://docs.kernel.org/dev-tools/ktap.html

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/all/87jzn6lnou.fsf@cloudflare.com/
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:29 +00:00
Jakub Kicinski
fa1a53d836 selftests: kselftest: add ksft_test_result_code(), handling all exit codes
For generic test harness code it's more useful to deal with exit
codes directly, rather than having to switch on them and call
the right ksft_test_result_*() helper. Add such function to kselftest.h.

Note that "directive" and "diagnostic" are what ktap docs call
those parts of the message.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00
Jakub Kicinski
796a344fa4 selftests: kselftest_harness: use exit code to store skip
We always use skip in combination with exit_code being 0
(KSFT_PASS). This are basic KSFT / KTAP semantics.
Store the right KSFT_* code in exit_code directly.

This makes it easier to support tests reporting other
extended KSFT_* codes like XFAIL / XPASS.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00
Jakub Kicinski
69fe8ec4f6 selftests: kselftest_harness: save full exit code in metadata
Instead of tracking passed = 0/1 rename the field to exit_code
and invert the values so that they match the KSFT_* exit codes.
This will allow us to fold SKIP / XFAIL into the same value.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00
Jakub Kicinski
38c957f070 selftests: kselftest_harness: generate test name once
Since we added variant support generating full test case
name takes 4 string arguments. We're about to need it
in another two places. Stop the duplication and print
once into a temporary buffer.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00
Jakub Kicinski
a724707976 selftests: kselftest_harness: use KSFT_* exit codes
Now that we no longer need low exit codes to communicate
assertion steps - use normal KSFT exit codes.

Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00
Mickaël Salaün
0710a1a73f selftests/harness: Merge TEST_F_FORK() into TEST_F()
Replace Landlock-specific TEST_F_FORK() with an improved TEST_F() which
brings four related changes:

Run TEST_F()'s tests in a grandchild process to make it possible to
drop privileges and delegate teardown to the parent.

Compared to TEST_F_FORK(), simplify handling of the test grandchild
process thanks to vfork(2), and makes it generic (e.g. no explicit
conversion between exit code and _metadata).

Compared to TEST_F_FORK(), run teardown even when tests failed with an
assert thanks to commit 63e6b2a423 ("selftests/harness: Run TEARDOWN
for ASSERT failures").

Simplify the test harness code by removing the no_print and step fields
which are not used.  I added this feature just after I made
kselftest_harness.h more broadly available but this step counter
remained even though it wasn't needed after all. See commit 369130b631
("selftests: Enhance kselftest_harness.h to print which assert failed").

Replace spaces with tabs in one line of __TEST_F_IMPL().

Cc: Günther Noack <gnoack@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00
Mickaël Salaün
e74048650e selftests/landlock: Redefine TEST_F() as TEST_F_FORK()
This has the effect of creating a new test process for either TEST_F()
or TEST_F_FORK(), which doesn't change tests but will ease potential
backports.  See next commit for the TEST_F_FORK() merge into TEST_F().

Cc: Günther Noack <gnoack@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:27 +00:00
Jakub Kicinski
65f5dd4f02 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/mptcp/protocol.c
  adf1bb78da ("mptcp: fix snd_wnd initialization for passive socket")
  9426ce476a ("mptcp: annotate lockless access for RX path fields")
https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/

Adjacent changes:

drivers/dpll/dpll_core.c
  0d60d8df6f ("dpll: rely on rcu for netdev_dpll_pin()")
  e7f8df0e81 ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order")

drivers/net/veth.c
  1ce7d306ea ("veth: try harder when allocating queue memory")
  0bef512012 ("net: add netdev_lockdep_set_classes() to virtual drivers")

drivers/net/wireless/intel/iwlwifi/mvm/d3.c
  8c9bef26e9 ("wifi: iwlwifi: mvm: d3: implement suspend with MLO")
  78f65fbf42 ("wifi: iwlwifi: mvm: ensure offloading TID queue exists")

net/wireless/nl80211.c
  f78c137533 ("wifi: nl80211: reject iftype change with mesh ID change")
  414532d8aa ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29 14:24:56 -08:00
Kui-Feng Lee
0623e73317 selftests/bpf: Test if shadow types work correctly.
Change the values of fields, including scalar types and function pointers,
and check if the struct_ops map works as expected.

The test changes the field "test_2" of "testmod_1" from the pointer to
test_2() to pointer to test_3() and the field "data" to 13. The function
test_2() and test_3() both compute a new value for "test_2_result", but in
different way. By checking the value of "test_2_result", it ensures the
struct_ops map works as expected with changes through shadow types.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-6-thinker.li@gmail.com
2024-02-29 14:23:53 -08:00
Kui-Feng Lee
f2e81192e0 bpftool: Add an example for struct_ops map and shadow type.
The example in bpftool-gen.8 explains how to use the pointer of the shadow
type to change the value of a field of a struct_ops map.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-5-thinker.li@gmail.com
2024-02-29 14:23:53 -08:00
Kui-Feng Lee
a7b0fa352e bpftool: Generated shadow variables for struct_ops maps.
Declares and defines a pointer of the shadow type for each struct_ops map.

The code generator will create an anonymous struct type as the shadow type
for each struct_ops map. The shadow type is translated from the original
struct type of the map. The user of the skeleton use pointers of them to
access the values of struct_ops maps.

However, shadow types only supports certain types of fields, including
scalar types and function pointers. Any fields of unsupported types are
translated into an array of characters to occupy the space of the original
field. Function pointers are translated into pointers of the struct
bpf_program. Additionally, padding fields are generated to occupy the space
between two consecutive fields.

The pointers of shadow types of struct_osp maps are initialized when
*__open_opts() in skeletons are called. For a map called FOO, the user can
access it through the pointer at skel->struct_ops.FOO.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-4-thinker.li@gmail.com
2024-02-29 14:23:53 -08:00
Kui-Feng Lee
69e4a9d2b3 libbpf: Convert st_ops->data to shadow type.
Convert st_ops->data to the shadow type of the struct_ops map. The shadow
type of a struct_ops type is a variant of the original struct type
providing a way to access/change the values in the maps of the struct_ops
type.

bpf_map__initial_value() will return st_ops->data for struct_ops types. The
skeleton is going to use it as the pointer to the shadow type of the
original struct type.

One of the main differences between the original struct type and the shadow
type is that all function pointers of the shadow type are converted to
pointers of struct bpf_program. Users can replace these bpf_program
pointers with other BPF programs. The st_ops->progs[] will be updated
before updating the value of a map to reflect the changes made by users.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-3-thinker.li@gmail.com
2024-02-29 14:23:52 -08:00
Kui-Feng Lee
3644d28546 libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
For a struct_ops map, btf_value_type_id is the type ID of it's struct
type. This value is required by bpftool to generate skeleton including
pointers of shadow types. The code generator gets the type ID from
bpf_map__btf_value_type_id() in order to get the type information of the
struct type of a map.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-2-thinker.li@gmail.com
2024-02-29 14:23:52 -08:00
Kees Cook
896880ff30 bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
Replace deprecated 0-length array in struct bpf_lpm_trie_key with
flexible array. Found with GCC 13:

../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=]
  207 |                                        *(__be16 *)&key->data[i]);
      |                                                   ^~~~~~~~~~~~~
../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16'
  102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
      |                                                      ^
../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu'
   97 | #define be16_to_cpu __be16_to_cpu
      |                     ^~~~~~~~~~~~~
../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu'
  206 |                 u16 diff = be16_to_cpu(*(__be16 *)&node->data[i]
^
      |                            ^~~~~~~~~~~
In file included from ../include/linux/bpf.h:7:
../include/uapi/linux/bpf.h:82:17: note: while referencing 'data'
   82 |         __u8    data[0];        /* Arbitrary size */
      |                 ^~~~

And found at run-time under CONFIG_FORTIFY_SOURCE:

  UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49
  index 0 is out of range for type '__u8 [*]'

Changing struct bpf_lpm_trie_key is difficult since has been used by
userspace. For example, in Cilium:

	struct egress_gw_policy_key {
	        struct bpf_lpm_trie_key lpm_key;
	        __u32 saddr;
	        __u32 daddr;
	};

While direct references to the "data" member haven't been found, there
are static initializers what include the final member. For example,
the "{}" here:

        struct egress_gw_policy_key in_key = {
                .lpm_key = { 32 + 24, {} },
                .saddr   = CLIENT_IP,
                .daddr   = EXTERNAL_SVC_IP & 0Xffffff,
        };

To avoid the build time and run time warnings seen with a 0-sized
trailing array for struct bpf_lpm_trie_key, introduce a new struct
that correctly uses a flexible array for the trailing bytes,
struct bpf_lpm_trie_key_u8. As part of this, include the "header"
portion (which is just the "prefixlen" member), so it can be used
by anything building a bpf_lpr_trie_key that has trailing members that
aren't a u8 flexible array (like the self-test[1]), which is named
struct bpf_lpm_trie_key_hdr.

Unfortunately, C++ refuses to parse the __struct_group() helper, so
it is not possible to define struct bpf_lpm_trie_key_hdr directly in
struct bpf_lpm_trie_key_u8, so we must open-code the union directly.

Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out,
and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment
to the UAPI header directing folks to the two new options.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Closes: https://paste.debian.net/hidden/ca500597/
Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1]
Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
2024-02-29 22:52:43 +01:00
Linus Torvalds
87adedeba5 Including fixes from bluetooth, WiFi and netfilter.
We have one outstanding issue with the stmmac driver, which may
 be a LOCKDEP false positive, not a blocker.
 
 Current release - regressions:
 
  - netfilter: nf_tables: re-allow NFPROTO_INET in
    nft_(match/target)_validate()
 
  - eth: ionic: fix error handling in PCI reset code
 
 Current release - new code bugs:
 
  - eth: stmmac: complete meta data only when enabled, fix null-deref
 
  - kunit: fix again checksum tests on big endian CPUs
 
 Previous releases - regressions:
 
  - veth: try harder when allocating queue memory
 
  - Bluetooth:
    - hci_bcm4377: do not mark valid bd_addr as invalid
    - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST
 
 Previous releases - always broken:
 
  - info leak in __skb_datagram_iter() on netlink socket
 
  - mptcp:
    - map v4 address to v6 when destroying subflow
    - fix potential wake-up event loss due to sndbuf auto-tuning
    - fix double-free on socket dismantle
 
  - wifi: nl80211: reject iftype change with mesh ID change
 
  - fix small out-of-bound read when validating netlink be16/32 types
 
  - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
 
  - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr()
 
  - ip_tunnel: prevent perpetual headroom growth with huge number of
    tunnels on top of each other
 
  - mctp: fix skb leaks on error paths of mctp_local_output()
 
  - eth: ice: fixes for DPLL state reporting
 
  - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF
 
  - eth: dpaa: accept phy-interface-type = "10gbase-r" in the device tree
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmXg6ioACgkQMUZtbf5S
 IrupHQ/+Jt9OK8AYiUUBpeE0E0pb4yHS4KuiGWChx2YECJCeeeU6Ko4gaPI6+Nyv
 mMh/3sVsLnX7w4OXp2HddMMiFGbd1ufIptS0T/EMhHbbg1h7Qr1jhpu8aM8pb9jM
 5DwjfTZijDaW84Oe+Kk9BOonxR6A+Df27O3PSEUtLk4JCy5nwEwUr9iCxgCla499
 3aLu5eWRw8PTSsJec4BK6hfCKWiA/6oBHS1pQPwYvWuBWFZe8neYHtvt3LUwo1HR
 DwN9gtMiGBzYSSQmk8V1diGIokn80G5Krdq4gXbhsLxIU0oEJA7ltGpqasxy/OCs
 KGLHcU5wCd3j42gZOzvBzzzj8RQyd2ZekyvCu7B5Rgy3fx6JWI1jLalsQ/tT9yQg
 VJgFM2AZBb1EEAw/P2DkVQ8Km8ZuVlGtzUoldvIY1deP1/LZFWc0PftA6ndT7Ldl
 wQwKPQtJ5DMzqEe3mwSjFkL+AiSmcCHCkpnGBIi4c7Ek2/GgT1HeUMwJPh0mBftz
 smlLch3jMH2YKk7AmH7l9o/Q9ypgvl+8FA+icLaX0IjtSbzz5Q7gNyhgE0w1Hdb2
 79q6SE3ETLG/dn75XMA1C0Wowrr60WKHwagMPUl57u9bchfUT8Ler/4Sd9DWn8Vl
 55YnGPWMLCkxgpk+DHXYOWjOBRszCkXrAA71NclMnbZ5cQ86JYY=
 =T2ty
 -----END PGP SIGNATURE-----

Merge tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, WiFi and netfilter.

  We have one outstanding issue with the stmmac driver, which may be a
  LOCKDEP false positive, not a blocker.

  Current release - regressions:

   - netfilter: nf_tables: re-allow NFPROTO_INET in
     nft_(match/target)_validate()

   - eth: ionic: fix error handling in PCI reset code

  Current release - new code bugs:

   - eth: stmmac: complete meta data only when enabled, fix null-deref

   - kunit: fix again checksum tests on big endian CPUs

  Previous releases - regressions:

   - veth: try harder when allocating queue memory

   - Bluetooth:
      - hci_bcm4377: do not mark valid bd_addr as invalid
      - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST

  Previous releases - always broken:

   - info leak in __skb_datagram_iter() on netlink socket

   - mptcp:
      - map v4 address to v6 when destroying subflow
      - fix potential wake-up event loss due to sndbuf auto-tuning
      - fix double-free on socket dismantle

   - wifi: nl80211: reject iftype change with mesh ID change

   - fix small out-of-bound read when validating netlink be16/32 types

   - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back

   - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr()

   - ip_tunnel: prevent perpetual headroom growth with huge number of
     tunnels on top of each other

   - mctp: fix skb leaks on error paths of mctp_local_output()

   - eth: ice: fixes for DPLL state reporting

   - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF

   - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device
     tree"

* tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
  dpll: fix build failure due to rcu_dereference_check() on unknown type
  kunit: Fix again checksum tests on big endian CPUs
  tls: fix use-after-free on failed backlog decryption
  tls: separate no-async decryption request handling from async
  tls: fix peeking with sync+async decryption
  tls: decrement decrypt_pending if no async completion will be called
  gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
  net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames
  igb: extend PTP timestamp adjustments to i211
  rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
  tools: ynl: fix handling of multiple mcast groups
  selftests: netfilter: add bridge conntrack + multicast test case
  netfilter: bridge: confirm multicast packets before passing them up the stack
  netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
  Bluetooth: qca: Fix triggering coredump implementation
  Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
  Bluetooth: qca: Fix wrong event type for patch config command
  Bluetooth: Enforce validation on max value of connection interval
  Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
  Bluetooth: mgmt: Fix limited discoverable off timeout
  ...
2024-02-29 12:40:20 -08:00
Paolo Abeni
b611b776a9 netfilter pull request 24-02-29
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmXfx4MACgkQ1V2XiooU
 IOSX+g//UHBfqYJASMMQJpWdMwWe7tB2m1LRzLYI+WUdUenK/MEylS7rNp/bwGkW
 42eDeGA0eov7kYNOY0rLB7lQBdUHwCpNZkdetTWFV9eHcEKA8cQ6OqcD1G8i41qg
 sCvObS+K/hq3f7fX9bJ9RvS5RvYoeuS1trw4mezhHwPS+1sj80v4FdqDOFCUqiT3
 65BfeoV65pVVteCRmJQxeeZ4Bepd4LRXW+VVyr3uXli/H87jqQOFxsOTqyXNEXIq
 jMYL0jnbYs0ARbNYXRYySLYQCWmbVXpfnt4JIBRP0S1e6Prby2hqUwJBeyNcXBAu
 CwBTjCEdLIV5G25EWTZWBYQdihct58s0GDRX078Sj/AozQJAWTxBEn0QLhKy2gvH
 2uspA0S2z1PS69hUvHfgGjDiBKw41T2O6D/12NBxI1DOYDLsk7ApE5tKqynUnUIj
 pOLUiolFnJd4JKnGZ/CTATpGi8KX/iSWdX8OElCpGOvKQgZyU8IXrydjcHnJz7b4
 AdsIfpjjZSdz2VU6ZmzLYJrWf6ukAchO5kYL2FIJt/eFEyGqDfwGL36FIO7YGcnu
 NPHtIF23Ldl+GIesc9UT08k+IOsfR9LMbUduJC6Dg63FDrEkFfOv+wXA1eURW3kS
 tq+eWs+QjlCeWG9FgW2NHj3+rGyjQbGOe+v1yTgl1x/BhXNV1cM=
 =2BRo
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin.

Patch #2 fixes an issue with bridge netfilter and broadcast/multicast
packets.

There is a day 0 bug in br_netfilter when used with connection tracking.

Conntrack assumes that an nf_conn structure that is not yet added to
hash table ("unconfirmed"), is only visible by the current cpu that is
processing the sk_buff.

For bridge this isn't true, sk_buff can get cloned in between, and
clones can be processed in parallel on different cpu.

This patch disables NAT and conntrack helpers for multicast packets.

Patch #3 adds a selftest to cover for the br_netfilter bug.

netfilter pull request 24-02-29

* tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: netfilter: add bridge conntrack + multicast test case
  netfilter: bridge: confirm multicast packets before passing them up the stack
  netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
====================

Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29 12:16:08 +01:00
Ido Schimmel
8a7746982e selftests: vxlan_mdb: Avoid duplicate test names
Rename some test cases to avoid overlapping test names which is
problematic for the kernel test robot. No changes in the test's logic.

Suggested-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20240227170418.491442-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 20:14:49 -08:00
Jakub Kicinski
7c4a38bf1e tools: ynl: use MSG_DONTWAIT for getting notifications
To stick to libmnl wrappers in the past we had to use poll()
to check if there are any outstanding notifications on the socket.
This is no longer necessary, we can use MSG_DONTWAIT.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-16-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:45 -08:00
Jakub Kicinski
73395b4381 tools: ynl: remove the libmnl dependency
We don't use libmnl any more.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-15-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:45 -08:00
Jakub Kicinski
5ac6868daa tools: ynl: stop using mnl socket helpers
Most libmnl socket helpers can be replaced by direct calls to
the underlying libc API. We need portid, the netlink manpage
suggests we bind() address of zero.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-14-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:44 -08:00
Jakub Kicinski
50042e8051 tools: ynl: switch away from MNL_CB_*
Create a local version of the MNL_CB_* parser control values.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-13-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:44 -08:00
Jakub Kicinski
dd0973d71e tools: ynl: switch away from mnl_cb_t
All YNL parsing callbacks take struct ynl_parse_arg as the argument.
Make that official by using a local callback type instead of mnl_cb_t.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:44 -08:00
Jakub Kicinski
766c4b5460 tools: ynl: stop using mnl_cb_run2()
There's only one set of callbacks in YNL, for netlink control
messages, and most of them are trivial. So implement the message
walking directly without depending on mnl_cb_run2().

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:44 -08:00
Jakub Kicinski
1621378aab tools: ynl: use ynl_sock_read_msgs() for ACK handling
ynl_recv_ack() is simple and it's the only user of mnl_cb_run().
Now that ynl_sock_read_msgs() exists it's actually less code
to use ynl_sock_read_msgs() instead of being special.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:43 -08:00
Jakub Kicinski
2f22f0b313 tools: ynl: wrap recv() + mnl_cb_run2() into a single helper
All callers to mnl_cb_run2() call mnl_socket_recvfrom() right before.
Wrap the two in a helper, take typed arguments (struct ynl_parse_arg),
instead of hoping that all callers remember that parser error handling
requires yarg.

In case of ynl_sock_read_family() we will no longer check for kernel
returning no data, but that would be a kernel bug, not worth complicating
the code to catch this. Calling mnl_cb_run2() on an empty buffer
is legal and results in STOP (1).

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:43 -08:00
Jakub Kicinski
9c29a11316 tools: ynl-gen: remove unused parse code
Commit f2ba1e5e22 ("tools: ynl-gen: stop generating common notification handlers")
removed the last caller of the parse_cb_run() helper.
We no longer need to export ynl_cb_array.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:43 -08:00
Jakub Kicinski
d62c5d487c tools: ynl: make yarg the first member of struct ynl_dump_state
All YNL parsing code expects a pointer to struct ynl_parse_arg AKA yarg.
For dump was pass in struct ynl_dump_state, which works fine, because
struct ynl_dump_state and struct ynl_parse_arg have identical layout
for the members that matter.. but it's a bit hacky.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:43 -08:00
Jakub Kicinski
7600875f29 tools: ynl: create local ARRAY_SIZE() helper
libc doesn't have an ARRAY_SIZE() create one locally.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:42 -08:00
Jakub Kicinski
0b3ece4422 tools: ynl: create local nlmsg access helpers
Create helpers for accessing payloads of struct nlmsg.
Use them instead of the libmnl ones.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:42 -08:00
Jakub Kicinski
66fcdad088 tools: ynl: create local for_each helpers
Create ynl_attr_for_each*() iteration helpers.
Use them instead of the mnl ones.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:42 -08:00
Jakub Kicinski
5600c58038 tools: ynl: create local attribute helpers
Don't use mnl attr helpers, we're trying to remove the libmnl
dependency. Create both signed and unsigned helpers, libmnl
had unsigned helpers, so code generator no longer needs
the mnl_type() hack.

The new helpers are written from first principles, but are
hopefully not too buggy.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:42 -08:00
Jakub Kicinski
21f6986d19 tools: ynl: give up on libmnl for auto-ints
The temporary auto-int helpers are not really correct.
We can't treat signed and unsigned ints the same when
determining whether we need full 8B. I realized this
before sending the patch to add support in libmnl.
Unfortunately, that patch has not been merged,
so time to fix our local helpers. Use the mnl* name
for now, subsequent patches will address that.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:25:41 -08:00
Jakub Kicinski
b6c65eb20f tools: ynl: fix handling of multiple mcast groups
We never increment the group number iterator, so all groups
get recorded into index 0 of the mcast_groups[] array.

As a result YNL can only handle using the last group.
For example using the "netdev" sample on kernel with
page pool commands results in:

  $ ./samples/netdev
  YNL: Multicast group 'mgmt' not found

Most families have only one multicast group, so this hasn't
been noticed. Plus perhaps developers usually test the last
group which would have worked.

Fixes: 86878f14d7 ("tools: ynl: user space helpers")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240226214019.1255242-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:24:34 -08:00
Jakub Kicinski
3bfe90527d tools: ynl: protect from old OvS headers
Since commit 7c59c9c8f2 ("tools: ynl: generate code for ovs families")
we need relatively recent OvS headers to get YNL to compile.
Add the direct include workaround to fix compilation on less
up-to-date OSes like CentOS 9.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240226225806.1301152-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:24:11 -08:00
Florian Westphal
6523cf516c selftests: netfilter: add bridge conntrack + multicast test case
Add test case for multicast packet confirm race.
Without preceding patch, this should result in:

 WARNING: CPU: 0 PID: 38 at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm+0x3ed/0x5f0
 Workqueue: events_unbound macvlan_process_broadcast
 RIP: 0010:__nf_conntrack_confirm+0x3ed/0x5f0
  ? __nf_conntrack_confirm+0x3ed/0x5f0
  nf_confirm+0x2ad/0x2d0
  nf_hook_slow+0x36/0xd0
  ip_local_deliver+0xce/0x110
  __netif_receive_skb_one_core+0x4f/0x70
  process_backlog+0x8c/0x130
  [..]

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-29 00:22:48 +01:00
Puranjay Mohan
22fc0e80ae bpf, arm64: support exceptions
The prologue generation code has been modified to make the callback
program use the stack of the program marked as exception boundary where
callee-saved registers are already pushed.

As the bpf_throw function never returns, if it clobbers any callee-saved
registers, they would remain clobbered. So, the prologue of the
exception-boundary program is modified to push R23 and R24 as well,
which the callback will then recover in its epilogue.

The Procedure Call Standard for the Arm 64-bit Architecture[1] states
that registers r19 to r28 should be saved by the callee. BPF programs on
ARM64 already save all callee-saved registers except r23 and r24. This
patch adds an instruction in prologue of the  program to save these
two registers and another instruction in the epilogue to recover them.

These extra instructions are only added if bpf_throw() is used. Otherwise
the emitted prologue/epilogue remains unchanged.

[1] https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Link: https://lore.kernel.org/r/20240201125225.72796-3-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-27 13:54:17 -08:00
Jakub Kicinski
b819a8481a selftests: netdevsim: be less selective for FW for the devlink test
Commit 6151ff9c75 ("selftests: netdevsim: use suitable existing dummy
file for flash test") introduced a nice trick to the devlink flashing
test. Instead of user having to create a file under /lib/firmware
we just pick the first one that already exists.

Sadly, in AWS Linux there are no files directly under /lib/firmware,
only in subdirectories. Don't limit the search to -maxdepth 1.
We can use the %P print format to get the correct path for files
inside subdirectories:

$ find /lib/firmware -type f -printf '%P\n' | head -1
intel-ucode/06-1a-05

The full path is /lib/firmware/intel-ucode/06-1a-05

This works in GNU find, busybox doesn't have printf at all,
so we're not making it worse.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240224050658.930272-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-27 15:20:08 +01:00
Geliang Tang
e8ddc5f255 selftests: mptcp: diag: change timeout_poll to 30
Even if it is set to 100ms from the beginning with commit
df62f2ec3d ("selftests/mptcp: add diag interface tests"), there is
no reason not to have it to 30ms like all the other tests. "diag.sh" is
not supposed to be slower than the other ones.

To maintain consistency with other scripts, this patch changes it to 30.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:42:12 -08:00
Geliang Tang
8c6f6b4bb5 selftests: mptcp: join: change capture/checksum as bool
To maintain consistency with other scripts, this patch changes vars
'capture' and 'checksum' as bool vars in mptcp_join.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-7-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:42:12 -08:00
Geliang Tang
fccf7c9224 selftests: mptcp: simult flows: define missing vars
The variables 'large', 'small', 'sout', 'cout', 'capout' and 'size' are
used in multiple functions, so they should be clearly defined as global
variables at the top of the file.

This patch redefines them at the beginning of simult_flows.sh.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-6-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:42:12 -08:00
Geliang Tang
488ccbe76c selftests: mptcp: netlink: drop duplicate var ret
The variable 'ret' are defined twice in pm_netlink.sh. This patch drops
this duplicate one that has been defined from the beginning, with
commit eedbc68532 ("selftests: add PM netlink functional tests")

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-5-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:42:12 -08:00
Matthieu Baerts (NGI0)
9da7483674 selftests: mptcp: lib: catch duplicated subtest entries
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.

When adding a new subtest entry, an error message is printed in case of
duplicated entries. If there were duplicated entries and if all features
were expected to work, the script exits with an error at the end, after
having printed all subtests in the TAP format. Thanks to that, the MPTCP
CI will catch such issues early.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-1-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:42:11 -08:00
Paolo Abeni
b4b51d36bb selftests: mptcp: explicitly trigger the listener diag code-path
The mptcp diag interface already experienced a few locking bugs
that lockdep and appropriate coverage have detected in advance.

Let's add a test-case triggering the relevant code path, to prevent
similar issues in the future.

Be careful to cope with very slow environments.

Note that we don't need an explicit timeout on the mptcp_connect
subprocess to cope with eventual bug/hang-up as the final cleanup
terminating the child processes will take care of that.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-10-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:56 -08:00
Geliang Tang
9480f388a2 selftests: mptcp: join: add ss mptcp support check
Commands 'ss -M' are used in script mptcp_join.sh to display only MPTCP
sockets. So it must be checked if ss tool supports MPTCP in this script.

Fixes: e274f71540 ("selftests: mptcp: add subflow limits test-cases")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-7-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:56 -08:00
Geliang Tang
7092dbee23 selftests: mptcp: rm subflow with v4/v4mapped addr
Now both a v4 address and a v4-mapped address are supported when
destroying a userspace pm subflow, this patch adds a second subflow
to "userspace pm add & remove address" test, and two subflows could
be removed two different ways, one with the v4mapped and one with v4.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387
Fixes: 48d73f609d ("selftests: mptcp: update userspace pm addr tests")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-2-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:55 -08:00
Jakub Kicinski
1a825e4cdf selftests: net: veth: test syncing GRO and XDP state while device is down
Test that we keep GRO flag in sync when XDP is disabled while
the device is closed.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:34:13 +00:00
Linus Torvalds
ac389bc0ca cxl fixes for 6.8-rc6
- Fix NUMA initialization from ACPI CEDT.CFMWS
 
 - Fix region assembly failures due to async init order
 
 - Fix / simplify export of qos_class information
 
 - Fix cxl_acpi initialization vs single-window-init failures
 
 - Fix handling of repeated 'pci_channel_io_frozen' notifications
 
 - Workaround platforms that violate host-physical-address ==
   system-physical address assumptions
 
 - Defer CXL CPER notification handling to v6.9
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZdpH9gAKCRDfioYZHlFs
 ZwZlAQDE+PxTJnjCXDVnDylVF4yeJF2G/wSkH1CFVFVxa0OjhAD/ZFScS/nz/76l
 1IYYiiLqmVO5DdmJtfKtq16m7e1cZwc=
 =PuPF
 -----END PGP SIGNATURE-----

Merge tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl fixes from Dan Williams:
 "A collection of significant fixes for the CXL subsystem.

  The largest change in this set, that bordered on "new development", is
  the fix for the fact that the location of the new qos_class attribute
  did not match the Documentation. The fix ends up deleting more code
  than it added, and it has a new unit test to backstop basic errors in
  this interface going forward. So the "red-diff" and unit test saved
  the "rip it out and try again" response.

  In contrast, the new notification path for firmware reported CXL
  errors (CXL CPER notifications) has a locking context bug that can not
  be fixed with a red-diff. Given where the release cycle stands, it is
  not comfortable to squeeze in that fix in these waning days. So, that
  receives the "back it out and try again later" treatment.

  There is a regression fix in the code that establishes memory NUMA
  nodes for platform CXL regions. That has an ack from x86 folks. There
  are a couple more fixups for Linux to understand (reassemble) CXL
  regions instantiated by platform firmware. The policy around platforms
  that do not match host-physical-address with system-physical-address
  (i.e. systems that have an address translation mechanism between the
  address range reported in the ACPI CEDT.CFMWS and endpoint decoders)
  has been softened to abort driver load rather than teardown the memory
  range (can cause system hangs). Lastly, there is a robustness /
  regression fix for cases where the driver would previously continue in
  the face of error, and a fixup for PCI error notification handling.

  Summary:

   - Fix NUMA initialization from ACPI CEDT.CFMWS

   - Fix region assembly failures due to async init order

   - Fix / simplify export of qos_class information

   - Fix cxl_acpi initialization vs single-window-init failures

   - Fix handling of repeated 'pci_channel_io_frozen' notifications

   - Workaround platforms that violate host-physical-address ==
     system-physical address assumptions

   - Defer CXL CPER notification handling to v6.9"

* tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/acpi: Fix load failures due to single window creation failure
  acpi/ghes: Remove CXL CPER notifications
  cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window
  cxl/test: Add support for qos_class checking
  cxl: Fix sysfs export of qos_class for memdev
  cxl: Remove unnecessary type cast in cxl_qos_class_verify()
  cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf'
  cxl/region: Allow out of order assembly of autodiscovered regions
  cxl/region: Handle endpoint decoders in cxl_region_find_decoder()
  x86/numa: Fix the sort compare func used in numa_fill_memblks()
  x86/numa: Fix the address overlap check in numa_fill_memblks()
  cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
2024-02-24 15:53:40 -08:00
Jakub Kicinski
d662c5b3ce tools: ynl: fix header guards
devlink and ethtool have a trailing _ in the header guard. I must have
copy/pasted it into new guards, assuming it's a headers_install artifact.

This fixes build if system headers are old.

Fixes: 8f109e91b8 ("tools: ynl: include dpll and mptcp_pm in C codegen")
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240222234831.179181-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-23 18:17:58 -08:00
Jiri Pirko
e8a6c515ff tools: ynl: allow user to pass enum string instead of scalar value
During decoding of messages coming from kernel, attribute values are
converted to enum names in case the attribute type is enum of bitfield32.

However, when user constructs json message, he has to pass plain scalar
values. See "state" "selector" and "value" attributes in following
examples:

$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml --do pin-set --json '{"id": 0, "parent-device": {"parent-id": 0, "state": 1}}'
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml --do port-set --json '{"bus-name": "pci", "dev-name": "0000:08:00.1", "port-index": 98304, "port-function": {"caps": {"selector": 1, "value": 1 }}}'

Allow user to pass strings containing enum names, convert them to scalar
values to be encoded into Netlink message:

$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml --do pin-set --json '{"id": 0, "parent-device": {"parent-id": 0, "state": "connected"}}'
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml --do port-set --json '{"bus-name": "pci", "dev-name": "0000:08:00.1", "port-index": 98304, "port-function": {"caps": {"selector": ["roce-bit"], "value": ["roce-bit"] }}}'

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240222134351.224704-4-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-23 18:16:44 -08:00
Jiri Pirko
ffe10a4546 tools: ynl: process all scalar types encoding in single elif statement
As a preparation to handle enums for scalar values, unify the processing
of all scalar types in a single elif statement.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240222134351.224704-3-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-23 18:16:44 -08:00
Jiri Pirko
ac95b1fca0 tools: ynl: allow user to specify flag attr with bool values
The flag attr presence in Netlink message indicates value "true",
if it is missing in the message it means "false".

Allow user to specify attrname with value "true"/"false"
in json for flag attrs, treat "false" value properly.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240222134351.224704-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-23 18:16:43 -08:00
Linus Torvalds
95e73fb16a 14 hotfixes. 10 are cc:stable and the remainder address post-6.7 issues
or aren't considered appropriate for backporting.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZdfSoQAKCRDdBJ7gKXxA
 jtdcAQCEMwrTnSVOfP0WxixL11HK8a1bJdezHBLWFUDscm4BXgD9EV4mgQ2yPDA6
 ka+Lf9MucZSpJ2QYZA4GoDqP4wefawA=
 =zI6b
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2024-02-22-15-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "A batch of MM (and one non-MM) hotfixes.

  Ten are cc:stable and the remainder address post-6.7 issues or aren't
  considered appropriate for backporting"

* tag 'mm-hotfixes-stable-2024-02-22-15-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  kasan: guard release_free_meta() shadow access with kasan_arch_is_ready()
  mm/damon/lru_sort: fix quota status loss due to online tunings
  mm/damon/reclaim: fix quota stauts loss due to online tunings
  MAINTAINERS: mailmap: update Shakeel's email address
  mm/damon/sysfs-schemes: handle schemes sysfs dir removal before commit_schemes_quota_goals
  mm: memcontrol: clarify swapaccount=0 deprecation warning
  mm/memblock: add MEMBLOCK_RSRV_NOINIT into flagname[] array
  mm/zswap: invalidate duplicate entry when !zswap_enabled
  lib/Kconfig.debug: TEST_IOV_ITER depends on MMU
  mm/swap: fix race when skipping swapcache
  mm/swap_state: update zswap LRU's protection range with the folio locked
  selftests/mm: uffd-unit-test check if huge page size is 0
  mm/damon/core: check apply interval in damon_do_apply_schemes()
  mm: zswap: fix missing folio cleanup in writeback race path
2024-02-23 09:43:21 -08:00
Jakub Kicinski
fecc51559a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/udp.c
  f796feabb9 ("udp: add local "peek offset enabled" flag")
  56667da739 ("net: implement lockless setsockopt(SO_PEEK_OFF)")

Adjacent changes:

net/unix/garbage.c
  aa82ac51d6 ("af_unix: Drop oob_skb ref before purging queue in GC.")
  11498715f2 ("af_unix: Remove io_uring code for GC.")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 15:29:26 -08:00
Kui-Feng Lee
e9bbda13a7 selftests/bpf: Test case for lacking CFI stub functions.
Ensure struct_ops rejects the registration of struct_ops types without
proper CFI stub functions.

bpf_test_no_cfi.ko is a module that attempts to register a struct_ops type
called "bpf_test_no_cfi_ops" with cfi_stubs of NULL and non-NULL value.
The NULL one should fail, and the non-NULL one should succeed. The module
can only be loaded successfully if these registrations yield the expected
results.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240222021105.1180475-3-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-22 12:26:41 -08:00
Linus Torvalds
4c36fbb46f iommufd for 6.8 rc
- Fix dirty tracking bitmap collection when using reporting bitmaps that
   are not neatly aligned to u64's or match the IO page table radix tree
   layout.
 
 - Add self tests to cover the cases that were found to be broken.
 
 - Add missing enforcement of invalidation type in the uapi.
 
 - Fix selftest config generation
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZddJHAAKCRCFwuHvBreF
 YaRyAQCuywXMOQsPcTXZk+bepQ0EacRZKfyPrIcKMaHC1QLwKgEA9ApiVSbai0Q+
 5IFX+rWrLUB4jiH5D12kmfhxgydFDAk=
 =CCxs
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd fixes from Jason Gunthorpe:

 - Fix dirty tracking bitmap collection when using reporting bitmaps
   that are not neatly aligned to u64's or match the IO page table radix
   tree layout.

 - Add self tests to cover the cases that were found to be broken.

 - Add missing enforcement of invalidation type in the uapi.

 - Fix selftest config generation

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  selftests/iommu: fix the config fragment
  iommufd: Reject non-zero data_type if no data_len is provided
  iommufd/iova_bitmap: Consider page offset for the pages to be pinned
  iommufd/selftest: Add mock IO hugepages tests
  iommufd/selftest: Hugepage mock domain support
  iommufd/selftest: Refactor mock_domain_read_and_clear_dirty()
  iommufd/selftest: Refactor dirty bitmap tests
  iommufd/iova_bitmap: Handle recording beyond the mapped pages
  iommufd/selftest: Test u64 unaligned bitmaps
  iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array
  iommufd/iova_bitmap: Bounds check mapped::pages access
2024-02-22 11:53:09 -08:00
Martin Kelly
58fd62e0aa bpf: Clarify batch lookup/lookup_and_delete semantics
The batch lookup and lookup_and_delete APIs have two parameters,
in_batch and out_batch, to facilitate iterative
lookup/lookup_and_deletion operations for supported maps. Except NULL
for in_batch at the start of these two batch operations, both parameters
need to point to memory equal or larger than the respective map key
size, except for various hashmaps (hash, percpu_hash, lru_hash,
lru_percpu_hash) where the in_batch/out_batch memory size should be
at least 4 bytes.

Document these semantics to clarify the API.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240221211838.1241578-1-martin.kelly@crowdstrike.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-22 10:24:38 -08:00
Linus Torvalds
6714ebb922 Including fixes from bpf and netfilter.
Current release - regressions:
 
   - af_unix: fix another unix GC hangup
 
 Previous releases - regressions:
 
   - core: fix a possible AF_UNIX deadlock
 
   - bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready()
 
   - netfilter: nft_flow_offload: release dst in case direct xmit path is used
 
   - bridge: switchdev: ensure MDB events are delivered exactly once
 
   - l2tp: pass correct message length to ip6_append_data
 
   - dccp/tcp: unhash sk from ehash for tb2 alloc failure after check_estalblished()
 
   - tls: fixes for record type handling with PEEK
 
   - devlink: fix possible use-after-free and memory leaks in devlink_init()
 
 Previous releases - always broken:
 
   - bpf: fix an oops when attempting to read the vsyscall
   	 page through bpf_probe_read_kernel
 
   - sched: act_mirred: use the backlog for mirred ingress
 
   - netfilter: nft_flow_offload: fix dst refcount underflow
 
   - ipv6: sr: fix possible use-after-free and null-ptr-deref
 
   - mptcp: fix several data races
 
   - phonet: take correct lock to peek at the RX queue
 
 Misc:
 
   - handful of fixes and reliability improvements for selftests
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmXXKMMSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkmgAQAIV2NAVEvHVBtnm0Df9PuCcHQx6i9veS
 tGxOZMVwb5ePFI+dpiNyyn61koEiRuFLOm66pfJAuT5j5z6m4PEFfPZgtiVpCHVK
 4sz4UD4+jVLmYijv+YlWkPU3RWR0RejSkDbXwY5Y9Io/DWHhA2iq5IyMy2MncUPY
 dUc12ddEsYRH60Kmm2/96FcdbHw9Y64mDC8tIeIlCAQfng4U98EXJbCq9WXsPPlW
 vjwSKwRG76QGDugss9XkatQ7Bsva1qTobFGDOvBMQpMt+dr81pTGVi0c1h/drzvI
 EJaDO8jJU3Xy0pQ80beboCJ1KlVCYhWSmwlBMZUA1f0lA2m3U5UFEtHA5hHKs3Mi
 jNe/sgKXzThrro0fishAXbzrro2QDhCG3Vm4PRlOGexIyy+n0gIp1lHwEY1p2vX9
 RJPdt1e3xt/5NYRv6l2GVQYFi8Wd0endgzCdJeXk0OWQFLFtnxhG6ejpgxtgN0fp
 CzKU6orFpsddQtcEOdIzKMUA3CXYWAdQPXOE5Ptjoz3MXZsQqtMm3vN4and8jJ19
 8/VLsCNPp11bSRTmNY3Xt85e+gjIA2mRwgRo+ieL6b1x2AqNeVizlr6IZWYQ4TdG
 rUdlEX0IVmov80TSeQoWgtzTO7xMER+qN6FxAs3pQoUFjtol3pEURq9FQ2QZ8jW4
 5rKpNBrjKxdk
 =eUOc
 -----END PGP SIGNATURE-----

Merge tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - af_unix: fix another unix GC hangup

  Previous releases - regressions:

   - core: fix a possible AF_UNIX deadlock

   - bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready()

   - netfilter: nft_flow_offload: release dst in case direct xmit path
     is used

   - bridge: switchdev: ensure MDB events are delivered exactly once

   - l2tp: pass correct message length to ip6_append_data

   - dccp/tcp: unhash sk from ehash for tb2 alloc failure after
     check_estalblished()

   - tls: fixes for record type handling with PEEK

   - devlink: fix possible use-after-free and memory leaks in
     devlink_init()

  Previous releases - always broken:

   - bpf: fix an oops when attempting to read the vsyscall page through
     bpf_probe_read_kernel

   - sched: act_mirred: use the backlog for mirred ingress

   - netfilter: nft_flow_offload: fix dst refcount underflow

   - ipv6: sr: fix possible use-after-free and null-ptr-deref

   - mptcp: fix several data races

   - phonet: take correct lock to peek at the RX queue

  Misc:

   - handful of fixes and reliability improvements for selftests"

* tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
  l2tp: pass correct message length to ip6_append_data
  net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY
  selftests: ioam: refactoring to align with the fix
  Fix write to cloned skb in ipv6_hop_ioam()
  phonet/pep: fix racy skb_queue_empty() use
  phonet: take correct lock to peek at the RX queue
  net: sparx5: Add spinlock for frame transmission from CPU
  net/sched: flower: Add lock protection when remove filter handle
  devlink: fix port dump cmd type
  net: stmmac: Fix EST offset for dwmac 5.10
  tools: ynl: don't leak mcast_groups on init error
  tools: ynl: make sure we always pass yarg to mnl_cb_run
  net: mctp: put sock on tag allocation failure
  netfilter: nf_tables: use kzalloc for hook allocation
  netfilter: nf_tables: register hooks last when adding new chain/flowtable
  netfilter: nft_flow_offload: release dst in case direct xmit path is used
  netfilter: nft_flow_offload: reset dst in route object after setting up flow
  netfilter: nf_tables: set dormant flag on hook register failure
  selftests: tls: add test for peeking past a record of a different type
  selftests: tls: add test for merging of same-type control messages
  ...
2024-02-22 09:57:58 -08:00
Eduard Zingerman
b546b57526 selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
This commit updates tcp_custom_syncookie.c:tcp_parse_option() to use
explicit packet offset (ctx->off) for packet access instead of ever
moving pointer (ctx->ptr), this reduces verification complexity:
- the tcp_parse_option() is passed as a callback to bpf_loop();
- suppose a checkpoint is created each time at function entry;
- the ctx->ptr is tracked by verifier as PTR_TO_PACKET;
- the ctx->ptr is incremented in tcp_parse_option(),
  thus umax_value field tracked for it is incremented as well;
- on each next iteration of tcp_parse_option()
  checkpoint from a previous iteration can't be reused
  for state pruning, because PTR_TO_PACKET registers are
  considered equivalent only if old->umax_value >= cur->umax_value;
- on the other hand, the ctx->off is a SCALAR,
  subject to widen_imprecise_scalars();
- it's exact bounds are eventually forgotten and it is tracked as
  unknown scalar at entry to tcp_parse_option();
- hence checkpoints created at the start of the function eventually
  converge.

The change is similar to one applied in [0] to xdp_synproxy_kern.c.

Comparing before and after with veristat yields following results:

File                             Insns (A)  Insns (B)  Insns      (DIFF)
-------------------------------  ---------  ---------  -----------------
test_tcp_custom_syncookie.bpf.o     466657      12423  -454234 (-97.34%)

[0] commit 977bc146d4 ("selftests/bpf: track tcp payload offset as scalar in xdp_synproxy")

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240222150300.14909-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-22 08:46:15 -08:00
Muhammad Usama Anjum
510325e5ac selftests/iommu: fix the config fragment
The config fragment doesn't follow the correct format to enable those
config options which make the config options getting missed while
merging with other configs.

➜ merge_config.sh -m .config tools/testing/selftests/iommu/config
Using .config as base
Merging tools/testing/selftests/iommu/config
➜ make olddefconfig
.config:5295:warning: unexpected data: CONFIG_IOMMUFD
.config:5296:warning: unexpected data: CONFIG_IOMMUFD_TEST

While at it, add CONFIG_FAULT_INJECTION as well which is needed for
CONFIG_IOMMUFD_TEST. If CONFIG_FAULT_INJECTION isn't present in base
config (such as x86 defconfig), CONFIG_IOMMUFD_TEST doesn't get enabled.

Fixes: 57f0988706 ("iommufd: Add a selftest")
Link: https://lore.kernel.org/r/20240222074934.71380-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-22 09:02:05 -04:00
Jeremy Kerr
109a533114 net: mctp: tests: Test that outgoing skbs have flow data populated
When CONFIG_MCTP_FLOWS is enabled, outgoing skbs should have their
SKB_EXT_MCTP extension set for drivers to consume.

Add two tests for local-to-output routing that check for the flow
extensions: one for the simple single-packet case, and one for
fragmentation.

We now make MCTP_TEST select MCTP_FLOWS, so we always get coverage of
these flow tests. The tests are skippable if MCTP_FLOWS is (otherwise)
disabled, but that would need manual config tweaking.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 13:32:55 +01:00
Paolo Abeni
fdcd4467ba bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZdaBCwAKCRDbK58LschI
 g3EhAP0d+S18mNabiEGz8efnE2yz3XcFchJgjiRS8WjOv75GvQEA6/sWncFjbc8k
 EqxPHmeJa19rWhQlFrmlyNQfLYGe4gY=
 =VkOs
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-02-22

The following pull-request contains BPF updates for your *net* tree.

We've added 11 non-merge commits during the last 24 day(s) which contain
a total of 15 files changed, 217 insertions(+), 17 deletions(-).

The main changes are:

1) Fix a syzkaller-triggered oops when attempting to read the vsyscall
   page through bpf_probe_read_kernel and friends, from Hou Tao.

2) Fix a kernel panic due to uninitialized iter position pointer in
   bpf_iter_task, from Yafang Shao.

3) Fix a race between bpf_timer_cancel_and_free and bpf_timer_cancel,
   from Martin KaFai Lau.

4) Fix a xsk warning in skb_add_rx_frag() (under CONFIG_DEBUG_NET)
   due to incorrect truesize accounting, from Sebastian Andrzej Siewior.

5) Fix a NULL pointer dereference in sk_psock_verdict_data_ready,
   from Shigeru Yoshida.

6) Fix a resolve_btfids warning when bpf_cpumask symbol cannot be
   resolved, from Hari Bathini.

bpf-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready()
  selftests/bpf: Add negtive test cases for task iter
  bpf: Fix an issue due to uninitialized bpf_iter_task
  selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel
  bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel
  selftest/bpf: Test the read of vsyscall page under x86-64
  x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
  x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
  bpf, scripts: Correct GPL license name
  xsk: Add truesize to skb_add_rx_frag().
  bpf: Fix warning for bpf_cpumask in verifier
====================

Link: https://lore.kernel.org/r/20240221231826.1404-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 10:04:47 +01:00
Justin Iurman
187bbb6968 selftests: ioam: refactoring to align with the fix
ioam6_parser uses a packet socket. After the fix to prevent writing to
cloned skb's, the receiver does not see its IOAM data anymore, which
makes input/forward ioam-selftests to fail. As a workaround,
ioam6_parser now uses an IPv6 raw socket and leverages ancillary data to
get hop-by-hop options. As a consequence, the hook is "after" the IOAM
data insertion by the receiver and all tests are working again.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 09:28:03 +01:00
Jakub Kicinski
5d78b73e85 tools: ynl: don't leak mcast_groups on init error
Make sure to free the already-parsed mcast_groups if
we don't get an ack from the kernel when reading family info.
This is part of the ynl_sock_create() error path, so we won't
get a call to ynl_sock_destroy() to free them later.

Fixes: 86878f14d7 ("tools: ynl: user space helpers")
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240220161112.2735195-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 17:02:28 -08:00
Jakub Kicinski
e4fe082c38 tools: ynl: make sure we always pass yarg to mnl_cb_run
There is one common error handler in ynl - ynl_cb_error().
It expects priv to be a pointer to struct ynl_parse_arg AKA yarg.
To avoid potential crashes if we encounter a stray NLMSG_ERROR
always pass yarg as priv (or a struct which has it as the first
member).

ynl_cb_null() has a similar problem directly - it expects yarg
but priv passed by the caller is ys.

Found by code inspection.

Fixes: 86878f14d7 ("tools: ynl: user space helpers")
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240220161112.2735195-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 17:02:28 -08:00
Sabrina Dubroca
2bf6172632 selftests: tls: add test for peeking past a record of a different type
If we queue 3 records:
 - record 1, type DATA
 - record 2, some other type
 - record 3, type DATA
the current code can look past the 2nd record and merge the 2 data
records.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/4623550f8617c239581030c13402d3262f2bd14f.1708007371.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 14:25:52 -08:00
Sabrina Dubroca
7b2a4c2a62 selftests: tls: add test for merging of same-type control messages
Two consecutive control messages of the same type should never be
merged into one large received blob of data.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/018f1633d5471684c65def5fe390de3b15c3d683.1708007371.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 14:25:51 -08:00
Alexei Starovoitov
01dbd7d872 selftests/bpf: Remove intermediate test files.
The test of linking process creates several intermediate files.
Remove them once the build is over.

This reduces the number of files in selftests/bpf/ directory
from ~4400 to ~2600.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240220231102.49090-1-alexei.starovoitov@gmail.com
2024-02-21 13:49:14 -08:00
David S. Miller
e199c4ba82 wireless-next patches for v6.9
The second "new features" pull request for v6.9.  Lots of iwlwifi and
 stack changes this time. And naturally smaller changes to other drivers.
 
 We also twice merged wireless into wireless-next to avoid conflicts
 between the trees.
 
 Major changes:
 
 stack
 
 * mac80211: negotiated TTLM request support
 
 * SPP A-MSDU support
 
 * mac80211: wider bandwidth OFDMA config support
 
 iwlwifi
 
 * kunit tests
 
 * bump FW API to 89 for AX/BZ/SC devices
 
 * enable SPP A-MSDUs
 
 * support for new devices
 
 ath12k
 
 * refactoring in preparation for Multi-Link Operation (MLO) support
 
 * 1024 Block Ack window size support
 
 * provide firmware wmi logs via a trace event
 
 ath11k
 
 * 36 bit DMA mask support
 
 * support 6 GHz station power modes: Low Power Indoor (LPI), Standard
   Power) SP and Very Low Power (VLP)
 
 rtl8xxxu
 
 * TP-Link TL-WN823N V2 support
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmXU2PgRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZuzZAf+NsvOkkhIoMG3rYmqli9ELEgupBIEoTwo
 2favVGBbLOPIlvUJab3ZZ8Bsntpk3deRmISN27whNm5B3+36c7DKn3aYauVwUNs2
 Qb99f3HXkGZQJ8DdKLZMviXXMgKfXzpVISwzD7HdV/GhkVX4LZ/MFzv1zrvLAC/J
 LN5K6xKUqbgRJ1kAWbEoJpRCzNtKwx9GHAsO1vhL69yjBAqKkHivV9LE+BNjoXEz
 g/LD0z05JqWDyxJ7yud3+DiBlZtvpmK9oa9gpWnuF8sdvkywyBdP/ipfDDLgbCzY
 vKF1IUy5GNJSt5+AQS+zO0a8HrwzHR+XG8w5sCEKpjh3Nj0cxtFJ5w==
 =Bnyy
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2024-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.9

The second "new features" pull request for v6.9.  Lots of iwlwifi and
stack changes this time. And naturally smaller changes to other drivers.

We also twice merged wireless into wireless-next to avoid conflicts
between the trees.

Major changes:

stack

* mac80211: negotiated TTLM request support

* SPP A-MSDU support

* mac80211: wider bandwidth OFDMA config support

iwlwifi

* kunit tests

* bump FW API to 89 for AX/BZ/SC devices

* enable SPP A-MSDUs

* support for new devices

ath12k

* refactoring in preparation for Multi-Link Operation (MLO) support

* 1024 Block Ack window size support

* provide firmware wmi logs via a trace event

ath11k

* 36 bit DMA mask support

* support 6 GHz station power modes: Low Power Indoor (LPI), Standard
  Power) SP and Very Low Power (VLP)

rtl8xxxu

* TP-Link TL-WN823N V2 support
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-21 11:48:20 +00:00
Dan Williams
40de53fd00 Merge branch 'for-6.8/cxl-cper' into for-6.8/cxl
Pick up CXL CPER notification removal for v6.8-rc6, to return in a later
merge window.
2024-02-20 22:57:35 -08:00
Terry Tritton
7efa6f2c80 selftests/mm: uffd-unit-test check if huge page size is 0
If HUGETLBFS is not enabled then the default_huge_page_size function will
return 0 and cause a divide by 0 error. Add a check to see if the huge page
size is 0 and skip the hugetlb tests if it is.

Link: https://lkml.kernel.org/r/20240205145055.3545806-2-terry.tritton@linaro.org
Fixes: 16a45b57cb ("selftests/mm: add framework for uffd-unit-test")
Signed-off-by: Terry Tritton <terry.tritton@linaro.org>
Cc: Peter Griffin <peter.griffin@linaro.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-20 14:20:48 -08:00
Yafang Shao
5c138a8a4a selftests/bpf: Add negtive test cases for task iter
Incorporate a test case to assess the handling of invalid flags or
task__nullable parameters passed to bpf_iter_task_new(). Prior to the
preceding commit, this scenario could potentially trigger a kernel panic.
However, with the previous commit, this test case is expected to function
correctly.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240217114152.1623-3-laoar.shao@gmail.com
2024-02-19 12:28:15 +01:00
Martin KaFai Lau
3f00e4a9c9 selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel
This selftest is based on a Alexei's test adopted from an internal
user to troubleshoot another bug. During this exercise, a separate
racing bug was discovered between bpf_timer_cancel_and_free
and bpf_timer_cancel. The details can be found in the previous
patch.

This patch is to add a selftest that can trigger the bug.
I can trigger the UAF everytime in my qemu setup with KASAN. The idea
is to have multiple user space threads running in a tight loop to exercise
both bpf_map_update_elem (which calls into bpf_timer_cancel_and_free)
and bpf_timer_cancel.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20240215211218.990808-2-martin.lau@linux.dev
2024-02-19 12:26:46 +01:00
Jiri Pirko
d0bcc15cba tools: ynl: don't access uninitialized attr_space variable
If message contains unknown attribute and user passes
"--process-unknown" command line option, _decode() gets called with space
arg set to None. In that case, attr_space variable is not initialized
used which leads to following trace:

Traceback (most recent call last):
  File "./tools/net/ynl/cli.py", line 77, in <module>
    main()
  File "./tools/net/ynl/cli.py", line 68, in main
    reply = ynl.dump(args.dump, attrs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tools/net/ynl/lib/ynl.py", line 909, in dump
    return self._op(method, vals, [], dump=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tools/net/ynl/lib/ynl.py", line 894, in _op
    rsp_msg = self._decode(decoded.raw_attrs, op.attr_set.name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tools/net/ynl/lib/ynl.py", line 639, in _decode
    self._rsp_add(rsp, attr_name, None, self._decode_unknown(attr))
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tools/net/ynl/lib/ynl.py", line 569, in _decode_unknown
    return self._decode(NlAttrs(attr.raw), None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tools/net/ynl/lib/ynl.py", line 630, in _decode
    search_attrs = SpaceAttrs(attr_space, rsp, outer_attrs)
                              ^^^^^^^^^^
UnboundLocalError: cannot access local variable 'attr_space' where it is not associated with a value

Fix this by moving search_attrs assignment under the if statement
above it to make sure attr_space is initialized.

Fixes: bf8b832374 ("tools/net/ynl: Support sub-messages in nested attribute spaces")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-19 09:45:40 +00:00
Hangbin Liu
cd65c48d66 selftests: bonding: set active slave to primary eth1 specifically
In bond priority testing, we set the primary interface to eth1 and add
eth0,1,2 to bond in serial. This is OK in normal times. But when in
debug kernel, the bridge port that eth0,1,2 connected would start
slowly (enter blocking, forwarding state), which caused the primary
interface down for a while after enslaving and active slave changed.
Here is a test log from Jakub's debug test[1].

 [  400.399070][   T50] br0: port 1(s0) entered disabled state
 [  400.400168][   T50] br0: port 4(s2) entered disabled state
 [  400.941504][ T2791] bond0: (slave eth0): making interface the new active one
 [  400.942603][ T2791] bond0: (slave eth0): Enslaving as an active interface with an up link
 [  400.943633][ T2766] br0: port 1(s0) entered blocking state
 [  400.944119][ T2766] br0: port 1(s0) entered forwarding state
 [  401.128792][ T2792] bond0: (slave eth1): making interface the new active one
 [  401.130771][ T2792] bond0: (slave eth1): Enslaving as an active interface with an up link
 [  401.131643][   T69] br0: port 2(s1) entered blocking state
 [  401.132067][   T69] br0: port 2(s1) entered forwarding state
 [  401.346201][ T2793] bond0: (slave eth2): Enslaving as a backup interface with an up link
 [  401.348414][   T50] br0: port 4(s2) entered blocking state
 [  401.348857][   T50] br0: port 4(s2) entered forwarding state
 [  401.519669][  T250] bond0: (slave eth0): link status definitely down, disabling slave
 [  401.526522][  T250] bond0: (slave eth1): link status definitely down, disabling slave
 [  401.526986][  T250] bond0: (slave eth2): making interface the new active one
 [  401.629470][  T250] bond0: (slave eth0): link status definitely up
 [  401.630089][  T250] bond0: (slave eth1): link status definitely up
 [...]
 # TEST: prio (active-backup ns_ip6_target primary_reselect 1)         [FAIL]
 # Current active slave is eth2 but not eth1

Fix it by setting active slave to primary slave specifically before
testing.

[1] https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/464301/1-bond-options-sh/stdout

Fixes: 481b56e039 ("selftests: bonding: re-format bond option tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-19 09:11:35 +00:00
Matthieu Baerts (NGI0)
4103d84808 selftests: mptcp: diag: unique 'cestab' subtest names
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.

Some 'cestab' subtests from the diag selftest had the same names, e.g.:

    ....chk 0 cestab

Now the previous value is taken, to have different names, e.g.:

    ....chk 2->0 cestab after flush

While at it, the 'after flush' info is added, similar to what is done
with the 'in use' subtests. Also inspired by these 'in use' subtests,
'many' is displayed instead of a large number:

    many msk socket present                           [  ok  ]
    ....chk many msk in use                           [  ok  ]
    ....chk many cestab                               [  ok  ]
    ....chk many->0 msk in use after flush            [  ok  ]
    ....chk many->0 cestab after flush                [  ok  ]

Fixes: 81ab772819 ("selftests: mptcp: diag: check CURRESTAB counters")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-18 10:25:01 +00:00
Matthieu Baerts (NGI0)
645c1dc965 selftests: mptcp: diag: unique 'in use' subtest names
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.

Some 'in use' subtests from the diag selftest had the same names, e.g.:

    chk 0 msk in use after flush

Now the previous value is taken, to have different names, e.g.:

    chk 2->0 msk in use after flush

While at it, avoid repeating the full message, declare it once in the
helper.

Fixes: ce99025736 ("selftests: mptcp: diag: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-18 10:25:01 +00:00
Matthieu Baerts (NGI0)
2ef0d804c0 selftests: mptcp: userspace_pm: unique subtest names
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated names.

Some subtests from the userspace_pm selftest had the same names. That's
because different subflows are created (and deleted) between the same
pair of IP addresses.

Simply adding the destination port in the name is then enough to have
different names, because the destination port is always different.

Note that adding such info takes a bit more space, so we need to
increase a bit the width to print the name, simply to keep all the
'[ OK ]' aligned as before.

Fixes: f589234e1a ("selftests: mptcp: userspace_pm: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-18 10:25:00 +00:00
Matthieu Baerts (NGI0)
4d8e0dde04 selftests: mptcp: simult flows: fix some subtest names
The selftest was correctly recording all the results, but the 'reverse
direction' part was missing in the name when needed.

It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.

Fixes: 675d99338e ("selftests: mptcp: simult flows: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-18 10:25:00 +00:00
Matthieu Baerts (NGI0)
694bd45980 selftests: mptcp: diag: fix bash warnings on older kernels
Since the 'Fixes' commit mentioned below, the command that is executed
in __chk_nr() helper can return nothing if the feature is not supported.
This is the case when the MPTCP CURRESTAB counter is not supported.

To avoid this warning ...

  ./diag.sh: line 65: [: !=: unary operator expected

... we just need to surround '$nr' with double quotes, to support an
empty string when the feature is not supported.

Fixes: 81ab772819 ("selftests: mptcp: diag: check CURRESTAB counters")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-18 10:25:00 +00:00
Matthieu Baerts (NGI0)
662f084f33 selftests: mptcp: pm nl: avoid error msg on older kernels
Since the 'Fixes' commit mentioned below, and if the kernel being tested
doesn't support the 'fullmesh' flag, this error will be printed:

  netlink error -22 (Invalid argument)
  ./pm_nl_ctl: bailing out due to netlink error[s]

But that can be normal if the kernel doesn't support the feature, no
need to print this worrying error message while everything else looks
OK. So we can mute stderr. Failures will still be detected if any.

Fixes: 1dc88d241f ("selftests: mptcp: pm_nl_ctl: always look for errors")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-18 10:25:00 +00:00
Matthieu Baerts (NGI0)
d2a2547565 selftests: mptcp: pm nl: also list skipped tests
If the feature is not supported by older kernels, and instead of just
ignoring some tests, we should mark them as skipped, so we can still
track them.

Fixes: d85555ac11 ("selftests: mptcp: pm_netlink: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-18 10:25:00 +00:00
Linus Torvalds
c02197fc90 powerpc fixes for 6.8 #3
- Fix ftrace bug on boot caused by exit text sections with -fpatchable-function-entry.
 
  - Fix accuracy of stolen time on pseries since the switch to VIRT_CPU_ACCOUNTING_GEN.
 
  - Fix a crash in the IOMMU code when doing DLPAR remove.
 
  - Set pt_regs->link on scv entry to fix BPF stack unwinding.
 
  - Add missing PPC_FEATURE_BOOKE on 64-bit e5500/e6500, which broke gdb.
 
  - Fix boot on some 6xx platforms with STRICT_KERNEL_RWX enabled.
 
  - Fix build failures with KASAN enabled and 32KB stack size.
 
  - Some other minor fixes.
 
 Thanks to: Arnd Bergmann, Benjamin Gray, Christophe Leroy, David Engraf, Gaurav
 Batra, Jason Gunthorpe, Jiangfeng Xiao, Matthias Schiffer, Nathan Lynch, Naveen
 N Rao, Nicholas Piggin, Nysal Jan K.A, R Nageswara Sastry, Shivaprasad G Bhat,
 Shrikanth Hegde, Spoorthy, Srikar Dronamraju, Venkat Rao Bagalkote.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmXRQ64THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgNpvD/kBw+HCCTOCIG1R5qW76PE3zek5ikkn
 TmzmovQv51S2NH/NJ1vuy12/xs7kkiyKMcLi2G5Ua1HVaGtLRwn25pWsJZpWJii+
 inBPCn8lRaXiDNqPCtF3xMvypWtLEvoQnUtH9If6XXEfmzo5TfoRuJdH0TF6eEuM
 A6abVONL7qYt/zGM2RhRrkVexznFk3SfF1UvKoR+6LMGVhgdW66mTKwcEt9KPn2X
 hOjtBShXQYR315qv3FJQdUQooiwdIqM7IaZf32oFoG1U/iHz+3wzHcG+83iggZEa
 jQMxthyeFjLWExT8dKwiLrTuaCa8B0bRKLypGcub1yh396/xcHv4KrX8XNJ3nQoL
 nKcQOcPkcd+ZVAfigu7wGUS12CKmuFLUTXspgGp3CJQKzBUfLMaVqlAY/DnKEgmc
 stQVi8pOv1puAE3qS2FK7HR0AdLTu0BRTdw8xfTOyfLeoYiGQQRLYnBhxb9HtwW7
 HbVjpicE6VSth1BVGfgdmWH0n/a8cuuhXYOGzJ8ug1dCjgZc3zBISVx2B1yortri
 vypyMhZ8t4i6j8B2fFRSQ1O0PY/0NmoQ6Yg2JIwIjaO5IbWkyI/KjO5VgdZTkbuV
 8i4VLBHvSUUQwd1wBLeNQFD9nLnyJAYo7qvvtBCntmUx6ZNrPihXP4fRjz/la5rJ
 I3xlArKK088RMw==
 =TiJM
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "This is a bit of a big batch for rc4, but just due to holiday hangover
  and because I didn't send any fixes last week due to a late revert
  request. I think next week should be back to normal.

   - Fix ftrace bug on boot caused by exit text sections with
     '-fpatchable-function-entry'

   - Fix accuracy of stolen time on pseries since the switch to
     VIRT_CPU_ACCOUNTING_GEN

   - Fix a crash in the IOMMU code when doing DLPAR remove

   - Set pt_regs->link on scv entry to fix BPF stack unwinding

   - Add missing PPC_FEATURE_BOOKE on 64-bit e5500/e6500, which broke
     gdb

   - Fix boot on some 6xx platforms with STRICT_KERNEL_RWX enabled

   - Fix build failures with KASAN enabled and 32KB stack size

   - Some other minor fixes

  Thanks to Arnd Bergmann, Benjamin Gray, Christophe Leroy, David
  Engraf, Gaurav Batra, Jason Gunthorpe, Jiangfeng Xiao, Matthias
  Schiffer, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nysal Jan K.A,
  R Nageswara Sastry, Shivaprasad G Bhat, Shrikanth Hegde, Spoorthy,
  Srikar Dronamraju, and Venkat Rao Bagalkote"

* tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/iommu: Fix the missing iommu_group_put() during platform domain attach
  powerpc/pseries: fix accuracy of stolen time
  powerpc/ftrace: Ignore ftrace locations in exit text sections
  powerpc/cputable: Add missing PPC_FEATURE_BOOKE on PPC64 Book-E
  powerpc/kasan: Limit KASAN thread size increase to 32KB
  Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"
  powerpc: 85xx: mark local functions static
  powerpc: udbg_memcons: mark functions static
  powerpc/kasan: Fix addr error caused by page alignment
  powerpc/6xx: set High BAT Enable flag on G2_LE cores
  selftests/powerpc/papr_vpd: Check devfd before get_system_loc_code()
  powerpc/64: Set task pt_regs->link to the LR value on scv entry
  powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add
  powerpc/pseries/papr-sysparm: use u8 arrays for payloads
2024-02-17 16:59:31 -08:00
Dave Jiang
117132edc6 cxl/test: Add support for qos_class checking
Set a fake qos_class to a unique value in order to do simple testing of
qos_class for root decoders and mem devs via user cxl_test. A mock
function is added to set the fake qos_class values for memory device
and overrides cxl_endpoint_parse_cdat() in cxl driver code.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-5-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16 23:20:34 -08:00
Linus Torvalds
683b783c20 ARM:
* Avoid dropping the page refcount twice when freeing an unlinked
   page-table subtree.
 
 * Don't source the VFIO Kconfig twice
 
 * Fix protected-mode locking order between kvm and vcpus
 
 RISC-V:
 
 * Fix steal-time related sparse warnings
 
 x86:
 
 * Cleanup gtod_is_based_on_tsc() to return "bool" instead of an "int"
 
 * Make a KVM_REQ_NMI request while handling KVM_SET_VCPU_EVENTS if and only
   if the incoming events->nmi.pending is non-zero.  If the target vCPU is in
   the UNITIALIZED state, the spurious request will result in KVM exiting to
   userspace, which in turn causes QEMU to constantly acquire and release
   QEMU's global mutex, to the point where the BSP is unable to make forward
   progress.
 
 * Fix a type (u8 versus u64) goof that results in pmu->fixed_ctr_ctrl being
   incorrectly truncated, and ultimately causes KVM to think a fixed counter
   has already been disabled (KVM thinks the old value is '0').
 
 * Fix a stack leak in KVM_GET_MSRS where a failed MSR read from userspace
   that is ultimately ignored due to ignore_msrs=true doesn't zero the output
   as intended.
 
 Selftests cleanups and fixes:
 
 * Remove redundant newlines from error messages.
 
 * Delete an unused variable in the AMX test (which causes build failures when
   compiling with -Werror).
 
 * Fail instead of skipping tests if open(), e.g. of /dev/kvm, fails with an
   error code other than ENOENT (a Hyper-V selftest bug resulted in an EMFILE,
   and the test eventually got skipped).
 
 * Fix TSC related bugs in several Hyper-V selftests.
 
 * Fix a bug in the dirty ring logging test where a sem_post() could be left
   pending across multiple runs, resulting in incorrect synchronization between
   the main thread and the vCPU worker thread.
 
 * Relax the dirty log split test's assertions on 4KiB mappings to fix false
   positives due to the number of mappings for memslot 0 (used for code and
   data that is NOT being dirty logged) changing, e.g. due to NUMA balancing.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmXPlokUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPs3AgApdYANmMEy2YaUZLYsQOEP388vLEf
 +CS9kChY6xWuYzdFPTpM4BqNVn46zPh+HDEHTCJy1eOLpeOg6HbaNGuF/1G98+HF
 COm7C2bWOrGAL/UMzPzciyEMQFE7c/h28Yuq/4XpyDNrFbnChYxPh9W4xexqoLhV
 QtGYU03guLCUsI5veY0rOrSJ5xEu9f8c63JH5JPahtbMB0uNoi0Kz7i86sbkkUg7
 OcTra+j/FyGVAWwEJ8Q2hcGlKn4DMeyQ/riUvPrfSarTqC6ZswKltg9EMSxNnojE
 LojijqRFjKklkXonnalVeDzJbG0OWHks8VO6JmCJdt0zwBRei0iLWi2LEg==
 =8/la
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM:

   - Avoid dropping the page refcount twice when freeing an unlinked
     page-table subtree.

   - Don't source the VFIO Kconfig twice

   - Fix protected-mode locking order between kvm and vcpus

  RISC-V:

   - Fix steal-time related sparse warnings

  x86:

   - Cleanup gtod_is_based_on_tsc() to return "bool" instead of an "int"

   - Make a KVM_REQ_NMI request while handling KVM_SET_VCPU_EVENTS if
     and only if the incoming events->nmi.pending is non-zero. If the
     target vCPU is in the UNITIALIZED state, the spurious request will
     result in KVM exiting to userspace, which in turn causes QEMU to
     constantly acquire and release QEMU's global mutex, to the point
     where the BSP is unable to make forward progress.

   - Fix a type (u8 versus u64) goof that results in pmu->fixed_ctr_ctrl
     being incorrectly truncated, and ultimately causes KVM to think a
     fixed counter has already been disabled (KVM thinks the old value
     is '0').

   - Fix a stack leak in KVM_GET_MSRS where a failed MSR read from
     userspace that is ultimately ignored due to ignore_msrs=true
     doesn't zero the output as intended.

  Selftests cleanups and fixes:

   - Remove redundant newlines from error messages.

   - Delete an unused variable in the AMX test (which causes build
     failures when compiling with -Werror).

   - Fail instead of skipping tests if open(), e.g. of /dev/kvm, fails
     with an error code other than ENOENT (a Hyper-V selftest bug
     resulted in an EMFILE, and the test eventually got skipped).

   - Fix TSC related bugs in several Hyper-V selftests.

   - Fix a bug in the dirty ring logging test where a sem_post() could
     be left pending across multiple runs, resulting in incorrect
     synchronization between the main thread and the vCPU worker thread.

   - Relax the dirty log split test's assertions on 4KiB mappings to fix
     false positives due to the number of mappings for memslot 0 (used
     for code and data that is NOT being dirty logged) changing, e.g.
     due to NUMA balancing"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  KVM: arm64: Fix double-free following kvm_pgtable_stage2_free_unlinked()
  RISC-V: KVM: Use correct restricted types
  RISC-V: paravirt: Use correct restricted types
  RISC-V: paravirt: steal_time should be static
  KVM: selftests: Don't assert on exact number of 4KiB in dirty log split test
  KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test
  KVM: x86: Fix KVM_GET_MSRS stack info leak
  KVM: arm64: Do not source virt/lib/Kconfig twice
  KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl
  KVM: x86: Make gtod_is_based_on_tsc() return 'bool'
  KVM: selftests: Make hyperv_clock require TSC based system clocksource
  KVM: selftests: Run clocksource dependent tests with hyperv_clocksource_tsc_page too
  KVM: selftests: Use generic sys_clocksource_is_tsc() in vmx_nested_tsc_scaling_test
  KVM: selftests: Generalize check_clocksource() from kvm_clock_test
  KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
  KVM: arm64: Fix circular locking dependency
  KVM: selftests: Fail tests when open() fails with !ENOENT
  KVM: selftests: Avoid infinite loop in hyperv_features when invtsc is missing
  KVM: selftests: Delete superfluous, unused "stage" variable in AMX test
  KVM: selftests: x86_64: Remove redundant newlines
  ...
2024-02-16 10:48:14 -08:00
Marcos Paulo de Souza
7648f0c91e selftests/bpf: Remove empty TEST_CUSTOM_PROGS
Commit f04a32b2c5 ("selftests/bpf: Do not use sign-file as testcase")
removed the TEST_CUSTOM_PROGS assignment, and removed it from being used
on TEST_GEN_FILES. Remove two leftovers from that cleanup. Found by
inspection.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexey Gladkov <legion@kernel.org>
Link: https://lore.kernel.org/bpf/20240216-bpf-selftests-custom-progs-v1-1-f7cf281a1fda@suse.com
2024-02-16 18:08:26 +01:00
Jakub Kicinski
52f671db18 net/sched: act_mirred: use the backlog for mirred ingress
The test Davide added in commit ca22da2fbd ("act_mirred: use the backlog
for nested calls to mirred ingress") hangs our testing VMs every 10 or so
runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
lockdep.

The problem as previously described by Davide (see Link) is that
if we reverse flow of traffic with the redirect (egress -> ingress)
we may reach the same socket which generated the packet. And we may
still be holding its socket lock. The common solution to such deadlocks
is to put the packet in the Rx backlog, rather than run the Rx path
inline. Do that for all egress -> ingress reversals, not just once
we started to nest mirred calls.

In the past there was a concern that the backlog indirection will
lead to loss of error reporting / less accurate stats. But the current
workaround does not seem to address the issue.

Fixes: 53592b3640 ("net/sched: act_mirred: Implement ingress actions")
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Suggested-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16 10:13:31 +00:00
Hangbin Liu
31f26e4fec selftests: bonding: make sure new active is not null
One of Jakub's tests[1] shows that there may be period all ports
are down and no active slave. This makes the new_active_slave null
and the test fails. Add a check to make sure the new active is not null.

 [  189.051966] br0: port 2(s1) entered disabled state
 [  189.317881] bond0: (slave eth1): link status definitely down, disabling slave
 [  189.318487] bond0: (slave eth2): making interface the new active one
 [  190.435430] br0: port 4(s2) entered disabled state
 [  190.773786] bond0: (slave eth0): link status definitely down, disabling slave
 [  190.774204] bond0: (slave eth2): link status definitely down, disabling slave
 [  190.774715] bond0: now running without any active interface!
 [  190.877760] bond0: (slave eth0): link status definitely up
 [  190.878098] bond0: (slave eth0): making interface the new active one
 [  190.878495] bond0: active interface up!
 [  191.802872] br0: port 4(s2) entered blocking state
 [  191.803157] br0: port 4(s2) entered forwarding state
 [  191.813756] bond0: (slave eth2): link status definitely up
 [  192.847095] br0: port 2(s1) entered blocking state
 [  192.847396] br0: port 2(s1) entered forwarding state
 [  192.853740] bond0: (slave eth1): link status definitely up
 # TEST: prio (active-backup ns_ip6_target primary_reselect 1)         [FAIL]
 # Current active slave is null but not eth0

[1] https://netdev-3.bots.linux.dev/vmksft-bonding/results/464481/1-bond-options-sh/stdout

Fixes: 45bf79bc56 ("selftests: bonding: reduce garp_test/arp_validate test time")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16 08:02:39 +00:00
Hou Tao
be66d79189 selftest/bpf: Test the read of vsyscall page under x86-64
Under x86-64, when using bpf_probe_read_kernel{_str}() or
bpf_probe_read{_str}() to read vsyscall page, the read may trigger oops,
so add one test case to ensure that the problem is fixed. Beside those
four bpf helpers mentioned above, testing the read of vsyscall page by
using bpf_probe_read_user{_str} and bpf_copy_from_user{_task}() as well.

The test case passes the address of vsyscall page to these six helpers
and checks whether the returned values are expected:

1) For bpf_probe_read_kernel{_str}()/bpf_probe_read{_str}(), the
   expected return value is -ERANGE as shown below:

bpf_probe_read_kernel_common
  copy_from_kernel_nofault
    // false, return -ERANGE
    copy_from_kernel_nofault_allowed

2) For bpf_probe_read_user{_str}(), the expected return value is -EFAULT
   as show below:

bpf_probe_read_user_common
  copy_from_user_nofault
    // false, return -EFAULT
    __access_ok

3) For bpf_copy_from_user(), the expected return value is -EFAULT:

// return -EFAULT
bpf_copy_from_user
  copy_from_user
    _copy_from_user
      // return false
      access_ok

4) For bpf_copy_from_user_task(), the expected return value is -EFAULT:

// return -EFAULT
bpf_copy_from_user_task
  access_process_vm
    // return 0
    vma_lookup()
    // return 0
    expand_stack()

The occurrence of oops depends on the availability of CPU SMAP [1]
feature and there are three possible configurations of vsyscall page in
the boot cmd-line: vsyscall={xonly|none|emulate}, so there are a total
of six possible combinations. Under all these combinations, the test
case runs successfully.

[1]: https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20240202103935.3154011-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-15 19:21:39 -08:00
Jakub Kicinski
73be9a3aab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

net/core/dev.c
  9f30831390 ("net: add rcu safety to rtnl_prop_list_size()")
  723de3ebef ("net: free altname using an RCU callback")

net/unix/garbage.c
  11498715f2 ("af_unix: Remove io_uring code for GC.")
  25236c91b5 ("af_unix: Fix task hung while purging oob_skb in GC.")

drivers/net/ethernet/renesas/ravb_main.c
  ed4adc0720 ("net: ravb: Count packets instead of descriptors in GbEth RX path"
)
  c2da940857 ("ravb: Add Rx checksum offload support for GbEth")

net/mptcp/protocol.c
  bdd70eb689 ("mptcp: drop the push_pending field")
  28e5c13805 ("mptcp: annotate lockless accesses around read-mostly fields")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15 16:20:04 -08:00