Commit graph

37493 commits

Author SHA1 Message Date
Daniel Xu
068ca522d5 libbpf: Add bpf_object__unpin()
For bpf_object__pin_programs() there is bpf_object__unpin_programs().
Likewise bpf_object__unpin_maps() for bpf_object__pin_maps().

But no bpf_object__unpin() for bpf_object__pin(). Adding the former adds
symmetry to the API.

It's also convenient for cleanup in application code. It's an API I
would've used if it was available for a repro I was writing earlier.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/b2f9d41da4a350281a0b53a804d11b68327e14e5.1692832478.git.dxu@dxuuu.xyz
2023-08-23 17:10:09 -07:00
Charlie Jenkins
4d0c04eac0
RISC-V: mm: Add tests for RISC-V mm
Add tests that enforce mmap hint address behavior. mmap should default
to sv48. mmap will provide an address at the highest address space that
can fit into the hint address, unless the hint address is less than sv39
and not 0, then it will return a sv39 address.

These tests are split into two files: mmap_default.c and mmap_bottomup.c
because a new process must be exec'd in order to change the mmap layout.
The run_mmap.sh script sets the stack to be unlimited for the
mmap_bottomup.c test which triggers a bottomup layout.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20230809232218.849726-3-charlie@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:54:13 -07:00
Guilherme Amadio
9e1f16939b perf build: Allow customization of clang options for BPF target
This also puts an unconditional -Werror under control of WERROR. The
clang includes added during the build can lead to a warning that may be
turned into an error. In addition, hardened clang produces a warning
about lack of support for -fstack-protector* options for the BPF target:

  clang -g -O2 -target bpf -Wall -Werror -Ilinux/tools/perf/util/bpf_skel/.tmp/.. \
    -I -idirafter /usr/lib/llvm/16/bin/../../../../lib/clang/16/include -idirafter /usr/local/include \
    -idirafter /usr/include  -Ilinux/tools/include/uapi -c util/bpf_skel/bperf_follower.bpf.c \
    -o linux/tools/perf/util/bpf_skel/.tmp/bperf_follower.bpf.o && llvm-strip -g linux/tools/perf/util/bpf_skel/.tmp/bperf_follower.bpf.o
  clang-16: error: /usr/lib/llvm/16/bin/../../../../lib/clang/16/include: 'linker' input unused [-Werror,-Wunused-command-line-argument]
  clang-16: error: ignoring '-fstack-protector-strong' option as it is not currently supported for target 'bpf' [-Werror,-Woption-ignored]
  make[1]: *** [Makefile.perf:1082: linux/tools/perf/util/bpf_skel/.tmp/bpf_prog_profiler.bpf.o] Error 1

Signed-off-by: Guilherme Amadio <amadio@gentoo.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZOZQ2LDA+3Wg8x2T@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-23 15:33:18 -03:00
Ian Rogers
838a8c5f40 perf pmu: Pass PMU rather than aliases and format
Pass the pmu so the aliases and format list can be better abstracted
and later lazily loaded.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230823080828.1460376-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-23 14:27:58 -03:00
Ian Rogers
da6a5afda5 perf pmu: Avoid passing format list to perf_pmu__format_bits()
Pass the PMU so the format list can be better abstracted and later
lazily loaded.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230823080828.1460376-8-irogers@google.com
[ Did missing conversions in tools/perf/arch/arm*/util/cs-etm.c ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-23 14:27:34 -03:00
Yafang Shao
0072e3624b selftests/bpf: Add selftest for allow_ptr_leaks
- Without prev commit

  $ tools/testing/selftests/bpf/test_progs --name=tc_bpf
  #232/1   tc_bpf/tc_bpf_root:OK
  test_tc_bpf_non_root:PASS:set_cap_bpf_cap_net_admin 0 nsec
  test_tc_bpf_non_root:PASS:disable_cap_sys_admin 0 nsec
  0: R1=ctx(off=0,imm=0) R10=fp0
  ; if ((long)(iph + 1) > (long)skb->data_end)
  0: (61) r2 = *(u32 *)(r1 +80)         ; R1=ctx(off=0,imm=0) R2_w=pkt_end(off=0,imm=0)
  ; struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr);
  1: (61) r1 = *(u32 *)(r1 +76)         ; R1_w=pkt(off=0,r=0,imm=0)
  ; if ((long)(iph + 1) > (long)skb->data_end)
  2: (07) r1 += 34                      ; R1_w=pkt(off=34,r=0,imm=0)
  3: (b4) w0 = 1                        ; R0_w=1
  4: (2d) if r1 > r2 goto pc+1
  R2 pointer comparison prohibited
  processed 5 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  test_tc_bpf_non_root:FAIL:test_tc_bpf__open_and_load unexpected error: -13
  #233/2   tc_bpf_non_root:FAIL

- With prev commit

  $ tools/testing/selftests/bpf/test_progs --name=tc_bpf
  #232/1   tc_bpf/tc_bpf_root:OK
  #232/2   tc_bpf/tc_bpf_non_root:OK
  #232     tc_bpf:OK
  Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230823020703.3790-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-23 09:37:29 -07:00
Ian Rogers
7eb5473314 perf pmu: Avoid passing format list to perf_pmu__format_type
Pass the pmu so the format list can be better abstracted and later
lazily loaded.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230823080828.1460376-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-23 08:50:08 -03:00
Ian Rogers
804fee5d0f perf pmu: Avoid passing format list to perf_pmu__config_terms()
Abstract the format list better, hiding it in the PMU, by changing
perf_pmu__config_terms() the PMU rather than the format list in the PMU.

Change the PMU test to pass a dummy PMU for this purpose. Changing the
test allows perf_pmu__del_formats() to become static.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230823080828.1460376-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-23 08:49:35 -03:00
Ian Rogers
6f2f6eafcd perf pmu: Reduce scope of perf_pmu_error()
Move declaration from header file to pmu.y and make static.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230823080828.1460376-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-23 08:49:13 -03:00
Ian Rogers
cc5adb7347 perf pmu: Move perf_pmu__set_format to pmu.y
Avoid having the function in the C and header file, as it is only used
locally by pmu.y.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230823080828.1460376-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-23 08:48:49 -03:00
Ian Rogers
e1a3aad31c perf pmu: Avoid a path name copy
Rather than read a base path and append into a 2nd path, read the base
path directly into output buffer and append to that.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230823080828.1460376-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-23 08:42:17 -03:00
Ian Rogers
91e2e9f0b8 perf script ibs: Remove unused include
Done to reduce dependencies on pmu-events.h.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230823080828.1460376-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-23 08:41:52 -03:00
Kajol Jain
9823ae6f68 perf bench breakpoint: Skip run if no breakpoints available
Based on commit 7d54a4acd8 ("perf test: Skip watchpoint tests if
no watchpoints available"), hardware breakpoints are not available for
power9 platform and because of that 'perf bench breakpoint' run fails on
power9 platform.

Add code to check for the return value of perf_event_open() in the
breakpoint run and skip the 'perf bench breakpoint' run, if hardware
breakpoints are not available.

Result on power9 system before patch changes:

  [command]# perf bench breakpoint thread
  perf_event_open: No such device

Result on power9 system after patch changes:

  [command]# ./perf bench breakpoint thread
  Skipping perf bench breakpoint thread: No hardware support

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Naveen N Rao <naveen@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230823075103.190565-1-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-23 08:39:02 -03:00
Willy Tarreau
556fb7131e tools/nolibc: avoid undesired casts in the __sysret() macro
Having __sysret() as an inline function has the unfortunate effect of
adding casts and large constants comparisons after the syscall returns
that significantly inflate some light code that's otherwise syscall-
heavy. Even nolibc-test grew by ~1%.

Let's switch back to a macro for this, and use it only with signed
arguments. Note that it is also possible to design a slightly more
complex macro covering unsigned and pointers but we only have 3 such
syscalls so it is pointless, and these were just addressed not to use
this macro anymore. Now for the argument (the local variable containing
the syscall return value), any negative value is an error, that results
in -1 being returned and errno to be assigned the opposite value.

This may be revisited again in the future if really needed but for now
let's get back to something sane.

Fixes: 428905da6e ("tools/nolibc: sys.h: add a syscall return helper")
Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/
Link: https://lore.kernel.org/lkml/ZNKOJY+g66nkIyvv@1wt.eu/
Cc: Zhangjin Wu <falcon@tinylab.org>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Thomas Weißschuh <thomas@t-8ch.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Willy Tarreau
fb01ff635e tools/nolibc: keep brk(), sbrk(), mmap() away from __sysret()
The __sysret() function causes some undesirable casts so we'll revert
it. In order to keep it simple it will now only support integer return
values like in the past, so we must basically revert the changes that
were made to these 3 syscalls which return a pointer so that they
simply rely on their own test and the SET_ERRNO() macro.

Fixes: 4201cfce15 ("tools/nolibc: clean up sbrk() routine")
Fixes: 924e9539ae ("tools/nolibc: clean up mmap() routine")
Fixes: d27447bc2e ("tools/nolibc: sys.h: apply __sysret() helper")
Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/
Link: https://lore.kernel.org/lkml/ZNKOJY+g66nkIyvv@1wt.eu/
Cc: Zhangjin Wu <falcon@tinylab.org>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Thomas Weißschuh <thomas@t-8ch.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:19:22 +02:00
Zhangjin Wu
872dbfa032 tools/nolibc: silence ppc64 compile warnings
Silence the following warnings reported by the new -Wall -Wextra options
with pure assembly code.

    In file included from sysroot/powerpc/include/stdio.h:13,
                     from nolibc-test.c:13:
    sysroot/powerpc/include/arch.h: In function '_start':
    sysroot/powerpc/include/arch.h:192:32: warning: unused variable 'r2' [-Wunused-variable]
      192 |         register volatile long r2 __asm__ ("r2") = (void *)&TOC - (void *)_start;
          |                                ^~
    sysroot/powerpc/include/arch.h:187:97: warning: optimization may eliminate reads and/or writes to register variables [-Wvolatile-register-var]
      187 | void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void)
          |                                                                                                 ^~~~~~

Since only elfv2 ABI requires to save the TOC/GOT pointer to r2
register, when using elfv1 ABI, the old C code is simply ignored by the
compiler, but the compiler can not ignore the inline assembly code and
will introduce build failure or running segfaults. So, let's further
only add the new assembly code for elfv2 ABI with the checking of
_CALL_ELF == 2.

Link: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf
Link: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Zhangjin Wu
418c846821 selftests/nolibc: libc-test: use HOSTCC instead of CC
libc-test is mainly added to compare the behavior of nolibc to the
system libc, it is meaningless and error-prone with cross compiling.

Let's use HOSTCC instead of CC to avoid wrongly use cross compiler when
CROSS_COMPILE is passed or customized.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Fixes: cfb672f94f ("selftests/nolibc: add run-libc-test target")
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:18:50 +02:00
Zhangjin Wu
dcb677c3d3 tools/nolibc: stackprotector.h: make __stack_chk_init static
This allows to generate smaller text/data/dec size.

As the _start_c() function added by crt.h, __stack_chk_init() is called
from _start_c() instead of the assembly _start. So, it is able to mark
it with static now.

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Zhangjin Wu
ce1bb82b1c selftests/nolibc: allow report with existing test log
After the tests finish, it is valuable to report and summarize with
existing test log.

This avoid rerun or run the tests again when not necessary.

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Zhangjin Wu
faeb4e09fe selftests/nolibc: add test support for ppc64
Kernel uses ARCH=powerpc for both 32-bit and 64-bit PowerPC, here adds a
ppc64 variant for big endian 64-bit PowerPC, users can pass XARCH=ppc64
to test it.

The powernv machine of qemu-system-ppc64 is used with
powernv_be_defconfig.

As the document [1] shows:

  PowerNV (as Non-Virtualized) is the “bare metal” platform using the
  OPAL firmware. It runs Linux on IBM and OpenPOWER systems and it can be
  used as an hypervisor OS, running KVM guests, or simply as a host OS.

Notes,

- differs from little endian 64-bit PowerPC, vmlinux is used instead of
  zImage, because big endian zImage [2] only boot on qemu with x-vof=on
  (added from qemu v7.0) and a fixup patch [3] for qemu v7.0.51:

- since the VSX support may be disabled in kernel side, to avoid
  "illegal instruction" errors due to missing VSX kernel support, let's
  simply let compiler not generate vector/scalar (VSX) instructions via
  the '-mno-vsx' option.

- as 'man gcc' shows, '-mmultiple' is used to generate code that uses
  the load multiple word instructions and the store multiple word
  instructions. Those instructions do not work when the processor is in
  little-endian mode (except PPC740/PPC750), so, we only enable it
  for big endian powerpc.

- for big endian ppc64, as the help message from arch/powerpc/Kconfig
  shows, the V2 ABI is standard for 64-bit little-endian, but for
  big-endian it is less well tested by kernel and toolchain, so, use
  elfv1 as-is, no need to explicitly ask toolchain to use elfv2 here.

[1]: https://qemu.readthedocs.io/en/latest/system/ppc/powernv.html
[2]: https://github.com/linuxppc/issues/issues/402
[3]: https://lore.kernel.org/qemu-devel/20220504065536.3534488-1-aik@ozlabs.ru/

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230722121019.GD17311@1wt.eu/
Link: https://lore.kernel.org/lkml/20230719043353.GC5331@1wt.eu/
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Zhangjin Wu
8a5040cb3f selftests/nolibc: add test support for ppc64le
Kernel uses ARCH=powerpc for both 32-bit and 64-bit PowerPC, here adds a
ppc64le variant for little endian 64-bit PowerPC, users can pass
XARCH=ppc64le to test it.

The powernv machine of qemu-system-ppc64le is used for there is just a
working powernv_defconfig.

As the document [1] shows:

  PowerNV (as Non-Virtualized) is the “bare metal” platform using the
  OPAL firmware. It runs Linux on IBM and OpenPOWER systems and it can be
  used as an hypervisor OS, running KVM guests, or simply as a host OS.

Notes,

- since the VSX support may be disabled in kernel side, to avoid
  "illegal instruction" errors due to missing VSX kernel support, let's
  simply let compiler not generate vector/scalar (VSX) instructions via
  the '-mno-vsx' option.

- little endian ppc64 prefers elfv2 to elfv1 if the toolchain (e.g. gcc
  13.1.0) supports it, let's align with kernel, otherwise, our elfv1
  binary will not run on kernel with elfv2 ABI.

[1]: https://qemu.readthedocs.io/en/latest/system/ppc/powernv.html

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230722120747.GC17311@1wt.eu/
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Zhangjin Wu
587e984591 selftests/nolibc: add test support for ppc
Kernel uses ARCH=powerpc for both 32-bit and 64-bit PowerPC, here adds a
ppc variant for 32-bit PowerPC and uses it as the default variant of
powerpc architecture.

Users can pass XARCH=ppc (or ARCH=powerpc) to test 32-bit PowerPC.

The default qemu-system-ppc g3beige machine [1] is used to run 32-bit
powerpc kernel with pmac32_defconfig. The missing PMACZILOG serial tty
and console are enabled in another patch [2].

Note,

- zImage doesn't boot due to "qemu-system-ppc: Some ROM regions are
  overlapping" error, so, vmlinux is used instead.

- since the VSX support may be disabled in kernel side, to avoid
  "illegal instruction" errors due to missing VSX kernel support, let's
  simply let compiler not generate vector/scalar (VSX) instructions via
  the '-mno-vsx' option.

- as 'man gcc' shows, '-mmultiple' is used to generate code that uses
  the load multiple word instructions and the store multiple word
  instructions. Those instructions do not work when the processor is in
  little-endian mode (except PPC740/PPC750), so, we only enable it
  for big endian powerpc.

[1]: https://qemu.readthedocs.io/en/latest/system/ppc/powermac.html
[2]: https://lore.kernel.org/lkml/bb7b5f9958b3e3a20f6573ff7ce7c5dc566e7e32.1690982937.git.tanyuan@tinylab.org/

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/ZL9leVOI25S2+0+g@1wt.eu/
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Zhangjin Wu
c6c3734fb6 selftests/nolibc: add XARCH and ARCH mapping support
Most of the CPU architectures have different variants, but kernel
usually only accepts parts of them via the ARCH variable, the others
should be customized via kernel config files.

To simplify testing, a new XARCH variable is added to extend the
kernel's ARCH with a few variants of the same architecture, and it is
used to customize variant specific variables, at last XARCH is converted
to the kernel's ARCH:

  e.g. make run XARCH=<one of the supported variants>
                | \
                |  `-> variant specific variables:
                |      IMAGE, DEFCONFIG, QEMU_ARCH, QEMU_ARGS, CFLAGS ...
                \
                 `---> kernel's ARCH

XARCH and ARCH are carefully mapped to allow users to pass architecture
variants via XARCH or pass architecture via ARCH from cmdline.

PowerPC is the first user and also a very good reference architecture of
this mapping, it has variants with different combinations of
32-bit/64-bit and bit endian/little endian.

To use this mapping, the other architectures can refer to PowerPC, If
the target architecture only has one variant, XARCH is simply an alias
of ARCH, no additional mapping required.

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230702171715.GD16233@1wt.eu/
Link: https://lore.kernel.org/lkml/20230730061801.GA7690@1wt.eu/
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Zhangjin Wu
e45ce88e65 tools/nolibc: add support for powerpc64
This follows the 64-bit PowerPC ABI [1], refers to the slides: "A new
ABI for little-endian PowerPC64 Design & Implementation" [2] and the
musl code in arch/powerpc64/crt_arch.h.

First, stdu and clrrdi are used instead of stwu and clrrwi for
powerpc64.

Second, the stack frame size is increased to 32 bytes for powerpc64, 32
bytes is the minimal stack frame size supported described in [2].

Besides, the TOC pointer (GOT pointer) must be saved to r2.

This works on both little endian and big endian 64-bit PowerPC.

[1]: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf
[2]: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Zhangjin Wu
0cb0675ec3 tools/nolibc: add support for powerpc
Both syscall declarations and _start code definition are added for
powerpc to nolibc.

Like mips, powerpc uses a register (exactly, the summary overflow bit)
to record the error occurred, and uses another register to return the
value [1]. So, the return value of every syscall declaration must be
normalized to match the __sysret() helper, return -value when there is
an error, otheriwse, return value directly.

Glibc and musl use different methods to check the summary overflow bit,
glibc (sysdeps/unix/sysv/linux/powerpc/sysdep.h) saves the cr register
to r0 at first, and then check the summary overflow bit in cr0:

    mfcr r0
    r0 & (1 << 28) ? -r3 : r3

    -->

    10003c14:       7c 00 00 26     mfcr    r0
    10003c18:       74 09 10 00     andis.  r9,r0,4096
    10003c1c:       41 82 00 08     beq     0x10003c24
    10003c20:       7c 63 00 d0     neg     r3,r3

Musl (arch/powerpc/syscall_arch.h) directly checks the summary overflow
bit with the 'bns' instruction, it is smaller:

    /* no summary overflow bit means no error, return value directly */
    bns+ 1f
    /* otherwise, return negated value */
    neg r3, r3
    1:

    -->

    10000418:       40 a3 00 08     bns     0x10000420
    1000041c:       7c 63 00 d0     neg     r3,r3

Like musl, Linux (arch/powerpc/include/asm/vdso/gettimeofday.h) uses the
same method for do_syscall_2() too.

Here applies the second method to get smaller size.

[1]: https://man7.org/linux/man-pages/man2/syscall.2.html

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
45f65f8d04 selftests/nolibc: enable compiler warnings
It will help the developers to avoid cruft and detect some bugs.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
711edef8f7 selftests/nolibc: don't strip nolibc-test
Binary size is not important for nolibc-test and some debugging
information is nice to have, so don't strip the binary during linking.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
9c5e490093 selftests/nolibc: prevent out of bounds access in expect_vfprintf
If read() fails and returns -1 (or returns garbage for some other
reason) buf would be accessed out of bounds.
Only use the return value of read() after it has been validated.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
37266a9ec7 selftests/nolibc: use correct return type for read() and write()
Avoid truncating values before comparing them.

As printf in nolibc doesn't support ssize_t add casts to int for
printing.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
711f91fdec selftests/nolibc: avoid sign-compare warnings
These warnings will be enabled later so avoid triggering them.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
c8d078153f selftests/nolibc: avoid unused parameter warnings
This warning will be enabled later so avoid triggering it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
17e66f235e selftests/nolibc: make functions static if possible
This allows the compiler to generate warnings if they go unused.

Functions that are supposed to be used as breakpoints should not be
static, so un-statify those if necessary.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
10874f20ee selftests/nolibc: mark test helpers as potentially unused
When warning about unused functions these would be reported by we want
to keep them for future use.

Suggested-by: Zhangjin Wu <falcon@tinylab.org>
Link: https://lore.kernel.org/lkml/20230731064826.16584-1-falcon@tinylab.org/
Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230731224929.GA18296@1wt.eu/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
79df81aaea selftests/nolibc: drop unused variables
These got copied around as new testcases where created.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Willy Tarreau
ca283457b3 selftests/nolibc: avoid warnings during intptr tests
Recent fix ceb528feb7 ("selftests/nolibc: avoid gaps in test numbers")
had the annoying side effect of always returning skipped tests, which
are normally supposed to happen only when certain features are missing
to run the test (missing kernel options, toolchain not supporting
stack-protector etc). As such there are now always warnings. Let's
modify the test to not use the condition and instead use a ternary
expression to check the result.

Fixes: ceb528feb7 ("selftests/nolibc: avoid gaps in test numbers")
Cc: Thomas WeiÃ<9F>schuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:57 +02:00
Thomas Weißschuh
202a0bd12f tools/nolibc: stdint: use __SIZE_TYPE__ for size_t
Otherwise both gcc and clang may generate warnings about type
mismatches:

sysroot/mips/include/string.h:12:14: warning: mismatch in argument 1 type of built-in function 'malloc'; expected 'unsigned int' [-Wbuiltin-declaration-mismatch]
   12 | static void *malloc(size_t len);
      |              ^~~~~~

The compiler provides __SIZE_TYPE__ as the type that corresponds to size_t
(typically "long unsigned int" or "unsigned int"). It was verified to be
available at least since gcc-3.4 and clang-3.8, so from now on we'll use
this definition for size_t.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/20230805161929.GA15284@1wt.eu/
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
04694658ad tools/nolibc: sys: avoid implicit sign cast
getauxval() returns an unsigned long but the overall type of the ternary
operator needs to be signed.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
809145f842 tools/nolibc: setvbuf: avoid unused parameter warnings
This warning will be enabled later so avoid triggering it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
6407750225 tools/nolibc: fix return type of getpagesize()
It's documented as returning int which is also implemented by glibc and
musl, so adopt that return type.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
f2f5eaefa1 tools/nolibc: drop unused variables
Nobody needs it, get rid of it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Yuan Tan
5c01259b12 selftests/nolibc: add testcase for pipe
Add a test case of pipe that sends and receives message in a single
process.

Suggested-by: Thomas Weißschuh <thomas@t-8ch.de>
Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/all/c5de2d13-3752-4e1b-90d9-f58cca99c702@t-8ch.de/
Signed-off-by: Yuan Tan <tanyuan@tinylab.org>
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
[wt: fixed the "len" type to size_t to address a sign-compare warning
 with upcoming patches]
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Yuan Tan
3ec38af6ee tools/nolibc: add pipe() and pipe2() support
According to manual page [1], posix spec [2] and source code like
arch/mips/kernel/syscall.c, for historic reasons, the sys_pipe() syscall
on some architectures has an unusual calling convention.  It returns
results in two registers which means there is no need for it to do
verify the validity of a userspace pointer argument.  Historically that
used to be expensive in Linux.  These days the performance advantage is
negligible.

Nolibc doesn't support the unusual calling convention above, luckily
Linux provides a generic sys_pipe2() with an additional flags argument
from 2.6.27. If flags is 0, then pipe2() is the same as pipe(). So here
we use sys_pipe2() to implement the pipe().

pipe2() is also provided to allow users to use flags argument on demand.

[1]: https://man7.org/linux/man-pages/man2/pipe.2.html
[2]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/pipe.html

Suggested-by: Zhangjin Wu <falcon@tinylab.org>
Link: https://lore.kernel.org/all/20230729100401.GA4577@1wt.eu/
Signed-off-by: Yuan Tan <tanyuan@tinylab.org>
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Zhangjin Wu
e7d0129df6 selftests/nolibc: mmap_munmap_good: fix up return value
The other tests use 1 as failure, mmap_munmap_good uses -1 as failure,
let's fix up this.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Thomas Weißschuh
447e56023f selftests/nolibc: avoid buffer underrun in space printing
If the test description is longer than the status alignment the
parameter 'n' to putcharn() would lead to a signed underflow that then
gets converted to a very large unsigned value.
This in turn leads out-of-bound writes in memset() crashing the
application.

The failure case of EXPECT_PTRER() used in "mmap_bad" exhibits this
exact behavior.

Fixes: 29f5540be3 ("selftests/nolibc: add EXPECT_PTREQ, EXPECT_PTRNE and EXPECT_PTRER")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 05:17:07 +02:00
Ryan Roberts
4893c22eb2 tools/nolibc/stdio: add setvbuf() to set buffering mode
Add a minimal implementation of setvbuf(), which error checks the mode
argument (as required by spec) and returns. Since nolibc never buffers
output, nothing needs to be done.

The kselftest framework recently added a call to setvbuf(). As a result,
any tests that use the kselftest framework and nolibc cause a compiler
error due to missing function. This provides an urgent fix for the
problem which is preventing arm64 testing on linux-next.

Example:

clang --target=aarch64-linux-gnu -fintegrated-as
-Werror=unknown-warning-option -Werror=ignored-optimization-argument
-Werror=option-ignored -Werror=unused-command-line-argument
--target=aarch64-linux-gnu -fintegrated-as
-fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \
-include ../../../../include/nolibc/nolibc.h -I../..\
-static -ffreestanding -Wall za-fork.c
build/kselftest/arm64/fp/za-fork-asm.o
-o build/kselftest/arm64/fp/za-fork
In file included from <built-in>:1:
In file included from ./../../../../include/nolibc/nolibc.h:97:
In file included from ./../../../../include/nolibc/arch.h:25:
./../../../../include/nolibc/arch-aarch64.h:178:35: warning: unknown
attribute 'optimize' ignored [-Wunknown-attributes]
void __attribute__((weak,noreturn,optimize("omit-frame-pointer")))
__no_stack_protector _start(void)
                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from za-fork.c:12:
../../kselftest.h:123:2: error: call to undeclared function 'setvbuf';
ISO C99 and later do not support implicit function declarations
[-Wimplicit-function-declaration]
        setvbuf(stdout, NULL, _IOLBF, 0);
        ^
../../kselftest.h:123:24: error: use of undeclared identifier '_IOLBF'
        setvbuf(stdout, NULL, _IOLBF, 0);
                              ^
1 warning and 2 errors generated.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Link: https://lore.kernel.org/linux-kselftest/CA+G9fYus3Z8r2cg3zLv8uH8MRrzLFVWdnor02SNr=rCz+_WGVg@mail.gmail.com/
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
850fad7de8 selftests/nolibc: allow test -include /path/to/nolibc.h
As the head comment of nolibc-test.c shows, it can be built in 3 ways:

    $(CC) -nostdlib -include /path/to/nolibc.h => NOLIBC already defined
    $(CC) -nostdlib -I/path/to/nolibc/sysroot  => _NOLIBC_* guards are present
    $(CC) with default libc                    => NOLIBC* never defined

Only last two of them are tested currently, let's allow test the first one too.

This may help to find issues about using nolibc.h to build programs. it
derives from this change:

    commit 3a8039e289 ("tools/nolibc: Fix build of stdio.h due to header ordering")

Usage:

    // test with sysroot by default
    $ make run-user

    // test without sysroot, using nolibc.h directly
    $ make run-user NOLIBC_SYSROOT=0

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
b81434073b selftests/nolibc: allow run nolibc-test locally
It is able to run nolibc-test directly without qemu-user when the target
machine is the same as the host machine.

Sometimes, the result running locally may help a lot when the qemu-user
package is too old.

When the target machine differs from the host machine, it is also able
to run nolibc-test directly with qemu-user-static + binfmt_misc.

Link: https://lore.kernel.org/lkml/ZKutZwIOfy5MqedG@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
48967b73f8 selftests/nolibc: add testcases for startup code
The startup code is critical to get the right argc, argv, envp/environ
and _auxv, let's add a startup test group and the corresponding
testcases.

The "environ" test case is also moved from the stdlib test group to this
new startup test group and it is renamed to "environ_envp".

Since argv0 has been used by many other test cases, let's add testcases
to gurantee it too.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
fd3a9efde8 selftests/nolibc: add EXPECT_PTRGE, EXPECT_PTRGT, EXPECT_PTRLE, EXPECT_PTRLT
4 new pointer compare macros are added, they are similar to the integer
compare macros.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
c48d8af2fa tools/nolibc: s390: shrink _start with _start_c
move most of the _start operations to _start_c(), include the
stackprotector initialization.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
eea70cdac6 tools/nolibc: riscv: shrink _start with _start_c
move most of the _start operations to _start_c(), include the
stackprotector initialization.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
61bd4621c0 tools/nolibc: loongarch: shrink _start with _start_c
move most of the _start operations to _start_c(), include the
stackprotector initialization.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
431b806b9b tools/nolibc: mips: shrink _start with _start_c
move most of the _start operations to _start_c(), include the
stackprotector initialization.

Also clean up the instructions in delayed slots.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
539287d751 tools/nolibc: x86_64: shrink _start with _start_c
move most of the _start operations to _start_c(), include the
stackprotector initialization.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
2ab446336b tools/nolibc: i386: shrink _start with _start_c
move most of the _start operations to _start_c(), include the
stackprotector initialization.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
ded8af47c2 tools/nolibc: aarch64: shrink _start with _start_c
move most of the _start operations to _start_c(), include the
stackprotector initialization.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
61f9880721 tools/nolibc: arm: shrink _start with _start_c
move most of the _start operations to _start_c(), include the
stackprotector initialization.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
06f2a62c81 tools/nolibc: crt.h: initialize stack protector
As suggested by Thomas, It is able to move the stackprotector
initialization from the assembly _start to the beginning of the new
_start_c(). Let's call __stack_chk_init() in _start_c() as a
preparation.

Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/a00284a6-54b1-498c-92aa-44997fa78403@t-8ch.de/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
d7f16723d3 tools/nolibc: stackprotector.h: add empty __stack_chk_init for !_NOLIBC_STACKPROTECTOR
Let's define an empty __stack_chk_init for the !_NOLIBC_STACKPROTECTOR
branch.

This allows to remove #ifdef around every call of __stack_chk_init().

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
1733675515 tools/nolibc: add new crt.h with _start_c
As the environ and _auxv support added for nolibc, the assembly _start
function becomes more and more complex and therefore makes the porting
of nolibc to new architectures harder and harder.

To simplify portability, this C version of _start_c() is added to do
most of the assembly start operations in C, which reduces the complexity
a lot and will eventually simplify the porting of nolibc to the new
architectures.

The new _start_c() only requires a stack pointer argument, it will find
argc, argv, envp/environ and _auxv for us, and then call main(),
finally, it exit() with main's return status. With this new _start_c(),
the future new architectures only require to add very few assembly
instructions.

As suggested by Thomas, users may use a different signature of main
(e.g. void main(void)), a _nolibc_main alias is added for main to
silence the warning about potential conflicting types.

As suggested by Willy, the code is carefully polished for both smaller
size and better readability with local variables and the right types.

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230715095729.GC24086@1wt.eu/
Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/90fdd255-32f4-4caf-90ff-06456b53dac3@t-8ch.de/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
af93807eae tools/nolibc: remove the old sys_stat support
The statx manpage [1] shows that it has been supported from Linux 4.11
and glibc 2.28, the Linux support can be checked for all of the
architectures with this command:

    $ git grep -r statx v4.11 arch/ include/uapi/asm-generic/unistd.h \
      | grep -E "aarch64|arm|mips|s390|x86|:include/uapi"

Besides riscv and loongarch, all of the nolibc supported architectures
have added sys_statx from Linux v4.11. riscv is mainlined to v4.15,
loongarch is mainlined to v5.19, both of them use the generic unistd.h,
so, they have added sys_statx from their first mainline versions.

The current oldest stable branch is v4.14, only reserving sys_statx
still preserves compatibility with all of the supported stable branches,
So, let's remove the old arch related and dependent sys_stat support
completely.

This is friendly to the future new architecture porting.

[1]: https://man7.org/linux/man-pages/man2/statx.2.html

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
bff60150f7 tools/nolibc: fix up startup failures for -O0 under gcc < 11.1.0
As gcc doc [1] shows:

  Most optimizations are completely disabled at -O0 or if an -O level is
  not set on the command line, even if individual optimization flags are
  specified.

Test result [2] shows, gcc>=11.1.0 deviates from the above description,
but before gcc 11.1.0, "-O0" still forcely uses frame pointer in the
_start function even if the individual optimize("omit-frame-pointer")
flag is specified.

The frame pointer related operations will change the stack pointer (e.g.
In x86_64, an extra "push %rbp" will be inserted at the beginning of
_start) and make it differs from the one we expected, as a result, break
the whole startup function.

To fix up this issue, as suggested by Thomas, the individual "Os" and
"omit-frame-pointer" optimize flags are used together on _start function
to disable frame pointer completely even if the -O0 is set on the
command line.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
[2]: https://lore.kernel.org/lkml/20230714094723.140603-1-falcon@tinylab.org/

Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/34b21ba5-7b59-4b3b-9ed6-ef9a3a5e06f7@t-8ch.de/
Fixes: 7f85485896 ("tools/nolibc: make compiler and assembler agree on the section around _start")
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
2023349835 tools/nolibc: arch-*.h: add missing space after ','
Fix up such errors reported by scripts/checkpatch.pl:

    ERROR: space required after that ',' (ctx:VxV)
    #148: FILE: tools/include/nolibc/arch-aarch64.h:148:
    +void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void)
                             ^

    ERROR: space required after that ',' (ctx:VxV)
    #148: FILE: tools/include/nolibc/arch-aarch64.h:148:
    +void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void)
                                      ^

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Thomas Weißschuh
ceb528feb7 selftests/nolibc: avoid gaps in test numbers
As the test numbers are based on line numbers gaps without testcases are
to be avoided.
Instead use the already existing test condition logic to implement
conditional execution.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Thomas Weißschuh
b184a261e5 selftests/nolibc: simplify status printing
pad_spc() is only ever used to print the status message of testcases.
The line size is always constant, the return value is never used and the
format string is never used as such.

Remove all the unneeded logic and simplify the API and its users.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Thomas Weißschuh
3097783ecf selftests/nolibc: make evaluation of test conditions
If "cond" is a multi-token statement the behavior of the preprocessor
will lead to the negation "!" to be only applied to the first token.
Although currently no test uses such multi-token conditions but it can
happen at any time.

Put braces around "cond" to ensure the negation works as expected.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Thomas Weißschuh
67d108e2a2 tools/nolibc: completely remove optional environ support
In commit 52e423f5b9 ("tools/nolibc: export environ as a weak symbol on i386")
and friends the asm startup logic was extended to directly populate the
"environ" array.

This makes it impossible for "environ" to be dropped by the linker.
Therefore also drop the other logic to handle non-present "environ".

Also add a testcase to validate the initialization of environ.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
4beb9be811 selftests/nolibc: report: add newline before test failures
a newline is inserted just before the test failures to avoid mixing the
test failures with the raw test log.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
7d92e89363 selftests/nolibc: report: extrude the test status line
two newlines are added around the test summary line to extrude the test
status.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
0ac908e304 selftests/nolibc: report: align passed, skipped and failed
align the test values for different runs and different architectures.

Since the total number of tests is not bigger than 1000 currently, let's
align them with "%3d".

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
[wt: s/%03d/%3d/ as discussed with Zhangjin]
Link: https://lore.kernel.org/lkml/20230709185112.97236-1-falcon@tinylab.org/
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
c0faa0dace selftests/nolibc: report: print total tests
Let's count and print the total number of tests, now, the data of
passed, skipped and failed have the same format.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
c0315c79aa selftests/nolibc: report: print a summarized test status
one of the test status: success, warning and failure is printed to
summarize the passed, skipped and failed values.

- "success" means no skipped and no failed.
- "warning" means has at least one skipped and no failed.
- "failure" means all tests are failed.

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230702164358.GB16233@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:22 +02:00
Zhangjin Wu
148e9718e2 selftests/nolibc: add chmod_argv0 test
argv0 is readable and chmodable, let's use it for chmod test, but a safe
umask should be used, the readable and executable modes should be
reserved.

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:40:15 +02:00
Zhangjin Wu
135b622e48 selftests/nolibc: chroot_exe: remove procfs dependency
Since argv0 also works for CONFIG_PROC_FS=n, let's use it instead of
'/proc/self/exe'.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
f576d3c075 selftests/nolibc: stat_timestamps: remove procfs dependency
'/proc/self/' is a good path which doesn't have stale time info but it
is only available for CONFIG_PROC_FS=y.

When CONFIG_PROC_FS=n, use argv0 instead of '/proc/self', use '/' for the
worst case.

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
38fc0a3553 selftests/nolibc: chdir_root: restore current path after test
The PWD environment variable has the path of the nolibc-test program,
the current path must be the same as it, otherwise, the test cases will
fail with relative path (e.g. ./nolibc-test).

Since only chdir_root really changes the current path, let's restore it
with the PWD environment variable.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
6861b1a339 selftests/nolibc: vfprintf: remove MEMFD_CREATE dependency
The vfprintf test case require to open a temporary file to write, the
old memfd_create() method is perfect but has strong dependency on
MEMFD_CREATE and also TMPFS or HUGETLBFS (see fs/Kconfig):

    config MEMFD_CREATE
	def_bool TMPFS || HUGETLBFS

And from v6.2, MFD_NOEXEC_SEAL must be passed for the non-executable
memfd, otherwise, The kernel warning will be output to the test result
like this:

        Running test 'vfprintf'
        0 emptymemfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=1 'init'
         "" = ""                                                  [OK]

To avoid such warning and also to remove the MEMFD_CREATE dependency,
let's open a file from tmpfs directly.

The /tmp directory is used to detect the existing of tmpfs, if not
there, skip instead of fail.

And further, for pid == 1, the initramfs is loaded as ramfs, which can
be used as tmpfs, so, it is able to further remove TMPFS dependency too.

Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/9ad51430-b7c0-47dc-80af-20c86539498d@t-8ch.de
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
bbb14546bd selftests/nolibc: prepare /tmp for tests that need to write
create a /tmp directory. If it succeeds, the directory is writable,
which is normally the case when booted from an initramfs anyway.

This will be used instead of procfs for some tests.

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Link: https://lore.kernel.org/lkml/20230710050600.9697-1-falcon@tinylab.org/
[wt: removed the unneeded mount() call]
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
b8b26108e4 selftests/nolibc: fix up failures when CONFIG_PROC_FS=n
For CONFIG_PROC_FS=n, the /proc is not mountable, but the /proc
directory has been created in the prepare() stage whenever /proc is
there or not.

so, the checking of /proc in the run_syscall() stage will be always true
and at last it will fail all of the procfs dependent test cases, which
deviates from the 'cond' check design of the EXPECT_xx macros, without
procfs, these test cases should be skipped instead of failed.

To solve this issue, one method is checking /proc/self instead of /proc,
another method is removing the /proc directory completely for
CONFIG_PROC_FS=n, we apply the second method to avoid misleading the
users.

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
4e14e84442 selftests/nolibc: add a new rmdir() test case
A new rmdir_blah test case is added to remove a non-existing /blah,
which expects failure with ENOENT errno.

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
f4191f3d52 tools/nolibc: add rmdir() support
a reverse operation of mkdir() is meaningful, add rmdir() here.

required by nolibc-test to remove /proc while CONFIG_PROC_FS is not
enabled.

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
f7a419e35b selftests/nolibc: link_cross: use /proc/self/cmdline
For CONFIG_NET=n, there would be no /proc/self/net, so, use
/proc/self/cmdline instead.

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
c388c9920d selftests/nolibc: fix up kernel parameters support
kernel parameters allow pass two types of strings, one type is like
'noapic', another type is like 'panic=5', the first type is passed as
arguments of the init program, the second type is passed as environment
variables of the init program.

when users pass kernel parameters like this:

    noapic NOLIBC_TEST=syscall

our nolibc-test program will use the test setting from argv[1] and
ignore the one from NOLIBC_TEST environment variable, and at last, it
will print the following line and ignore the whole test setting.

    Ignoring unknown test name 'noapic'

reversing the parsing order does solve the above issue:

    test = getenv("NOLIBC_TEST");
    if (test)
        test = argv[1];

but it still doesn't work with such kernel parameters (without
NOLIBC_TEST environment variable):

    noapic FOO=bar

To support all of the potential kernel parameters, let's verify the test
setting from both of argv[1] and NOLIBC_TEST environment variable.

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
950add280c selftests/nolibc: prefer <sys/reboot.h> to <linux/reboot.h>
Since both glibc and musl provide RB_ flags via <sys/reboot.h>, and we
just add RB_ flags for nolibc, let's use RB_ flags instead of
LINUX_REBOOT_ flags and only reserve the required <sys/reboot.h> header.

This allows compile libc-test for musl libc without the linux headers.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
788aca91ab tools/nolibc: types.h: add RB_ flags for reboot()
Both glibc and musl provide RB_ flags via <sys/reboot.h> for reboot(),
they don't need to include <linux/reboot.h>, let nolibc provide RB_
flags too.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
989abf1c7b selftests/nolibc: fix up int_fast16/32_t test cases for musl
musl limits the fast signed int in 32bit, but glibc and nolibc don't, to
let such test cases work on musl, let's provide the type based
SINT_MAX_OF_TYPE(type) and SINT_MIN_OF_TYPE(type).

Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/bc635c4f-67fe-4e86-bfdf-bcb4879b928d@t-8ch.de/
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
5f2de00e2c selftests/nolibc: add _LARGEFILE64_SOURCE for musl
_GNU_SOURCE Implies _LARGEFILE64_SOURCE in glibc, but in musl, the
default configuration doesn't enable _LARGEFILE64_SOURCE.

>From include/dirent.h of musl, getdents64 is provided as getdents when
_LARGEFILE64_SOURCE is defined.

    #if defined(_LARGEFILE64_SOURCE)
    ...
    #define getdents64 getdents
    #endif

Let's define _LARGEFILE64_SOURCE to fix up this compile error:

    tools/testing/selftests/nolibc/nolibc-test.c: In function ‘test_getdents64’:
    tools/testing/selftests/nolibc/nolibc-test.c:453:8: warning: implicit declaration of function ‘getdents64’; did you mean ‘getdents’? [-Wimplicit-function-declaration]
      453 |  ret = getdents64(fd, (void *)buffer, sizeof(buffer));
          |        ^~~~~~~~~~
          |        getdents
    /usr/bin/ld: /tmp/ccKILm5u.o: in function `test_getdents64':
    nolibc-test.c:(.text+0xe3e): undefined reference to `getdents64'
    collect2: error: ld returned 1 exit status

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
79b4f68e9e selftests/nolibc: gettid: restore for glibc and musl
As the gettid manpage [1] shows, glibc 2.30 has gettid support, so,
let's enable the test for glibc >= 2.30.

gettid works on musl too.

[1]: https://man7.org/linux/man-pages/man2/gettid.2.html

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
46cf630c53 selftests/nolibc: stat_fault: silence NULL argument warning with glibc
Use another invalid address (void *)1 instead of NULL to silence this
compile warning with glibc:

    $ make libc-test
      CC      libc-test
    nolibc-test.c: In function ‘run_syscall’:
    nolibc-test.c:622:49: warning: null argument where non-null required (argument 1) [-Wnonnull]
      622 |   CASE_TEST(stat_fault);        EXPECT_SYSER(1, stat(NULL, &stat_buf), -1, EFAULT); break;
          |                                                 ^~~~
    nolibc-test.c:304:79: note: in definition of macro ‘EXPECT_SYSER2’
      304 |  do { if (!cond) pad_spc(llen, 64, "[SKIPPED]\n"); else ret += expect_syserr2(expr, expret, experr1, experr2, llen); } while (0)
          |                                                                               ^~~~
    nolibc-test.c:622:33: note: in expansion of macro ‘EXPECT_SYSER’
      622 |   CASE_TEST(stat_fault);        EXPECT_SYSER(1, stat(NULL, &stat_buf), -1, EFAULT); break;

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
cfb672f94f selftests/nolibc: add run-libc-test target
allow run and report glibc or musl based libc-test.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
fcdbf5dda4 selftests/nolibc: add mmap_munmap_good test case
mmap() a file with a good offset and then munmap() it. a non-zero offset
is passed to test the 6th argument of my_syscall6().

Note, it is not easy to find a unique file for mmap() in different
scenes, so, a file list is used to search the right one:

- /dev/zero: is commonly used to allocate anonymous memory and is likely
  present and readable

- /proc/1/exe: for 'run' and 'run-user' target, 'run-user' can not find
  '/proc/self/exe'

- /proc/self/exe: for 'libc-test' target, normal program 'libc-test' has
  no permission to access '/proc/1/exe'

- argv0: the path of the program itself, let it pass even with worst
  case scene: no procfs and no /dev/zero

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230702193306.GK16233@1wt.eu/
Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/bff82ea6-610b-4471-a28b-6c76c28604a6@t-8ch.de/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
ba3d0892be selftests/nolibc: add munmap_bad test case
The addr argument of munmap() must be a multiple of the page size,
passing invalid (void *)1 addr expects failure with -EINVAL.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
d4a3b2b998 selftests/nolibc: add mmap_bad test case
The length argument of mmap() must be greater than 0, passing a zero
length argument expects failure with -EINVAL.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
f193ecbff0 selftests/nolibc: add sbrk_0 to test current brk getting
>From musl 0.9.14 (to the latest version 1.2.3), both sbrk() and brk()
have almost been disabled for they conflict with malloc, only sbrk(0) is
still permitted as a way to get the current location of the program
break, let's support such case.

EXPECT_PTRNE() is used to expect sbrk() always successfully getting the
current break.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
29f5540be3 selftests/nolibc: add EXPECT_PTREQ, EXPECT_PTRNE and EXPECT_PTRER
The syscalls like sbrk() and mmap() return pointers, to test them, more
pointer compare test macros are required, add them:

- EXPECT_PTREQ() expects two equal pointers.
- EXPECT_PTRNE() expects two non-equal pointers.
- EXPECT_PTRER() expects failure with a specified errno.
- EXPECT_PTRER2() expects failure with one of two specified errnos.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
82e339c230 selftests/nolibc: prepare: create /dev/zero
/dev/zero is commonly used to allocate anonymous memory, it is a very
good file for tests, let's prepare it.

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230702193306.GK16233@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
938b5b9833 selftests/nolibc: export argv0 for some tests
argv0 is the path to nolibc-test program itself, which is a very good
always existing readable file for some tests, let's export it.

Note, the path may be absolute or relative, please make sure the tests
work with both of them. If it is relative, we must make sure the current
path is the one specified by the PWD environment variable.

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/ZKKbS3cwKcHgnGwu@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
4201cfce15 tools/nolibc: clean up sbrk() routine
Fix up the error reported by scripts/checkpatch.pl:

    ERROR: do not use assignment in if condition
    #95: FILE: tools/include/nolibc/sys.h:95:
    +	if ((ret = sys_brk(0)) && (sys_brk(ret + inc) == ret + inc))

Apply the new generic __sysret() to merge the SET_ERRNO() and return
lines.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
924e9539ae tools/nolibc: clean up mmap() routine
Do several cleanups together:

- Since all supported architectures have my_syscall6() now, remove the
  #ifdef check.

- Move the mmap() related macros to tools/include/nolibc/types.h and
  reuse most of them from <linux/mman.h>

- Apply the new generic __sysret() to convert the calling of sys_map()
  to oneline code

Note, since MAP_FAILED is -1 on Linux, so we can use the generic
__sysret() which returns -1 upon error and still satisfy user land that
checks for MAP_FAILED.

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230702192347.GJ16233@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
6591be4a73 tools/nolibc: __sysret: support syscalls who return a pointer
No official reference states the errno range, here aligns with musl and
glibc and uses [-MAX_ERRNO, -1] instead of all negative ones.

- musl: src/internal/syscall_ret.c
- glibc: sysdeps/unix/sysv/linux/sysdep.h

The MAX_ERRNO used by musl and glibc is 4095, just like the one nolibc
defined in tools/include/nolibc/errno.h.

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/ZKKdD%2Fp4UkEavru6@1wt.eu/
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Link: https://lore.kernel.org/linux-riscv/94dd5170929f454fbc0a10a2eb3b108d@AcuMS.aculab.com/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
6d1970e1ef tools/nolibc: add missing my_syscall6() for mips
It is able to pass the 6th argument like the 5th argument via the stack
for mips, let's add a new my_syscall6() now, see [1] for details:

  The mips/o32 system call convention passes arguments 5 through 8 on
  the user stack.

Both mmap() and pselect6() require my_syscall6().

[1]: https://man7.org/linux/man-pages/man2/syscall.2.html

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
8b9bdab635 tools/nolibc: arch-mips.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST
my_syscall<N> share the same long clobber list, define a macro for them.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
2dca615ade tools/nolibc: arch-loongarch.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST
my_syscall<N> share the same long clobber list, define a macro for them.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
f09f1912e4 toolc/nolibc: arch-*.h: clean up whitespaces after __asm__
replace "__asm__  volatile" with "__asm__ volatile" and insert necessary
whitespace before "\" to make sure the lines are aligned.

    $ sed -i -e 's/__asm__  volatile ( /__asm__ volatile (  /g' tools/include/nolibc/*.h

Note, arch-s390.h uses post-tab instead of post-whitespaces, must avoid
insert whitespace just before the tabs:

    $ sed -i -e 's/__asm__  volatile (\t/__asm__ volatile (\t/g' tools/include/nolibc/arch-*.h

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Zhangjin Wu
f134c7066c tools/nolibc: arch-*.h: fix up code indent errors
More than 8 whitespaces of the code indent are replaced with "tab +
whitespaces" to fix up such errors reported by scripts/checkpatch.pl:

    ERROR: code indent should use tabs where possible
    #64: FILE: tools/include/nolibc/arch-mips.h:64:
    +^I                                                                      \$

    ERROR: code indent should use tabs where possible
    #72: FILE: tools/include/nolibc/arch-mips.h:72:
    +^I          "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9"  \$

This command is used:

    $ sed -i -e '/^\t*        /{s/        /\t/g}' tools/include/nolibc/arch-*.h

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Thomas Weißschuh
67eb617a8e selftests/nolibc: simplify call to ioperm
Since commit 53fcfafa8c ("tools/nolibc/unistd: add syscall()") nolibc
has support for syscall(2).
Use it to get rid of some ifdef-ery.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 04:38:02 +02:00
Masami Hiramatsu (Google)
d892d3d3d8 selftests/ftrace: Add BTF fields access testcases
Add test cases for accessing the data structure fields using BTF info.
This includes the field access from parameters and retval, and accessing
string information.

Link: https://lore.kernel.org/all/169272161265.160970.14048619786574971276.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-23 09:41:55 +09:00
Masami Hiramatsu (Google)
08c9306fc2 tracing/fprobe-event: Assume fprobe is a return event by $retval
Assume the fprobe event is a return event if there is $retval is
used in the probe's argument without %return. e.g.

echo 'f:myevent vfs_read $retval' >> dynamic_events

then 'myevent' is a return probe event.

Link: https://lore.kernel.org/all/169272160261.160970.13613040161560998787.stgit@devnote2/

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-23 09:41:32 +09:00
Hao Luo
29d67fdebc libbpf: Free btf_vmlinux when closing bpf_object
I hit a memory leak when testing bpf_program__set_attach_target().
Basically, set_attach_target() may allocate btf_vmlinux, for example,
when setting attach target for bpf_iter programs. But btf_vmlinux
is freed only in bpf_object_load(), which means if we only open
bpf object but not load it, setting attach target may leak
btf_vmlinux.

So let's free btf_vmlinux in bpf_object__close() anyway.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230822193840.1509809-1-haoluo@google.com
2023-08-22 16:16:31 -07:00
Arnaldo Carvalho de Melo
7a46404b3c perf lzma: Convert some pr_err() to pr_debug() as callers already use pr_debug()
I noticed some error with:

  # perf list ex_ret_brn
  lzma: fopen failed on /usr/lib/modules/5.15.14-100.fc34.x86_64/kernel/net/bluetooth/bnep/bnep.ko.xz: 'No such file or directory'
  lzma: fopen failed on /usr/lib/modules/5.16.16-200.fc35.x86_64/kernel/drivers/gpu/drm/drm_kms_helper.ko.xz: 'No such file or directory'
  lzma: fopen failed on /usr/lib/modules/5.18.16-200.fc36.x86_64/kernel/arch/x86/crypto/crct10dif-pclmul.ko.xz: 'No such file or directory'
  lzma: fopen failed on /usr/lib/modules/5.16.16-200.fc35.x86_64/kernel/drivers/i2c/busses/i2c-piix4.ko.xz: 'No such file or directory'
  <BIG SNIP>

Then using 'perf probe' + 'perf trace' to debug 'perf list', it seems
its some inconsistency in the ~/.debug/ cache where broken build id
symlinks that ends up making it try to uncompress some kernel modules
using the lzma routines:

   395.309 perf/3594447 probe_perf:lzma_decompress_to_file(__probe_ip: 6118448, input_string: "/usr/lib/modules/5.18.17-200.fc36.x86_64/kernel/drivers/nvme/host/nvme.ko.xz")
                                       lzma_decompress_to_file (/var/home/acme/bin/perf)
                                       filename__decompress (/var/home/acme/bin/perf)
                                       filename__read_build_id (/var/home/acme/bin/perf)
                                       filename__sprintf_build_id (inlined)
                                       build_id_cache__valid_id (inlined)
                                       build_id_cache__list_all (/var/home/acme/bin/perf)
                                       print_sdt_events (/var/home/acme/bin/perf)
                                       cmd_list (/var/home/acme/bin/perf)
                                       run_builtin (/var/home/acme/bin/perf)
                                       handle_internal_command (inlined)
                                       run_argv (inlined)
                                       main (/var/home/acme/bin/perf)
                                       __libc_start_call_main (/usr/lib64/libc.so.6)
                                       __libc_start_main@@GLIBC_2.34 (/usr/lib64/libc.so.6)
                                       _start (/var/home/acme/bin/perf)

But callers of filename__decompress() already check its return and use
pr_debug(), so be consistent and make functions it calls also use
pr_debug().

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZOUD0+GkuCVkYF7n@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-22 16:53:32 -03:00
Kumar Kartikeya Dwivedi
fbc5bc4c8e selftests/bpf: Add test for bpf_obj_drop with bad reg->off
Add a selftest for the fix provided in the previous commit. Without the
fix, the selftest passes the verifier while it should fail. The special
logic for detecting graph root or node for reg->off and bypassing
reg->off == 0 guarantee for release helpers/kfuncs has been dropped.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230822175140.1317749-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-22 12:52:48 -07:00
Yonghong Song
fb30159426 selftests/bpf: Add a failure test for bpf_kptr_xchg() with local kptr
For a bpf_kptr_xchg() with local kptr, if the map value kptr type and
allocated local obj type does not match, with the previous patch,
the below verifier error message will be logged:
  R2 is of type <allocated local obj type> but <map value kptr type> is expected

Without the previous patch, the test will have unexpected success.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230822050058.2887354-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-22 09:43:55 -07:00
Mark Brown
0bbe06493b
Add cs42l43 PC focused SoundWire CODEC
Merge series from Charles Keepax <ckeepax@opensource.cirrus.com>:

This patch chain adds support for the Cirrus Logic cs42l43 PC focused
SoundWire CODEC. The chain is currently based of Lee's for-mfd-next
branch.

This series is mostly just a resend keeping pace with the kernel under
it, except for a minor fixup in the ASoC stuff.

Thanks,
Charles

Charles Keepax (4):
  dt-bindings: mfd: cirrus,cs42l43: Add initial DT binding
  mfd: cs42l43: Add support for cs42l43 core driver
  pinctrl: cs42l43: Add support for the cs42l43
  ASoC: cs42l43: Add support for the cs42l43

Lucas Tanure (2):
  soundwire: bus: Allow SoundWire peripherals to register IRQ handlers
  spi: cs42l43: Add SPI controller support

 .../bindings/sound/cirrus,cs42l43.yaml        |  313 +++
 MAINTAINERS                                   |    4 +
 drivers/mfd/Kconfig                           |   23 +
 drivers/mfd/Makefile                          |    3 +
 drivers/mfd/cs42l43-i2c.c                     |   98 +
 drivers/mfd/cs42l43-sdw.c                     |  239 ++
 drivers/mfd/cs42l43.c                         | 1188 +++++++++
 drivers/mfd/cs42l43.h                         |   28 +
 drivers/pinctrl/cirrus/Kconfig                |   11 +
 drivers/pinctrl/cirrus/Makefile               |    2 +
 drivers/pinctrl/cirrus/pinctrl-cs42l43.c      |  609 +++++
 drivers/soundwire/bus.c                       |   32 +
 drivers/soundwire/bus_type.c                  |   12 +
 drivers/spi/Kconfig                           |    7 +
 drivers/spi/Makefile                          |    1 +
 drivers/spi/spi-cs42l43.c                     |  284 ++
 include/linux/mfd/cs42l43-regs.h              | 1184 +++++++++
 include/linux/mfd/cs42l43.h                   |  102 +
 include/linux/soundwire/sdw.h                 |    9 +
 include/sound/cs42l43.h                       |   17 +
 sound/soc/codecs/Kconfig                      |   16 +
 sound/soc/codecs/Makefile                     |    4 +
 sound/soc/codecs/cs42l43-jack.c               |  946 +++++++
 sound/soc/codecs/cs42l43-sdw.c                |   74 +
 sound/soc/codecs/cs42l43.c                    | 2278 +++++++++++++++++
 sound/soc/codecs/cs42l43.h                    |  131 +
 26 files changed, 7615 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/cirrus,cs42l43.yaml
 create mode 100644 drivers/mfd/cs42l43-i2c.c
 create mode 100644 drivers/mfd/cs42l43-sdw.c
 create mode 100644 drivers/mfd/cs42l43.c
 create mode 100644 drivers/mfd/cs42l43.h
 create mode 100644 drivers/pinctrl/cirrus/pinctrl-cs42l43.c
 create mode 100644 drivers/spi/spi-cs42l43.c
 create mode 100644 include/linux/mfd/cs42l43-regs.h
 create mode 100644 include/linux/mfd/cs42l43.h
 create mode 100644 include/sound/cs42l43.h
 create mode 100644 sound/soc/codecs/cs42l43-jack.c
 create mode 100644 sound/soc/codecs/cs42l43-sdw.c
 create mode 100644 sound/soc/codecs/cs42l43.c
 create mode 100644 sound/soc/codecs/cs42l43.h

--
2.30.2
2023-08-22 12:48:04 +01:00
Hangbin Liu
be80942465 selftests: bonding: do not set port down before adding to bond
Before adding a port to bond, it need to be set down first. In the
lacpdu test the author set the port down specifically. But commit
a4abfa627c ("net: rtnetlink: Enslave device before bringing it up")
changed the operation order, the kernel will set the port down _after_
adding to bond. So all the ports will be down at last and the test failed.

In fact, the veth interfaces are already inactive when added. This
means there's no need to set them down again before adding to the bond.
Let's just remove the link down operation.

Fixes: a4abfa627c ("net: rtnetlink: Enslave device before bringing it up")
Reported-by: Zhengchao Shao <shaozhengchao@huawei.com>
Closes: https://lore.kernel.org/netdev/a0ef07c7-91b0-94bd-240d-944a330fcabd@huawei.com/
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230817082459.1685972-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21 19:05:42 -07:00
Jiri Olsa
8909a9392b selftests/bpf: Add extra link to uprobe_multi tests
Attaching extra program to same functions system wide for api
and link tests.

This way we can test the pid filter works properly when there's
extra system wide consumer on the same uprobe that will trigger
the original uprobe handler.

We expect to have the same counts as before.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-29-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:27 -07:00
Jiri Olsa
d571efae0f selftests/bpf: Add uprobe_multi pid filter tests
Running api and link tests also with pid filter and checking
the probe gets executed only for specific pid.

Spawning extra process to trigger attached uprobes and checking
we get correct counts from executed programs.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-28-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:27 -07:00
Jiri Olsa
e7cf9a48f8 selftests/bpf: Add uprobe_multi cookie test
Adding test for cookies setup/retrieval in uprobe_link uprobes
and making sure bpf_get_attach_cookie works properly.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-27-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:27 -07:00
Jiri Olsa
85209e839f selftests/bpf: Add uprobe_multi usdt bench test
Adding test that attaches 50k usdt probes in usdt_multi binary.

After the attach is done we run the binary and make sure we get
proper amount of hits.

With current uprobes:

  # perf stat --null ./test_progs -n 254/6
  #254/6   uprobe_multi_test/bench_usdt:OK
  #254     uprobe_multi_test:OK
  Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

   Performance counter stats for './test_progs -n 254/6':

      1353.659680562 seconds time elapsed

With uprobe_multi link:

  # perf stat --null ./test_progs -n 254/6
  #254/6   uprobe_multi_test/bench_usdt:OK
  #254     uprobe_multi_test:OK
  Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

   Performance counter stats for './test_progs -n 254/6':

         0.322046364 seconds time elapsed

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-26-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:27 -07:00
Jiri Olsa
4cde2d8aa7 selftests/bpf: Add uprobe_multi usdt test code
Adding code in uprobe_multi test binary that defines 50k usdts
and will serve as attach point for uprobe_multi usdt bench test
in following patch.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-25-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:27 -07:00
Jiri Olsa
3706919ee0 selftests/bpf: Add uprobe_multi bench test
Adding test that attaches 50k uprobes in uprobe_multi binary.

After the attach is done we run the binary and make sure we
get proper amount of hits.

The resulting attach/detach times on my setup:

  test_bench_attach_uprobe:PASS:uprobe_multi__open 0 nsec
  test_bench_attach_uprobe:PASS:uprobe_multi__attach 0 nsec
  test_bench_attach_uprobe:PASS:uprobes_count 0 nsec
  test_bench_attach_uprobe: attached in   0.346s
  test_bench_attach_uprobe: detached in   0.419s
  #262/5   uprobe_multi_test/bench_uprobe:OK

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-24-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:27 -07:00
Jiri Olsa
519dfeaf51 selftests/bpf: Add uprobe_multi test program
Adding uprobe_multi test program that defines 50k uprobe_multi_func_*
functions and will serve as attach point for uprobe_multi bench test
in following patch.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-23-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:27 -07:00
Jiri Olsa
a93d22ea60 selftests/bpf: Add uprobe_multi link test
Adding uprobe_multi test for bpf_link_create attach function.

Testing attachment using the struct bpf_link_create_opts.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-22-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:27 -07:00
Jiri Olsa
ffc6890361 selftests/bpf: Add uprobe_multi api test
Adding uprobe_multi test for bpf_program__attach_uprobe_multi
attach function.

Testing attachment using glob patterns and via bpf_uprobe_multi_opts
paths/syms fields.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-21-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:27 -07:00
Jiri Olsa
75b3715720 selftests/bpf: Add uprobe_multi skel test
Adding uprobe_multi test for skeleton load/attach functions,
to test skeleton auto attach for uprobe_multi link.

Test that bpf_get_func_ip works properly for uprobe_multi
attachment.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-20-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:27 -07:00
Jiri Olsa
3830d04a74 selftests/bpf: Move get_time_ns to testing_helpers.h
We'd like to have single copy of get_time_ns used b bench and test_progs,
but we can't just include bench.h, because of conflicting 'struct env'
objects.

Moving get_time_ns to testing_helpers.h which is being included by both
bench and test_progs objects.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-19-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:26 -07:00
Jiri Olsa
5902da6d8a libbpf: Add uprobe multi link support to bpf_program__attach_usdt
Adding support for usdt_manager_attach_usdt to use uprobe_multi
link to attach to usdt probes.

The uprobe_multi support is detected before the usdt program is
loaded and its expected_attach_type is set accordingly.

If uprobe_multi support is detected the usdt_manager_attach_usdt
gathers uprobes info and calls bpf_program__attach_uprobe to
create all needed uprobes.

If uprobe_multi support is not detected the old behaviour stays.

Also adding usdt.s program section for sleepable usdt probes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-18-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:26 -07:00
Jiri Olsa
7e1b468123 libbpf: Add uprobe multi link detection
Adding uprobe-multi link detection. It will be used later in
bpf_program__attach_usdt function to check and use uprobe_multi
link over standard uprobe links.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-17-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:26 -07:00
Jiri Olsa
5bfdd32dd5 libbpf: Add support for u[ret]probe.multi[.s] program sections
Adding support for several uprobe_multi program sections
to allow auto attach of multi_uprobe programs.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-16-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:26 -07:00
Jiri Olsa
3140cf121c libbpf: Add bpf_program__attach_uprobe_multi function
Adding bpf_program__attach_uprobe_multi function that
allows to attach multiple uprobes with uprobe_multi link.

The user can specify uprobes with direct arguments:

  binary_path/func_pattern/pid

or with struct bpf_uprobe_multi_opts opts argument fields:

  const char **syms;
  const unsigned long *offsets;
  const unsigned long *ref_ctr_offsets;
  const __u64 *cookies;

User can specify 2 mutually exclusive set of inputs:

 1) use only path/func_pattern/pid arguments

 2) use path/pid with allowed combinations of:
    syms/offsets/ref_ctr_offsets/cookies/cnt

    - syms and offsets are mutually exclusive
    - ref_ctr_offsets and cookies are optional

Any other usage results in error.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-15-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:26 -07:00
Jiri Olsa
5054a303f8 libbpf: Add bpf_link_create support for multi uprobes
Adding new uprobe_multi struct to bpf_link_create_opts object
to pass multiple uprobe data to link_create attr uapi.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-14-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:26 -07:00
Jiri Olsa
e613d1d0f7 libbpf: Add elf_resolve_pattern_offsets function
Adding elf_resolve_pattern_offsets function that looks up
offsets for symbols specified by pattern argument.

The 'pattern' argument allows wildcards (*?' supported).

Offsets are returned in allocated array together with its
size and needs to be released by the caller.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-13-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:26 -07:00
Jiri Olsa
7ace84c689 libbpf: Add elf_resolve_syms_offsets function
Adding elf_resolve_syms_offsets function that looks up
offsets for symbols specified in syms array argument.

Offsets are returned in allocated array with the 'cnt' size,
that needs to be released by the caller.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-12-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:26 -07:00
Jiri Olsa
3774705db1 libbpf: Add elf symbol iterator
Adding elf symbol iterator object (and some functions) that follow
open-coded iterator pattern and some functions to ease up iterating
elf object symbols.

The idea is to iterate single symbol section with:

  struct elf_sym_iter iter;
  struct elf_sym *sym;

  if (elf_sym_iter_new(&iter, elf, binary_path, SHT_DYNSYM))
        goto error;

  while ((sym = elf_sym_iter_next(&iter))) {
        ...
  }

I considered opening the elf inside the iterator and iterate all symbol
sections, but then it gets more complicated wrt user checks for when
the next section is processed.

Plus side is the we don't need 'exit' function, because caller/user is
in charge of that.

The returned iterated symbol object from elf_sym_iter_next function
is placed inside the struct elf_sym_iter, so no extra allocation or
argument is needed.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-11-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:26 -07:00
Jiri Olsa
f90eb70d44 libbpf: Add elf_open/elf_close functions
Adding elf_open/elf_close functions and using it in
elf_find_func_offset_from_file function. It will be
used in following changes to save some common code.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-10-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:26 -07:00
Jiri Olsa
5c74272504 libbpf: Move elf_find_func_offset* functions to elf object
Adding new elf object that will contain elf related functions.
There's no functional change.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-9-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:25 -07:00
Jiri Olsa
8097e460ca libbpf: Add uprobe_multi attach type and link names
Adding new uprobe_multi attach type and link names,
so the functions can resolve the new values.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230809083440.3209381-8-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:25 -07:00
Jiri Olsa
b733eeade4 bpf: Add pid filter support for uprobe_multi link
Adding support to specify pid for uprobe_multi link and the uprobes
are created only for task with given pid value.

Using the consumer.filter filter callback for that, so the task gets
filtered during the uprobe installation.

We still need to check the task during runtime in the uprobe handler,
because the handler could get executed if there's another system
wide consumer on the same uprobe (thanks Oleg for the insight).

Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:25 -07:00
Jiri Olsa
0b779b61f6 bpf: Add cookies support for uprobe_multi link
Adding support to specify cookies array for uprobe_multi link.

The cookies array share indexes and length with other uprobe_multi
arrays (offsets/ref_ctr_offsets).

The cookies[i] value defines cookie for i-the uprobe and will be
returned by bpf_get_attach_cookie helper when called from ebpf
program hooked to that specific uprobe.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:25 -07:00
Jiri Olsa
89ae89f53d bpf: Add multi uprobe link
Adding new multi uprobe link that allows to attach bpf program
to multiple uprobes.

Uprobes to attach are specified via new link_create uprobe_multi
union:

  struct {
    __aligned_u64   path;
    __aligned_u64   offsets;
    __aligned_u64   ref_ctr_offsets;
    __u32           cnt;
    __u32           flags;
  } uprobe_multi;

Uprobes are defined for single binary specified in path and multiple
calling sites specified in offsets array with optional reference
counters specified in ref_ctr_offsets array. All specified arrays
have length of 'cnt'.

The 'flags' supports single bit for now that marks the uprobe as
return probe.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:25 -07:00
Jiri Olsa
c5487f8d91 bpf: Switch BPF_F_KPROBE_MULTI_RETURN macro to enum
Switching BPF_F_KPROBE_MULTI_RETURN macro to anonymous enum,
so it'd show up in vmlinux.h. There's not functional change
compared to having this as macro.

Acked-by: Yafang Shao <laoar.shao@gmail.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21 15:51:25 -07:00
David Hildenbrand
e5013f11c6 selftest/mm: ksm_functional_tests: Add PROT_NONE test
Let's test whether merging and unmerging in PROT_NONE areas works as
expected.

Pass a page protection to mmap_and_merge_range(), which will trigger
an mprotect() after writing to the pages, but before enabling merging.

Make sure that unsharing works as expected, by performing a ptrace write
(using /proc/self/mem) and by setting MADV_UNMERGEABLE.

Note that this implicitly tests that ptrace writes in an inaccessible
(PROT_NONE) mapping work as expected.

[david@redhat.com: use sizeof(i) in test_prot_none(), per Peter]
  Link: https://lkml.kernel.org/r/e9cdb144-70c7-6596-2377-e675635c94e0@redhat.com
Link: https://lkml.kernel.org/r/20230803143208.383663-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: liubo <liubo254@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 14:28:42 -07:00
David Hildenbrand
42096aa24b selftest/mm: ksm_functional_tests: test in mmap_and_merge_range() if anything got merged
Let's extend mmap_and_merge_range() to test if anything in the current
process was merged. range_maps_duplicates() is too unreliable for that
use case, so instead look at KSM stats.

Trigger a complete unmerge first, to cleanup the stable tree and
stabilize accounting of merged pages.

Note that we're using /proc/self/ksm_merging_pages instead of
/proc/self/ksm_stat, because that one is available in more existing
kernels.

If /proc/self/ksm_merging_pages can't be opened, we can't perform any
checks and simply skip them.

We have to special-case the shared zeropage for now. But the only user
-- test_unmerge_zero_pages() -- performs its own merge checks.

Link: https://lkml.kernel.org/r/20230803143208.383663-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: liubo <liubo254@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 14:28:42 -07:00
Andrew Morton
5994eabf3b merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
Randy Dunlap
ef815d2cba treewide: drop CONFIG_EMBEDDED
There is only one Kconfig user of CONFIG_EMBEDDED and it can be switched
to EXPERT or "if !ARCH_MULTIPLATFORM" (suggested by Arnd).

Link: https://lkml.kernel.org/r/20230816055010.31534-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>	[RISC-V]
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: Russell King <linux@armlinux.org.uk>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:46:25 -07:00
Hugh Dickins
daa60ae64c mm,thp: fix smaps THPeligible output alignment
Extract from current /proc/self/smaps output:

Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
ProtectionKey:         0

That's not the alignment shown in Documentation/filesystems/proc.rst: it's
an ugly artifact from missing out the %8 other fields are using; but
there's even one selftest which expects it to look that way.  Hoping no
other smaps parsers depend on THPeligible to look so ugly, fix these.

Link: https://lkml.kernel.org/r/cfb81f7a-f448-5bc2-b0e1-8136fcd1dd8c@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:38:01 -07:00
Aleksa Sarai
6469b66e3f selftests: improve vm.memfd_noexec sysctl tests
This adds proper tests for the nesting functionality of vm.memfd_noexec as
well as some minor cleanups to spawn_*_thread().

Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-5-7ff9e3e10ba6@cyphar.com
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:38:00 -07:00
Aleksa Sarai
202e14222f memfd: do not -EACCES old memfd_create() users with vm.memfd_noexec=2
Given the difficulty of auditing all of userspace to figure out whether
every memfd_create() user has switched to passing MFD_EXEC and
MFD_NOEXEC_SEAL flags, it seems far less distruptive to make it possible
for older programs that don't make use of executable memfds to run under
vm.memfd_noexec=2.  Otherwise, a small dependency change can result in
spurious errors.  For programs that don't use executable memfds, passing
MFD_NOEXEC_SEAL is functionally a no-op and thus having the same

In addition, every failure under vm.memfd_noexec=2 needs to print to the
kernel log so that userspace can figure out where the error came from. 
The concerns about pr_warn_ratelimited() spam that caused the switch to
pr_warn_once()[1,2] do not apply to the vm.memfd_noexec=2 case.

This is a user-visible API change, but as it allows programs to do
something that would be blocked before, and the sysctl itself was broken
and recently released, it seems unlikely this will cause any issues.

[1]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
[2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/

Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-2-7ff9e3e10ba6@cyphar.com
Fixes: 105ff5339f ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:37:59 -07:00
Aleksa Sarai
99f34659e7 selftests: memfd: error out test process when child test fails
Patch series "memfd: cleanups for vm.memfd_noexec", v2.

The most critical issue with vm.memfd_noexec=2 (the fact that passing
MFD_EXEC would bypass it entirely[1]) has been fixed in Andrew's
tree[2], but there are still some outstanding issues that need to be
addressed:

 * vm.memfd_noexec=2 shouldn't reject old-style memfd_create(2) syscalls
   because it will make it far to difficult to ever migrate. Instead it
   should imply MFD_EXEC.

 * The dmesg warnings are pr_warn_once(), which on most systems means
   that they will be used up by systemd or some other boot process and
   userspace developers will never see it.

   - For the !(flags & (MFD_EXEC | MFD_NOEXEC_SEAL)) case, outputting a
     rate-limited message to the kernel log is necessary to tell
     userspace that they should add the new flags.

     Arguably the most ideal way to deal with the spam concern[3,4]
     while still prompting userspace to switch to the new flags would be
     to only log the warning once per task or something similar.
     However, adding something to task_struct for tracking this would be
     needless bloat for a single pr_warn_ratelimited().

     So just switch to pr_info_ratelimited() to avoid spamming the log
     with something that isn't a real warning. There's lots of
     info-level stuff in dmesg, it seems really unlikely that this
     should be an actual problem. Most programs are already switching to
     the new flags anyway.

   - For the vm.memfd_noexec=2 case, we need to log a warning for every
     failure because otherwise userspace will have no idea why their
     previously working program started returning -EACCES (previously
     -EINVAL) from memfd_create(2). pr_warn_once() is simply wrong here.

 * The racheting mechanism for vm.memfd_noexec makes it incredibly
   unappealing for most users to enable the sysctl because enabling it
   on &init_pid_ns means you need a system reboot to unset it. Given the
   actual security threat being protected against, CAP_SYS_ADMIN users
   being restricted in this way makes little sense.

   The argument for this ratcheting by the original author was that it
   allows you to have a hierarchical setting that cannot be unset by
   child pidnses, but this is not accurate -- changing the parent
   pidns's vm.memfd_noexec setting to be more restrictive didn't affect
   children.

   Instead, switch the vm.memfd_noexec sysctl to be properly
   hierarchical and allow CAP_SYS_ADMIN users (in the pidns's owning
   userns) to lower the setting as long as it is not lower than the
   parent's effective setting. This change also makes it so that
   changing a parent pidns's vm.memfd_noexec will affect all
   descendants, providing a properly hierarchical setting. The
   performance impact of this is incredibly minimal since the maximum
   depth of pidns is 32 and it is only checked during memfd_create(2)
   and unshare(CLONE_NEWPID).

 * The memfd selftests would not exit with a non-zero error code when
   certain tests that ran in a forked process (specifically the ones
   related to MFD_EXEC and MFD_NOEXEC_SEAL) failed.

[1]: https://lore.kernel.org/all/ZJwcsU0vI-nzgOB_@codewreck.org/
[2]: https://lore.kernel.org/all/20230705063315.3680666-1-jeffxu@google.com/
[3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
[4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundation.org/


This patch (of 5):

Before this change, a test runner using this self test would see a return
code of 0 when the tests using a child process (namely the MFD_NOEXEC_SEAL
and MFD_EXEC tests) failed, masking test failures.

Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com
Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-1-7ff9e3e10ba6@cyphar.com
Fixes: 11f75a0144 ("selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Jeff Xu <jeffxu@google.com>
Cc: "Christian Brauner (Microsoft)" <brauner@kernel.org>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:37:59 -07:00
Rong Tao
708879a1b4 selftests/mm: fix uffd-stress help information
commit 686a8bb72349("selftests/mm: split uffd tests into uffd-stress and
uffd-unit-tests") split uffd tests into uffd-stress and uffd-unit-tests,
obviously we need to modify the help information synchronously.

Also modify code indentation.

Link: https://lkml.kernel.org/r/tencent_64FC724AC5F05568F41BD1C68058E83CEB05@qq.com
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:37:58 -07:00
SeongJae Park
9628ace840 selftests/damon/sysfs: test damon_target filter
Test existence of files and validity of input keyword for DAMON monitoring
target based DAMOS filter on DAMON sysfs interface.

Link: https://lkml.kernel.org/r/20230802214312.110532-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:37:37 -07:00
SeongJae Park
4c45c20d53 selftests/damon/sysfs: test address range damos filter
Add a selftest for checking existence of addr_{start,end} files under
DAMOS filter directory, and 'addr' damos filter type input of DAMON sysfs
interface.

Link: https://lkml.kernel.org/r/20230802214312.110532-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:37:36 -07:00
SeongJae Park
b823cb08e6 selftests/damon/sysfs: test tried_regions/total_bytes file
Update sysfs.sh DAMON selftest for checking existence of 'total_bytes'
file under the 'tried_regions' directory of DAMON sysfs interface.

Link: https://lkml.kernel.org/r/20230802213222.109841-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:37:34 -07:00
Ayush Jain
edb72f4e4f selftests: mm: add KSM_MERGE_TIME tests
Add KSM_MERGE_TIME and KSM_MERGE_TIME_HUGE_PAGES tests with
size of 100.

./run_vmtests.sh -t ksm
-----------------------------
running ./ksm_tests -H -s 100
-----------------------------
Number of normal pages:    0
Number of huge pages:    50
Total size:    100 MiB
Total time:    0.399844662 s
Average speed:  250.097 MiB/s
[PASS]
-----------------------------
running ./ksm_tests -P -s 100
-----------------------------
Total size:    100 MiB
Total time:    0.451931496 s
Average speed:  221.272 MiB/s
[PASS]

Link: https://lkml.kernel.org/r/20230728164102.4655-1-ayush.jain3@amd.com
Signed-off-by: Ayush Jain <ayush.jain3@amd.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:37:31 -07:00
Ayush Jain
1738b94962 selftests/mm: FOLL_LONGTERM need to be updated to 0x100
After commit 2c2241081f ("mm/gup: move private gup FOLL_ flags to
internal.h") FOLL_LONGTERM flag value got updated from 0x10000 to 0x100 at
include/linux/mm_types.h.

As hmm.hmm_device_private.hmm_gup_test uses FOLL_LONGTERM Updating same
here as well.

Before this change test goes in an infinite assert loop in
hmm.hmm_device_private.hmm_gup_test
==========================================================
 RUN           hmm.hmm_device_private.hmm_gup_test ...
hmm-tests.c:1962:hmm_gup_test:Expected HMM_DMIRROR_PROT_WRITE..
..(2) == m[2] (34)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
...
==========================================================

 Call Trace:
 <TASK>
 ? sched_clock+0xd/0x20
 ? __lock_acquire.constprop.0+0x120/0x6c0
 ? ktime_get+0x2c/0xd0
 ? sched_clock+0xd/0x20
 ? local_clock+0x12/0xd0
 ? lock_release+0x26e/0x3b0
 pin_user_pages_fast+0x4c/0x70
 gup_test_ioctl+0x4ff/0xbb0
 ? gup_test_ioctl+0x68c/0xbb0
 __x64_sys_ioctl+0x99/0xd0
 do_syscall_64+0x60/0x90
 ? syscall_exit_to_user_mode+0x2a/0x50
 ? do_syscall_64+0x6d/0x90
 ? syscall_exit_to_user_mode+0x2a/0x50
 ? do_syscall_64+0x6d/0x90
 ? irqentry_exit_to_user_mode+0xd/0x20
 ? irqentry_exit+0x3f/0x50
 ? exc_page_fault+0x96/0x200
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
 RIP: 0033:0x7f6aaa31aaff

After this change test is able to pass successfully.

Link: https://lkml.kernel.org/r/20230808124347.79163-1-ayush.jain3@amd.com
Fixes: 2c2241081f ("mm/gup: move private gup FOLL_ flags to internal.h")
Signed-off-by: Ayush Jain <ayush.jain3@amd.com>
Reviewed-by: Raghavendra K T <raghavendra.kt@amd.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:07:21 -07:00
Lucas Karpinski
60439471f3 selftests: cgroup: fix test_kmem_basic less than error
test_kmem_basic creates 100,000 negative dentries, with each one mapping
to a slab object.  After memory.high is set, these are reclaimed through
the shrink_slab function call which reclaims all 100,000 entries.  The
test passes the majority of the time because when slab1 or current is
calculated, it is often above 0, however, 0 is also an acceptable value.

Link: https://lkml.kernel.org/r/7d6gcuyzdjcice6qbphrmpmv5skr5jtglg375unnjxqhstvhxc@qkn6dw6bao6v
Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:07:21 -07:00
Kaige Ye
58a8d2edd5 perf stat-display: Check if snprintf()'s fmt argument is NULL
It is undefined behavior to pass NULL as snprintf()'s fmt argument.
Here is an example to trigger the problem:

  $ perf stat --metric-only -x, -e instructions -- sleep 1
  insn per cycle,
  Segmentation fault (core dumped)

With this patch:

  $ perf stat --metric-only -x, -e instructions -- sleep 1
  insn per cycle,
  ,

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kaige Ye <ye@kaige.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/01CA7674B690CA24+20230804020907.144562-2-ye@kaige.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 10:54:22 -03:00
Arnaldo Carvalho de Melo
7d9642311b perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(augmented_arg->value) is a power of two.
Similar to what was done in the previous cset for sizeof(saddr), we need
to make sure sizeof(augmented_arg->value) is a power of two to do bounds
checking using &=:

  augmented_len &= sizeof(augmented_arg->value) - 1;

Suggested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZONrPo0NSqdbXiGx@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 10:49:31 -03:00
Arnaldo Carvalho de Melo
262b54b6c9 perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(saddr) is a power of two.
We're using the BPF verifier suggestion:

    22: (85) call bpf_probe_read#4
    R2 min value is negative, either use unsigned or 'var &= const'

That works only when const is a (power of two - 1) so add an assert to
make sure that that is the case.

Suggested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZONrFmJBNlQpSpZj@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 10:48:58 -03:00
Ilkka Koskinen
a50b8db3ea perf vendor events arm64: AmpereOne: Remove unsupported events
Some of the events included in the ampereone/core-imp-def are not
supported on AmpereOne, remove them.

Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230803211331.140553-5-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 10:01:28 -03:00
Ilkka Koskinen
705ed54914 perf vendor events arm64: Add AmpereOne metrics
This patch adds AmpereOne metrics. The metrics also work around
the issue related to some of the events.

Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230803211331.140553-4-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 10:01:08 -03:00
Ilkka Koskinen
47715f2b62 perf vendor events arm64: AmpereOne: Mark affected STALL_* events impacted by errata
Per errata AC03_CPU_29, STALL_SLOT_FRONTEND, STALL_FRONTEND, and STALL
events are not counting as expected. The follow up metrics patch will
include correct way to calculate the impacted events.

Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230803211331.140553-3-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 10:01:08 -03:00
Ilkka Koskinen
b8af10062d perf vendor events arm64: Remove L1D_CACHE_LMISS from AmpereOne list
amperene/cache.json file tried to include L1D_CACHE_LMISS while it
doesn't exist in common-and-microarch.json. While this bug doesn't seem to
cause issue in newer kernels with jevents.py script, it prevents building
older perf tools with the backported patch.

Fixes: a9650b7f6f ("perf vendor events arm64: Add AmpereOne core PMU events")
Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Closes: https://lore.kernel.org/all/76bb2e47-ce44-76ae-838e-53279047084d@oracle.com/
Link: https://lore.kernel.org/r/20230803211331.140553-2-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 10:00:53 -03:00
John Garry
7298e87607 perf jevents: Raise exception for no definition of a arch std event
Recently Ilkka reported that the JSONs for the AmpereOne arm64-based
platform included a dud event which referenced a non-existent arch std
event [0].

Previously in the times of jevents.c, we would raise an exception for this.

This is still invalid, even though the current code just ignores such an
event.

Re-introduce code to raise an exception for when no definition exists to
help catch as many invalid JSONs as possible.

[0] https://lore.kernel.org/linux-perf-users/9e851e2a-26c7-ba78-cb20-be4337b2916a@oracle.com/

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20230807111631.3033102-1-john.g.garry@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-21 09:57:05 -03:00
Hangbin Liu
429b55b441 selftests: fib_test: add a test case for IPv6 source address delete
Add a test case for IPv6 source address delete.

As David suggested, add tests:
- Single device using src address
- Two devices with the same source address
- VRF with single device using src address
- VRF with two devices using src address

As Ido points out, in IPv6, the preferred source address is looked up in
the same VRF as the first nexthop device. This will give us similar results
to IPv4 if the route is installed in the same VRF as the nexthop device, but
not when the nexthop device is enslaved to a different VRF. So add tests:
- src address and nexthop dev in same VR
- src address and nexthop device in different VRF

The link local address delete logic is different from the global address.
It should only affect the associate device it bonds to. So add tests cases
for link local address testing.

Here is the test result:

IPv6 delete address route tests
    Single device using src address
    TEST: Prefsrc removed when src address removed on other device      [ OK ]
    Two devices with the same source address
    TEST: Prefsrc not removed when src address exist on other device    [ OK ]
    TEST: Prefsrc removed when src address removed on all devices       [ OK ]
    VRF with single device using src address
    TEST: Prefsrc removed when src address removed on other device      [ OK ]
    VRF with two devices using src address
    TEST: Prefsrc not removed when src address exist on other device    [ OK ]
    TEST: Prefsrc removed when src address removed on all devices       [ OK ]
    src address and nexthop dev in same VRF
    TEST: Prefsrc removed from VRF when source address deleted          [ OK ]
    TEST: Prefsrc in default VRF not removed                            [ OK ]
    TEST: Prefsrc not removed from VRF when source address exist        [ OK ]
    TEST: Prefsrc in default VRF removed                                [ OK ]
    src address and nexthop device in different VRF
    TEST: Prefsrc not removed from VRF when nexthop dev in diff VRF     [ OK ]
    TEST: Prefsrc not removed in default VRF                            [ OK ]
    TEST: Prefsrc removed from VRF when nexthop dev in diff VRF         [ OK ]
    TEST: Prefsrc removed in default VRF                                [ OK ]
    Table ID 0
    TEST: Prefsrc removed from default VRF when source address deleted  [ OK ]
    Link local source route
    TEST: Prefsrc not removed when delete ll addr from other dev        [ OK ]
    TEST: Prefsrc removed when delete ll addr                           [ OK ]
    TEST: Prefsrc not removed when delete ll addr from other dev        [ OK ]
    TEST: Prefsrc removed even ll addr still exist on other dev         [ OK ]

Tests passed:  19
Tests failed:   0

Suggested-by: Ido Schimmel <idosch@idosch.org>
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:27:22 +01:00
Hangbin Liu
c4cf2bc0d2 selftests: vrf_route_leaking: remove ipv6_ping_frag from default testing
As the initial commit 1a01727676 ("selftests: Add VRF route leaking
tests") said, the IPv6 MTU test fails as source address selection
picking ::1. Every time we run the selftest this one report failed.
There seems not much meaning  to keep reporting a failure for 3 years
that no one plan to fix/update. Let't just skip this one first. We can
add it back when the issue fixed.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:25:10 +01:00
Anh Tuan Phan
144e22e756 selftests/net: Add log.txt and tools to .gitignore
Update .gitignore to untrack tools directory and log.txt. "tools" is
generated in "selftests/net/Makefile" and log.txt is generated in
"selftests/net/gro.sh" when executing run_all_tests.

Signed-off-by: Anh Tuan Phan <tuananhlfc@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:15:41 +01:00
Jiri Pirko
f65f305ae0 tools: ynl-gen: use temporary file for rendering
Currently any error during render leads to output an empty file.
That is quite annoying when using tools/net/ynl/ynl-regen.sh
which git greps files with content of "YNL-GEN.." and therefore ignores
empty files. So once you fail to regen, you have to checkout the file.

Avoid that by rendering to a temporary file first, only at the end
copy the content to the actual destination.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 19:24:38 +01:00
Linus Torvalds
bf98bae3d8 - Use LEA ...%rsp instead of ADD %rsp in the Zen1/2 SRSO return sequence
as latter clobbers flags which interferes with fastop emulation in
   KVM, leading to guests freezing during boot
 
 - A fix for the DIV(0) quotient data leak on Zen1 to clear the divider
   buffers at the right time
 
 - Disable the SRSO mitigation on unaffected configurations as it got
   enabled there unnecessarily
 
 - Change .text section name to fix CONFIG_LTO_CLANG builds
 
 - Improve the optprobe indirect jmp check so that certain configurations
   can still be able to use optprobes at all
 
 - A serious and good scrubbing of the untraining routines by PeterZ:
  - Add proper speculation stopping traps so that objtool is happy
  - Adjust objtool to handle the new thunks
  - Make the thunk pointer assignable to the different untraining
    sequences at runtime, thus avoiding the alternative at the return
    thunk. It simplifies the code a bit too.
  - Add a entry_untrain_ret() main entry point which selects the
    respective untraining sequence
  - Rename things so that they're more clear
  - Fix stack validation with FRAME_POINTER=y builds
 
 - Fix static call patching to handle when a JMP to the return thunk is
   the last insn on the very last module memory page
 
 - Add more documentation about what each untraining routine does and
   why
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmTge4wACgkQEsHwGGHe
 VUpgwRAAgP1dAq4c/DuLQh/+Mao/pM+EiNxwoDTNJ27ZoRfXG5vLXF3++TRkmFKB
 ua+jEhkNTAH1xyF+um4exjUD2UC62UfNo4wBZPjl+jVmguHqpsNOsZj7M3+GRD+3
 vRWspaOnNPKOIVdtvftaS6J3YavFUolwZSRC9HCFQiriX5zV4BlMZEJxkWw6LNW6
 LeJt4qmbDXCIzmCRqEmtNBOhuWuMvhwWg9G1Aq4MLcHf+gHSEGNnY8Otl7YPPeqr
 ys9vE5hQ3NiUmBkGnhw+Mj3gGFCL2fzWF0XqY8VCTPcYTVRFen7BmelhJVm7RhAr
 wpXdyCU+bV4qrn2uRpBSbzH/DfxfQA2xbRtBR+L7x5ZbHamFyi17fN94AQv2WUXz
 7TUdooWPuJLPQ2CHAgSChTEF/CZBl6pYHEorHkzA1GqV0omMT7hg8GEHn17JGI5v
 FDPGYHuznsu59DhGNh7Wx4hLO10slvkSHly+se7eCaDr1hDIpJtiZLxn6n+SphZh
 qzYc+Pxa3UcgNSxqqfOBqDWQQNdoYqx1ONao8nWgjj+/y0eIEf27uqIDT/o5tb7E
 YejDq7xO00CartGm2g/0S0OvDvRTWbU0LoGMKNxo/HTD+goM8pa7vdE77g5NNSCy
 wG0BnFWni53p66JJzzxxgPG39OYu9NR6ilcOTYT9jlPT3ZMySYg=
 =ndko
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.5_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "Extraordinary embargoed times call for extraordinary measures. That's
  why this week's x86/urgent branch is larger than usual, containing all
  the known fallout fixes after the SRSO mitigation got merged.

  I know, it is a bit late in the game but everyone who has reported a
  bug stemming from the SRSO pile, has tested that branch and has
  confirmed that it fixes their bug.

  Also, I've run it on every possible hardware I have and it is looking
  good. It is running on this very machine while I'm typing, for 2 days
  now without an issue. Famous last words...

   - Use LEA ...%rsp instead of ADD %rsp in the Zen1/2 SRSO return
     sequence as latter clobbers flags which interferes with fastop
     emulation in KVM, leading to guests freezing during boot

   - A fix for the DIV(0) quotient data leak on Zen1 to clear the
     divider buffers at the right time

   - Disable the SRSO mitigation on unaffected configurations as it got
     enabled there unnecessarily

   - Change .text section name to fix CONFIG_LTO_CLANG builds

   - Improve the optprobe indirect jmp check so that certain
     configurations can still be able to use optprobes at all

   - A serious and good scrubbing of the untraining routines by PeterZ:
      - Add proper speculation stopping traps so that objtool is happy
      - Adjust objtool to handle the new thunks
      - Make the thunk pointer assignable to the different untraining
        sequences at runtime, thus avoiding the alternative at the
        return thunk. It simplifies the code a bit too.
      - Add a entry_untrain_ret() main entry point which selects the
        respective untraining sequence
      - Rename things so that they're more clear
      - Fix stack validation with FRAME_POINTER=y builds

   - Fix static call patching to handle when a JMP to the return thunk
     is the last insn on the very last module memory page

   - Add more documentation about what each untraining routine does and
     why"

* tag 'x86_urgent_for_v6.5_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/srso: Correct the mitigation status when SMT is disabled
  x86/static_call: Fix __static_call_fixup()
  objtool/x86: Fixup frame-pointer vs rethunk
  x86/srso: Explain the untraining sequences a bit more
  x86/cpu/kvm: Provide UNTRAIN_RET_VM
  x86/cpu: Cleanup the untrain mess
  x86/cpu: Rename srso_(.*)_alias to srso_alias_\1
  x86/cpu: Rename original retbleed methods
  x86/cpu: Clean up SRSO return thunk mess
  x86/alternative: Make custom return thunk unconditional
  objtool/x86: Fix SRSO mess
  x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk()
  x86/cpu: Fix __x86_return_thunk symbol type
  x86/retpoline,kprobes: Skip optprobe check for indirect jumps with retpolines and IBT
  x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG
  x86/srso: Disable the mitigation on unaffected configurations
  x86/CPU/AMD: Fix the DIV(0) initial fix attempt
  x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()
2023-08-19 10:46:02 +02:00
Ido Schimmel
f520489e99 selftests: mlxsw: Fix test failure on Spectrum-4
Remove assumptions about shared buffer cell size and instead query the
cell size from devlink. Adjust the test to send small packets that fit
inside a single cell.

Tested on Spectrum-{1,2,3,4}.

Fixes: 4735402173 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/f7dfbf3c4d1cb23838d9eb99bab09afaa320c4ca.1692268427.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:41:06 -07:00
Jakub Kicinski
7ff57803d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/sfc/tc.c
  fa165e1949 ("sfc: don't unregister flow_indr if it was never registered")
  3bf969e88a ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 12:44:56 -07:00
Arnaldo Carvalho de Melo
64917f4df0 perf trace: Use heuristic when deciding if a syscall tracepoint "const char *" field is really a string
'perf trace' tries to find BPF progs associated with a syscall that have
a signature that is similar to syscalls without one to try and reuse,
so, for instance, the 'open' signature can be reused with many other
syscalls that have as its first arg a string.

It uses the tracefs events format file for finding a signature that can
be reused, but then comes the "write" syscall with its second argument
as a "const char *":

  # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_write/format
  name: sys_enter_write
  ID: 746
  format:
  	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
  	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
  	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
  	field:int common_pid;	offset:4;	size:4;	signed:1;

  	field:int __syscall_nr;	offset:8;	size:4;	signed:1;
  	field:unsigned int fd;	offset:16;	size:8;	signed:0;
  	field:const char * buf;	offset:24;	size:8;	signed:0;
  	field:size_t count;	offset:32;	size:8;	signed:0;

  print fmt: "fd: 0x%08lx, buf: 0x%08lx, count: 0x%08lx", ((unsigned long)(REC->fd)), ((unsigned long)(REC->buf)), ((unsigned long)(REC->count))
  #

Which isn't a string (the man page for glibc has buf as "void *"), so we
have to use the name of the argument as an heuristic, to consider a
string just args that are "const char *" and that have in its name  the
"path", "file", etc substrings.

With that now it reuses:

  [root@quaco ~]# perf trace -v --max-events=1 |& grep Reus
  Reusing "open" BPF sys_enter augmenter for "stat"
  Reusing "open" BPF sys_enter augmenter for "lstat"
  Reusing "open" BPF sys_enter augmenter for "access"
  Reusing "connect" BPF sys_enter augmenter for "accept"
  Reusing "sendto" BPF sys_enter augmenter for "recvfrom"
  Reusing "connect" BPF sys_enter augmenter for "bind"
  Reusing "connect" BPF sys_enter augmenter for "getsockname"
  Reusing "connect" BPF sys_enter augmenter for "getpeername"
  Reusing "open" BPF sys_enter augmenter for "execve"
  Reusing "open" BPF sys_enter augmenter for "truncate"
  Reusing "open" BPF sys_enter augmenter for "chdir"
  Reusing "open" BPF sys_enter augmenter for "mkdir"
  Reusing "open" BPF sys_enter augmenter for "rmdir"
  Reusing "open" BPF sys_enter augmenter for "creat"
  Reusing "open" BPF sys_enter augmenter for "link"
  Reusing "open" BPF sys_enter augmenter for "unlink"
  Reusing "open" BPF sys_enter augmenter for "symlink"
  Reusing "open" BPF sys_enter augmenter for "readlink"
  Reusing "open" BPF sys_enter augmenter for "chmod"
  Reusing "open" BPF sys_enter augmenter for "chown"
  Reusing "open" BPF sys_enter augmenter for "lchown"
  Reusing "open" BPF sys_enter augmenter for "mknod"
  Reusing "open" BPF sys_enter augmenter for "statfs"
  Reusing "open" BPF sys_enter augmenter for "pivot_root"
  Reusing "open" BPF sys_enter augmenter for "chroot"
  Reusing "open" BPF sys_enter augmenter for "acct"
  Reusing "open" BPF sys_enter augmenter for "swapon"
  Reusing "open" BPF sys_enter augmenter for "swapoff"
  Reusing "open" BPF sys_enter augmenter for "delete_module"
  Reusing "open" BPF sys_enter augmenter for "setxattr"
  Reusing "open" BPF sys_enter augmenter for "lsetxattr"
  Reusing "openat" BPF sys_enter augmenter for "fsetxattr"
  Reusing "open" BPF sys_enter augmenter for "getxattr"
  Reusing "open" BPF sys_enter augmenter for "lgetxattr"
  Reusing "openat" BPF sys_enter augmenter for "fgetxattr"
  Reusing "open" BPF sys_enter augmenter for "listxattr"
  Reusing "open" BPF sys_enter augmenter for "llistxattr"
  Reusing "open" BPF sys_enter augmenter for "removexattr"
  Reusing "open" BPF sys_enter augmenter for "lremovexattr"
  Reusing "fsetxattr" BPF sys_enter augmenter for "fremovexattr"
  Reusing "open" BPF sys_enter augmenter for "mq_open"
  Reusing "open" BPF sys_enter augmenter for "mq_unlink"
  Reusing "fsetxattr" BPF sys_enter augmenter for "add_key"
  Reusing "fremovexattr" BPF sys_enter augmenter for "request_key"
  Reusing "fremovexattr" BPF sys_enter augmenter for "inotify_add_watch"
  Reusing "fremovexattr" BPF sys_enter augmenter for "mkdirat"
  Reusing "fremovexattr" BPF sys_enter augmenter for "mknodat"
  Reusing "fremovexattr" BPF sys_enter augmenter for "fchownat"
  Reusing "fremovexattr" BPF sys_enter augmenter for "futimesat"
  Reusing "fremovexattr" BPF sys_enter augmenter for "newfstatat"
  Reusing "fremovexattr" BPF sys_enter augmenter for "unlinkat"
  Reusing "fremovexattr" BPF sys_enter augmenter for "linkat"
  Reusing "open" BPF sys_enter augmenter for "symlinkat"
  Reusing "fremovexattr" BPF sys_enter augmenter for "readlinkat"
  Reusing "fremovexattr" BPF sys_enter augmenter for "fchmodat"
  Reusing "fremovexattr" BPF sys_enter augmenter for "faccessat"
  Reusing "fremovexattr" BPF sys_enter augmenter for "utimensat"
  Reusing "connect" BPF sys_enter augmenter for "accept4"
  Reusing "fremovexattr" BPF sys_enter augmenter for "name_to_handle_at"
  Reusing "fremovexattr" BPF sys_enter augmenter for "renameat2"
  Reusing "open" BPF sys_enter augmenter for "memfd_create"
  Reusing "fremovexattr" BPF sys_enter augmenter for "execveat"
  Reusing "fremovexattr" BPF sys_enter augmenter for "statx"
  [root@quaco ~]#

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZN5lrdeEdSMCn7hk@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-18 16:33:28 -03:00
Arnaldo Carvalho de Melo
83a0943b18 perf trace: Use the augmented_raw_syscall BPF skel only for tracing syscalls
It is possible to use 'perf trace' with tracepoints and in that case we
can't initialize/use the augmented_raw_syscalls BPF skel.

For instance, this usecase:

  # perf trace -e sched:*exec --max-events=5
         ? (         ): NetworkManager/1183  ... [continued]: poll())                                             = 1
     0.043 ( 0.007 ms): NetworkManager/1183 epoll_wait(epfd: 17<anon_inode:[eventpoll]>, events: 0x55555f90e920, maxevents: 6) = 0
     0.060 ( 0.007 ms): NetworkManager/1183 write(fd: 3<anon_inode:[eventfd]>, buf: 0x7ffc5a27cd30, count: 8)     = 8
     0.073 ( 0.005 ms): NetworkManager/1183 epoll_wait(epfd: 24<anon_inode:[eventpoll]>, events: 0x7ffc5a27cd20, maxevents: 2) = 1
     0.082 ( 0.010 ms): NetworkManager/1183 recvmmsg(fd: 26<socket:[30298]>, mmsg: 0x7ffc5a27caa0, vlen: 8)       = 1
  #

Where we want to trace just some sched tracepoints ending in 'exec' ends
up tracing all syscalls.

Fix it by checking existing trace->trace_syscalls boolean to see if we
need the augmenter.

A followup patch will move those sections of code used only with the
augmenter to separate functions, to get it cleaner and remove the goto,
done just for reviewing purposes.

With this patch in place the previous behaviour is restored: no syscalls
when we have other events and no syscall names:

  [root@quaco ~]# perf probe do_filp_open "filename=pathname->name:string"
  Added new event:
    probe:do_filp_open   (on do_filp_open with filename=pathname->name:string)

  You can now use it in all perf tools, such as:

	  perf record -e probe:do_filp_open -aR sleep 1

  [root@quaco ~]# perf trace --max-events=10 -e probe:do_filp_open sleep 1
     0.000 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache")
     0.056 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6")
     0.481 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive")
     0.501 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias")
     0.572 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION")
     0.581 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION")
     0.616 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib64/gconv/gconv-modules.cache")
     0.656 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_MEASUREMENT")
     0.664 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_MEASUREMENT")
     0.696 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_TELEPHONE")
  [root@quaco ~]#

As well as mixing syscalls with tracepoints, getting the syscall
tracepoints used augmented using the BPF skel:

  [root@quaco ~]# perf trace --max-events=10 -e open*,probe:do_filp_open sleep 1
     0.000 (         ): sleep/455124 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
     0.005 (         ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache")
     0.000 ( 0.011 ms): sleep/455124  ... [continued]: openat())                                           = 3
     0.031 (         ): sleep/455124 openat(dfd: CWD, filename: "/lib64/libc.so.6", flags: RDONLY|CLOEXEC) ...
     0.033 (         ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6")
     0.031 ( 0.006 ms): sleep/455124  ... [continued]: openat())                                           = 3
     0.258 (         ): sleep/455124 openat(dfd: CWD, filename: "/usr/lib/locale/locale-archive", flags: RDONLY|CLOEXEC) ...
     0.261 (         ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive")
     0.258 ( 0.006 ms): sleep/455124  ... [continued]: openat())                                           = -1 ENOENT (No such file or directory)
     0.272 (         ): sleep/455124 openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) ...
     0.273  (        ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias")

A final note: the probe:do_filp_open uses a kprobe (probably optimized
as its in the start of a function) that uses the kprobe_tracer mechanism
in the kernel to collect the pathname->name string and stash it into the
tracepoint created by 'perf probe' for that:

  [root@quaco ~]# cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/do_filp_open _text+4621920 filename=+0(+0(%si)):string
  [root@quaco ~]#

While the syscalls:sys_enter_openat tracepoint gets its string from a
BPF program attached to raw_syscalls:sys_enter that tail calls into
another BPF program that knows the types for the openat syscall args and
thus can bpf_probe_read it right after the normal
sys_enter/sys_enter_openat tracepoint payload that comes prefixed with
whatever perf_event_open asked for (CPU, timestamp, etc):

  [root@quaco ~]# bpftool prog | grep -E "sys_enter |sys_enter_opena" -A3
  3176: tracepoint  name sys_enter  tag 0bc3fc9d11754ba1  gpl
	loaded_at 2023-08-17T12:32:20-0300  uid 0
	xlated 272B  jited 257B  memlock 4096B  map_ids 2462,2466,2463
	btf_id 2976
  --
  3180: tracepoint  name sys_enter_opena  tag 19dd077f00ec2f58  gpl
	  loaded_at 2023-08-17T12:32:20-0300  uid 0
	  xlated 328B  jited 206B  memlock 4096B  map_ids 2466,2465
	  btf_id 2976
  [root@quaco ~]#

Fixes: 5e6da6be30 ("perf trace: Migrate BPF augmentation to use a skeleton")
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Fangrui Song <maskray@google.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/lkml/ZN4+s2Wl+zYmXTDj@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-18 16:33:12 -03:00
Yonghong Song
0a55264cf9 selftests/bpf: Fix a selftest compilation error
When building the kernel and selftest with clang compiler (llvm17 or llvm18),
I hit the following compilation failure:
  In file included from progs/test_lwt_redirect.c:3:
  In file included from /usr/include/linux/ip.h:21:
  In file included from /usr/include/asm/byteorder.h:5:
  In file included from /usr/include/linux/byteorder/little_endian.h:13:
  /usr/include/linux/swab.h:136:8: error: unknown type name '__always_inline'
    136 | static __always_inline unsigned long __swab(const unsigned long y)
        |        ^
  /usr/include/linux/swab.h:171:8: error: unknown type name '__always_inline'
    171 | static __always_inline __u16 __swab16p(const __u16 *p)
  ...

bpf_helpers.h file provided a definition for __always_inline.
Putting 'ip.h' after 'bpf_helpers.h' fixed the issue.

Fixes: 43a7c3ef8a ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230818174312.1883381-1-yonghong.song@linux.dev
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-18 12:19:22 -07:00
Alexey Dobriyan
f58a2dd8d5 proc: skip proc-empty-vm on anything but amd64 and i386
This test is arch specific, requires "munmap everything" primitive.

Link: https://lkml.kernel.org/r/20230630183434.17434-2-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:18:50 -07:00
Alexey Dobriyan
4356b11ec0 proc: support proc-empty-vm test on i386
Unmap everything starting from 4GB length until it unmaps, otherwise test
has to detect which virtual memory split kernel is using.

Link: https://lkml.kernel.org/r/20230630183434.17434-1-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:18:49 -07:00
Liam R. Howlett
0b8bb544b1 maple_tree: update mas_preallocate() testing
Since the mas_preallocate() calculation has been updated to be more
precise, the testing must also be updated to check for what is expected.

Link: https://lkml.kernel.org/r/20230724183157.3939892-13-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:49 -07:00
Liam R. Howlett
da0892547b maple_tree: re-introduce entry to mas_preallocate() arguments
The current preallocation strategy is to preallocate the absolute
worst-case allocation for a tree modification.  The entry (or NULL) is
needed to know how many nodes are needed to write to the tree.  Start by
adding the argument to the mas_preallocate() definition.

Link: https://lkml.kernel.org/r/20230724183157.3939892-8-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:48 -07:00
Ryan Roberts
05f1edac80 selftests/mm: run all tests from run_vmtests.sh
It is very unclear to me how one is supposed to run all the mm selftests
consistently and get clear results.

Most of the test programs are launched by both run_vmtests.sh and
run_kselftest.sh:

  hugepage-mmap
  hugepage-shm
  map_hugetlb
  hugepage-mremap
  hugepage-vmemmap
  hugetlb-madvise
  map_fixed_noreplace
  gup_test
  gup_longterm
  uffd-unit-tests
  uffd-stress
  compaction_test
  on-fault-limit
  map_populate
  mlock-random-test
  mlock2-tests
  mrelease_test
  mremap_test
  thuge-gen
  virtual_address_range
  va_high_addr_switch
  mremap_dontunmap
  hmm-tests
  madv_populate
  memfd_secret
  ksm_tests
  ksm_functional_tests
  soft-dirty
  cow

However, of this set, when launched by run_vmtests.sh, some of the
programs are invoked multiple times with different arguments. When
invoked by run_kselftest.sh, they are invoked without arguments (and as
a consequence, some fail immediately).

Some test programs are only launched by run_vmtests.sh:

  test_vmalloc.sh

And some test programs and only launched by run_kselftest.sh:

  khugepaged
  migration
  mkdirty
  transhuge-stress
  split_huge_page_test
  mdwe_test
  write_to_hugetlbfs

Furthermore, run_vmtests.sh is invoked by run_kselftest.sh, so in this
case all the test programs invoked by both scripts are run twice!

Needless to say, this is a bit of a mess. In the absence of fully
understanding the history here, it looks to me like the best solution is
to launch ALL test programs from run_vmtests.sh, and ONLY invoke
run_vmtests.sh from run_kselftest.sh. This way, we get full control over
the parameters, each program is only invoked the intended number of
times, and regardless of which script is used, the same tests get run in
the same way.

The only drawback is that if using run_kselftest.sh, it's top-level tap
result reporting reports only a single test and it fails if any of the
contained tests fail. I don't see this as a big deal though since we
still see all the nested reporting from multiple layers. The other issue
with this is that all of run_vmtests.sh must execute within a single
kselftest timeout period, so let's increase that to something more
suitable.

In the Makefile, TEST_GEN_PROGS will compile and install the tests and
will add them to the list of tests that run_kselftest.sh will run.
TEST_GEN_FILES will compile and install the tests but will not add them
to the test list. So let's move all the programs from TEST_GEN_PROGS to
TEST_GEN_FILES so that they are built but not executed by
run_kselftest.sh. Note that run_vmtests.sh is added to TEST_PROGS, which
means it ends up in the test list. (the lack of "_GEN" means it won't be
compiled, but simply copied).

Link: https://lkml.kernel.org/r/20230724082522.1202616-9-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:43 -07:00
Ryan Roberts
e170621027 selftests/mm: optionally pass duration to transhuge-stress
Until now, transhuge-stress runs until its explicitly killed, so when
invoked by run_kselftest.sh, it would run until the test timeout, then it
would be killed and the test would be marked as failed.

Add a new, optional command line parameter that allows the user to specify
the duration in seconds that the program should run.  The program exits
after this duration with a success (0) exit code.  If the argument is
omitted the old behacvior remains.

On it's own, this doesn't quite solve our problem because run_kselftest.sh
does not allow passing parameters to the program under test.  But we will
shortly move this to run_vmtests.sh, which does allow parameter passing.

Link: https://lkml.kernel.org/r/20230724082522.1202616-8-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:43 -07:00
Ryan Roberts
0003033297 selftests/mm: make migration test robust to failure
The `migration` test currently has a number of robustness problems that
cause it to hang and leak resources.

Timeout: There are 3 tests, which each previously ran for 60 seconds. 
However, the timeout in mm/settings for a single test binary was set to 45
seconds.  So when run using run_kselftest.sh, the top level timeout would
trigger before the test binary was finished.  Solve this by meeting in the
middle; each of the 3 tests now runs for 20 seconds (for a total of 60),
and the top level timeout is set to 90 seconds.

Leaking child processes: the `shared_anon` test fork()s some children but
then an ASSERT() fires before the test kills those children.  The assert
causes immediate exit of the parent and leaking of the children. 
Furthermore, if run using the run_kselftest.sh wrapper, the wrapper would
get stuck waiting for those children to exit, which never happens.  Solve
this by setting the "parent death signal" to SIGHUP in the child, so that
the child is killed automatically if the parent dies.

With these changes, the test binary now runs to completion on arm64, with
2 tests passing and the `shared_anon` test failing.

Link: https://lkml.kernel.org/r/20230724082522.1202616-7-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:43 -07:00
Ryan Roberts
49f09526b1 selftests/mm: va_high_addr_switch should skip unsupported arm64 configs
va_high_addr_switch has a mechanism to determine if the tests should be
run or skipped (supported_arch()).  This currently returns unconditionally
true for arm64.  However, va_high_addr_switch also requires a large
virtual address space for the tests to run, otherwise they spuriously
fail.

Since arm64 can only support VA > 48 bits when the page size is 64K, let's
decide whether we should skip the test suite based on the page size.  This
reduces noise when running on 4K and 16K kernels.

Link: https://lkml.kernel.org/r/20230724082522.1202616-6-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:43 -07:00
Ryan Roberts
6e16f51335 selftests/mm: fix thuge-gen test bugs
thuge-gen was previously only munmapping part of the mmapped buffer, which
caused us to run out of 1G huge pages for a later part of the test.  Fix
this by munmapping the whole buffer.  Based on the code, it looks like a
typo rather than an intention to keep some of the buffer mapped.

thuge-gen was also calling mmap with SHM_HUGETLB flag (bit 11 set), which
is actually MAP_DENYWRITE in mmap context.  The man page says this flag is
ignored in modern kernels.  I'm pretty sure from the context that the
author intended to pass the MAP_HUGETLB flag so I've fixed that up too.

Link: https://lkml.kernel.org/r/20230724082522.1202616-5-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:42 -07:00
Ryan Roberts
e515bce98d selftests/mm: enable mrelease_test for arm64
mrelease_test defaults to defining __NR_pidfd_open and
__NR_process_mrelease syscall numbers to -1, if they are not defined
anywhere else, and the suite would then be marked as skipped as a result.

arm64 (at least the stock debian toolchain that I'm using) requires
including <sys/syscall.h> to pull in the defines for these syscalls.  So
let's add this header.  With this in place, the test is passing on arm64.

Link: https://lkml.kernel.org/r/20230724082522.1202616-4-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:42 -07:00
Ryan Roberts
f6dd4e223d selftests/mm: skip soft-dirty tests on arm64
arm64 does not support the soft-dirty PTE bit.  However, the `soft-dirty`
test suite is currently run unconditionally and therefore generates
spurious test failures on arm64.  There are also some tests in
`madv_populate` which assume it is supported.

For `soft-dirty` lets disable the whole suite for arm64; it is no longer
built and run_vmtests.sh will skip it if its not present.

For `madv_populate`, we need a runtime mechanism so that the remaining
tests continue to be run.  Unfortunately, the only way to determine if the
soft-dirty dirty bit is supported is to write to a page, then see if the
bit is set in /proc/self/pagemap.  But the tests that we want to
conditionally execute are testing precicesly this.  So if we introduced
this feature check, we could accedentally turn a real failure (on a system
that claims to support soft-dirty) into a skip.  So instead, do the check
based on architecture; for arm64, we report that soft-dirty is not
supported.

Link: https://lkml.kernel.org/r/20230724082522.1202616-3-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:42 -07:00
Ryan Roberts
58e2847ad2 selftests: line buffer test program's stdout
Patch series "selftests/mm fixes for arm64", v3.

Given my on-going work on large anon folios and contpte mappings, I
decided it would be a good idea to start running mm selftests to help
guard against regressions.  However, it soon became clear that I
couldn't get the suite to run cleanly on arm64 with a vanilla v6.5-rc1
kernel (perhaps I'm just doing it wrong??), so got stuck in a rabbit
hole trying to debug and fix all the issues.  Some were down to
misconfigurations, but I also found a number of issues with the tests
and even a couple of issues with the kernel.


This patch (of 8):

The selftests runner pipes the test program's stdout to tap_prefix.  The
presence of the pipe means that the test program sets its stdout to be
fully buffered (as aposed to line buffered when directly connected to the
terminal).  The block buffering means that there is often content in the
buffer at fork() time, which causes the output to end up duplicated.  This
was causing problems for mm:cow where test results were duplicated 20-30x.

Solve this by using `stdbuf`, when available to force the test program to
use line buffered mode.  This means previously printf'ed results are
flushed out of the program before any fork().

Additionally, explicitly set line buffer mode in ksft_print_header(),
which means that all test programs that use the ksft framework will
benefit even if stdbuf is not present on the system.

[ryan.roberts@arm.com: add setvbuf() to set buffering mode]
  Link: https://lkml.kernel.org/r/20230726070655.2713530-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230724082522.1202616-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230724082522.1202616-2-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:42 -07:00
Jiaqi Yan
ba91e7e5d1 selftests/mm: add tests for HWPOISON hugetlbfs read
Add tests for the improvement made to read operation on HWPOISON
hugetlb page with different read granularities. For each chunk size,
three read scenarios are tested:
1. Simple regression test on read without HWPOISON.
2. Sequential read page by page should succeed until encounters the 1st
   raw HWPOISON subpage.
3. After skip a raw HWPOISON subpage by lseek, read()s always succeed.

Link: https://lkml.kernel.org/r/20230713001833.3778937-5-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:27 -07:00
Axel Rasmussen
99aa77215a selftests/mm: add uffd unit test for UFFDIO_POISON
The test is pretty basic, and exercises UFFDIO_POISON straightforwardly. 
We register a region with userfaultfd, in missing fault mode.  For each
fault, we either UFFDIO_COPY a zeroed page (odd pages) or UFFDIO_POISON
(even pages).  We do this mix to test "something like a real use case",
where guest memory would be some mix of poisoned and non-poisoned pages.

We read each page in the region, and assert that the odd pages are zeroed
as expected, and the even pages yield a SIGBUS as expected.

Why UFFDIO_COPY instead of UFFDIO_ZEROPAGE?  Because hugetlb doesn't
support UFFDIO_ZEROPAGE, and we don't want to have special case code.

Link: https://lkml.kernel.org/r/20230707215540.2324998-9-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:18 -07:00
Axel Rasmussen
7cf0f9e837 selftests/mm: refactor uffd_poll_thread to allow custom fault handlers
Previously, we had "one fault handler to rule them all", which used
several branches to deal with all of the scenarios required by all of the
various tests.

In upcoming patches, I plan to add a new test, which has its own slightly
different fault handling logic.  Instead of continuing to add cruft to the
existing fault handler, let's allow tests to define custom ones, separate
from other tests.

Link: https://lkml.kernel.org/r/20230707215540.2324998-8-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:17 -07:00
Jeff Xu
badbbcd765 selftests/memfd: sysctl: fix MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
Add selftest for sysctl vm.memfd_noexec is 2
(MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED)

memfd_create(.., MFD_EXEC) should fail in this case.

Link: https://lkml.kernel.org/r/20230705063315.3680666-3-jeffxu@google.com
Reported-by: Dominique Martinet <asmadeus@codewreck.org>
Closes: https://lore.kernel.org/linux-mm/CABi2SkXUX_QqTQ10Yx9bBUGpN1wByOi_=gZU6WEy5a8MaQY3Jw@mail.gmail.com/T/
Signed-off-by: Jeff Xu <jeffxu@google.com>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: kernel test robot <lkp@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:11 -07:00
xu xin
3d0745e59c selftest: add a testcase of ksm zero pages
Add a function test_unmerge_zero_page() to test the functionality on
unsharing and counting ksm-placed zero pages and counting of this patch
series.

test_unmerge_zero_page() actually contains four subjct test objects:
(1) whether the count of ksm zero pages can update correctly after merging;
(2) whether the count of ksm zero pages can update correctly after
    unmerging by madvise(...MADV_UNMERGEABLE);
(3) whether the count of ksm zero pages can update correctly after
	unmerging by triggering write fault.
(4) whether ksm zero pages are really unmerged.

Link: https://lkml.kernel.org/r/20230613030947.186089-1-yang.yang29@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:10 -07:00
Domenico Cerasuolo
d9cfaf405b selftests: cgroup: add zswap-memcg unwanted writeback test
Add a test to verify that when a memcg hits its limit in zswap, it doesn't
trigger an unwanted writeback that would result in pages not owned by that
memcg to be sent to disk, even if zswap isn't full.  This was fixed by
commit 0bdf0efa180a("zswap: do not shrink if cgroup may not zswap").

Link: https://lkml.kernel.org/r/20230621153548.428093-4-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:09 -07:00
Domenico Cerasuolo
a549f9f315 selftests: cgroup: add test_zswap with no kmem bypass test
Add a cgroup selftest that verifies memcg charging in zswap.  The original
issue was that kmem bypass was applied to pages swapped out to zswap by
kswapd, resulting in zswapped memory not being charged.  It was fixed by
commit cd08d80ecdac("mm: correctly charge compressed memory to its
memcg").

Link: https://lkml.kernel.org/r/20230621153548.428093-3-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:08 -07:00
Domenico Cerasuolo
fe3b1bf19b selftests: cgroup: add test_zswap program
Patch series "selftests: cgroup: add zswap test program".

This series adds 2 zswap related selftests that verify known and fixed
issues.  A new dedicated test program (test_zswap) is proposed since the
test cases are specific to zswap and hosts specific helpers.

The first patch adds the (empty) test program, while the other 2 add an
actual test function each.


This patch (of 3):

Add empty cgroup-zswap self test scaffold program, test functions to be
added in the next commits.

Link: https://lkml.kernel.org/r/20230621153548.428093-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230621153548.428093-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:08 -07:00
Peng Zhang
c38d9ff2cc maple_tree: add test for expanding range in RCU mode
Add test for expanding range in RCU mode. If we use the fast path of the
slot store to expand range in RCU mode, this test will fail.

Link: https://lkml.kernel.org/r/20230628073657.75314-3-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:05 -07:00
Peter Xu
de4ec376df selftests/mm: add gup test matrix in run_vmtests.sh
Add a matrix for testing gup based on the current gup_test.  Only run the
matrix when -a is specified because it's a bit slow.

It covers:

  - Different types of huge pages: thp, hugetlb, or no huge page
  - Permissions: Write / Read-only
  - Fast-gup, with/without
  - Types of the GUP: pin / gup / longterm pins
  - Shared / Private memories
  - GUP size: 1 / 512 / random page sizes

Link: https://lkml.kernel.org/r/20230628215310.73782-9-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A . Shutemov <kirill@shutemov.name>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:04 -07:00
Peter Xu
2bc4813622 selftests/mm: add -a to run_vmtests.sh
Allows to specify optional tests in run_vmtests.sh, where we can run time
consuming test matrix only when user specified "-a".

Link: https://lkml.kernel.org/r/20230628215310.73782-8-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A . Shutemov <kirill@shutemov.name>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:04 -07:00
Linus Torvalds
eabeef9054 asm-generic: regression fix for 6.5
Just one partial revert for a commit from the merge window
 that caused annoying behavior when building old kernels on
 arm64 hosts.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmTe0Z8ACgkQYKtH/8kJ
 UidZGQ/9GFtIHh9QK6XPlAlx3oC6HdFUrOqDwoueNd4sFk9nNAuW+Z/a0YX8DtnQ
 /SrJPiCtoPPNCMNlxk7ZDx+ra0Tu+iC7rAvRnogKSV8qB35jNLIqhEde2opCa2Dx
 sfmvtP0hr5c+b8UP9IN2U11WvJhv1yjldHtCKAjSf9zMtwPQjQvAIx1kzSlC57kE
 zjsKVTB3yXUddhYGR7639RT7/1K25BVcG6kfKIEN/uPm+M/ljk58EJhUhSeEEfBf
 NmtlhxWzJBySOTk5mjEVUsTAPrAckPI3Vh4sf0qd4rXe4Q+XX9Jvur9kPOoF8lQF
 GO+WXrmooaEir3+1+w4PB9xXgZD26Iz018xdhcigLgEGefieM1JpvwJa8NarK8UK
 iLg2+sBvCAJPJQ44E4r1hWf8ZFolEhwzg703/fGIjDPMQnhHNay4i73NyCw4ii2c
 F+iCFhea5AKz+d5qQuaZmg7klY/4cBIKDVY/EsO4lmK0187eq3zZKJSMLD21+NLy
 kr63OAV8yXbduLsIOjvR3KrbSvMq+9vasEX4zUGlzA+VR1l+MqtwJI13EPyqhubJ
 kQQi6PUe/353ECdzwk5IAqMbGP7qhkOuv89+Ff9mPTlKAwDPwdZjbnsz5I5peskB
 zVBO366Z0w3o2ENv9i4UEa+TtaCMIbLWdXJxPufwoqPHX3v7nUM=
 =tTzL
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-fix-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic regression fix from Arnd Bergmann:
 "Just one partial revert for a commit from the merge window that caused
  annoying behavior when building old kernels on arm64 hosts"

* tag 'asm-generic-fix-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: partially revert "Unify uapi bitsperlong.h for arm64, riscv and loongarch"
2023-08-18 18:13:36 +02:00
Dave Marchevsky
63ae8eb2c5 selftests/bpf: Add CO-RE relocs kfunc flavors tests
This patch adds selftests that exercise kfunc flavor relocation
functionality added in the previous patch. The actual kfunc defined
in kernel/bpf/helpers.c is:

  struct task_struct *bpf_task_acquire(struct task_struct *p)

The following relocation behaviors are checked:

  struct task_struct *bpf_task_acquire___one(struct task_struct *name)
    * Should succeed despite differing param name

  struct task_struct *bpf_task_acquire___two(struct task_struct *p, void *ctx)
    * Should fail because there is no two-param bpf_task_acquire

  struct task_struct *bpf_task_acquire___three(void *ctx)
    * Should fail because, despite vmlinux's bpf_task_acquire having one param,
      the types don't match

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230817225353.2570845-2-davemarchevsky@fb.com
2023-08-18 18:12:59 +02:00
Dave Marchevsky
5964a223f5 libbpf: Support triple-underscore flavors for kfunc relocation
The function signature of kfuncs can change at any time due to their
intentional lack of stability guarantees. As kfuncs become more widely
used, BPF program writers will need facilities to support calling
different versions of a kfunc from a single BPF object. Consider this
simplified example based on a real scenario we ran into at Meta:

  /* initial kfunc signature */
  int some_kfunc(void *ptr)

  /* Oops, we need to add some flag to modify behavior. No problem,
    change the kfunc. flags = 0 retains original behavior */
  int some_kfunc(void *ptr, long flags)

If the initial version of the kfunc is deployed on some portion of the
fleet and the new version on the rest, a fleetwide service that uses
some_kfunc will currently need to load different BPF programs depending
on which some_kfunc is available.

Luckily CO-RE provides a facility to solve a very similar problem,
struct definition changes, by allowing program writers to declare
my_struct___old and my_struct___new, with ___suffix being considered a
'flavor' of the non-suffixed name and being ignored by
bpf_core_type_exists and similar calls.

This patch extends the 'flavor' facility to the kfunc extern
relocation process. BPF program writers can now declare

  extern int some_kfunc___old(void *ptr)
  extern int some_kfunc___new(void *ptr, int flags)

then test which version of the kfunc exists with bpf_ksym_exists.
Relocation and verifier's dead code elimination will work in concert as
expected, allowing this pattern:

  if (bpf_ksym_exists(some_kfunc___old))
    some_kfunc___old(ptr);
  else
    some_kfunc___new(ptr, 0);

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230817225353.2570845-1-davemarchevsky@fb.com
2023-08-18 18:12:30 +02:00
Mark Brown
94f23ac36f kselftest/arm64: Fix hwcaps selftest build
The hwcaps selftest currently relies on the assembler being able to
assemble the crc32w instruction but this is not in the base v8.0 so is not
accepted by the standard GCC configurations used by many distributions.
Switch to manually encoding to fix the build.

Fixes: 09d2e95a04 ("kselftest/arm64: add crc32 feature to hwcap test")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230816-arm64-fix-crc32-build-v1-1-40165c1290f2@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2023-08-18 17:07:44 +01:00