Commit graph

25615 commits

Author SHA1 Message Date
Andrii Nakryiko
14f1aae17e selftests/bpf: Add global variables linking selftest
Add selftest validating various aspects of statically linking global
variables:
  - correct resolution of extern variables across .bss, .data, and .rodata
    sections;
  - correct handling of weak definitions;
  - correct de-duplication of repeating special externs (.kconfig, .ksyms).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-17-andrii@kernel.org
2021-04-23 14:05:27 -07:00
Andrii Nakryiko
f2644fb44d selftests/bpf: Add function linking selftest
Add selftest validating various aspects of statically linking functions:
  - no conflicts and correct resolution for name-conflicting static funcs;
  - correct resolution of extern functions;
  - correct handling of weak functions, both resolution itself and libbpf's
    handling of unused weak function that "lost" (it leaves gaps in code with
    no ELF symbols);
  - correct handling of hidden visibility to turn global function into
    "static" for the purpose of BPF verification.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-16-andrii@kernel.org
2021-04-23 14:05:27 -07:00
Andrii Nakryiko
b131aed910 selftests/bpf: Omit skeleton generation for multi-linked BPF object files
Skip generating individual BPF skeletons for files that are supposed to be
linked together to form the final BPF object file. Very often such files are
"incomplete" BPF object files, which will fail libbpf bpf_object__open() step,
if used individually, thus failing BPF skeleton generation. This is by design,
so skip individual BPF skeletons and only validate them as part of their
linked final BPF object file and skeleton.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-15-andrii@kernel.org
2021-04-23 14:05:27 -07:00
Andrii Nakryiko
41c472e85b selftests/bpf: Use -O0 instead of -Og in selftests builds
While -Og is designed to work well with a debugger, it's still inferior to -O0
in terms of debugging experience. It still causes some variables to be
inlined, prevents single-stepping through some statements, and otherwise
interferes with debugging. So switch to -O0, which turns off all
optimization and provides the best debugging experience.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-14-andrii@kernel.org
2021-04-23 14:05:27 -07:00
Andrii Nakryiko
0a342457b3 libbpf: Support extern resolution for BTF-defined maps in .maps section
Add extra logic to handle map externs (only BTF-defined maps are supported for
linking). Re-use the map parsing logic used during bpf_object__open(). Map
externs are currently restricted to always match complete map definition. So
all the specified attributes will be compared (down to pinning, map_flags,
numa_node, etc). In the future this restriction might be relaxed with no
backwards compatibility issues. If any attribute is mismatched between extern
and actual map definition, linker will report an error, pointing out which one
mismatches.

The original intent was to allow for extern to specify attributes that matter
(to user) to enforce. E.g., if you specify just key information and omit
value, then any value fits. Similarly, it should have been possible to enforce
map_flags, pinning, and any other possible map attribute. Unfortunately, that
means that multiple externs can be only partially overlapping with each other,
which means linker would need to combine their type definitions to end up with
the most restrictive and fullest map definition. This requires an extra amount
of BTF manipulation which at this time was deemed unnecessary and would
require further extending generic BTF writer APIs. So that is left for future
follow ups, if there will be demand for that. But the idea seems interesting
and useful, so I want to document it here.

Weak definitions are also supported, but are pretty strict as well, just
like externs: all weak map definitions have to match exactly. In the follow up
patches this most probably will be relaxed, with __weak map definitions being
able to differ between each other (with non-weak definition always winning, of
course).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-13-andrii@kernel.org
2021-04-23 14:05:27 -07:00
Andrii Nakryiko
a46349227c libbpf: Add linker extern resolution support for functions and global variables
Add BPF static linker logic to resolve extern variables and functions across
multiple linked together BPF object files.

For that, the linker maintains a separate list of struct glob_sym structures,
which keep track of a few pieces of metadata (is it extern or resolved global,
is it a weak symbol, which ELF section it belongs to, etc.), tie together
BTF type info and ELF symbol information, and keep them in sync.

With support for extern variables/funcs added, it's now possible for some
sections to contain both extern and non-extern definitions. This means that
some sections may start out as ephemeral (if only externs are present and thus
there is no corresponding ELF section), but will be "upgraded" to an actual ELF
section as symbols are resolved or new non-extern definitions are appended.

Additional care is taken to not duplicate extern entries in sections like
.kconfig and .ksyms.

Given libbpf requires BTF type to always be present for .kconfig/.ksym
externs, linker extends this requirement to all the externs, even those that
are supposed to be resolved during static linking and which won't be visible
to libbpf. With BTF information always present, static linker will check not
just ELF symbol matches, but entire BTF type signature match as well. That
logic is stricter than BPF CO-RE checks. It probably should be re-used by
.ksym resolution logic in libbpf as well, but that's left for follow up
patches.

To make it unnecessary to rewrite ELF symbols and minimize BTF type
rewriting/removal, ELF symbols that correspond to externs initially will be
updated in place once they are resolved. Similarly for BTF type info, VAR/FUNC
and var_secinfo's (sec_vars in struct bpf_linker) are staying stable, but
types they point to might get replaced when extern is resolved. This might
leave some left-over types (even though we try to minimize this for common
cases of having extern funcs with no argument names vs concrete function with
names properly specified). That can be addressed later with a generic BTF
garbage collection. That's left for a follow up as well.

Given BTF type appending phase is separate from ELF symbol
appending/resolution, special struct glob_sym->underlying_btf_id variable is
used to communicate resolution and rewrite decisions. 0 means
underlying_btf_id needs to be appended (it's not yet in final linker->btf), <0
values are used for temporary storage of source BTF type ID (not yet
rewritten), so -glob_sym->underlying_btf_id is BTF type id in obj-btf. But by
the end of linker_append_btf() phase, that underlying_btf_id will be remapped
and will always be > 0. This is the ugliest part of the whole process, but
keeps the other parts much simpler due to stability of sec_var and VAR/FUNC
types, as well as ELF symbol, so please keep that in mind while reviewing.
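
The three-state encoding described above can be sketched roughly as follows.
This is an illustration only, not the linker's code: struct glob_sym_sketch,
the enum, and the helper names are hypothetical, and only the sign convention
(0, < 0, > 0) is taken from the description.

```c
#include <assert.h>

/* Hypothetical stand-in for the linker's per-symbol state; only the one
 * field discussed above is modeled. */
struct glob_sym_sketch {
	int underlying_btf_id;
};

enum btf_id_state {
	BTF_ID_NEEDS_APPEND,	/* 0: type is not yet in the final BTF */
	BTF_ID_SOURCE_PENDING,	/* < 0: negated type ID in the source BTF */
	BTF_ID_FINAL,		/* > 0: type ID in the final linker BTF */
};

/* Decode which of the three states a symbol is currently in. */
enum btf_id_state classify(const struct glob_sym_sketch *sym)
{
	if (sym->underlying_btf_id == 0)
		return BTF_ID_NEEDS_APPEND;
	return sym->underlying_btf_id < 0 ? BTF_ID_SOURCE_PENDING : BTF_ID_FINAL;
}

/* Stash a source-object type ID until the BTF append phase remaps it. */
void stash_src_id(struct glob_sym_sketch *sym, int src_id)
{
	assert(src_id > 0);
	sym->underlying_btf_id = -src_id;
}

/* Recover the stashed source-object type ID. */
int src_id_of(const struct glob_sym_sketch *sym)
{
	assert(sym->underlying_btf_id < 0);
	return -sym->underlying_btf_id;
}
```

Negating the ID lets a single integer carry both "which BTF does this ID
belong to" and the ID itself, at the cost of the sign check at every use.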

BTF-defined maps require some extra custom logic and are addressed separately
in the next patch, to keep this one smaller and easier to review.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-12-andrii@kernel.org
2021-04-23 14:05:27 -07:00
Andrii Nakryiko
83a157279f libbpf: Tighten BTF type ID rewriting with error checking
It should never fail, but if it does, it's better to know about this rather
than end up with nonsensical type IDs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-11-andrii@kernel.org
2021-04-23 14:05:27 -07:00
Andrii Nakryiko
386b1d241e libbpf: Extend sanity checking ELF symbols with externs validation
Add logic to validate extern symbols, plus some other minor extra checks, like
ELF symbol #0 validation, general symbol visibility and binding validations.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-10-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko
42869d2852 libbpf: Make few internal helpers available outside of libbpf.c
Make skip_mods_and_typedefs(), btf_kind_str(), and btf_func_linkage() helpers
available outside of libbpf.c, to be used by static linker code.

Also do a few cleanups (error code fixes, comment clean up, etc) that don't
deserve their own commit.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-9-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko
beaa3711ad libbpf: Factor out symtab and relos sanity checks
Factor out logic for sanity checking SHT_SYMTAB and SHT_REL sections into
separate functions. They are already quite extensive and are suffering from too
deep indentation. Subsequent changes will extend SYMTAB sanity checking
further, so it's better to factor each into a separate function.

No functional changes are intended.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-8-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko
c7ef5ec957 libbpf: Refactor BTF map definition parsing
Refactor BTF-defined maps parsing logic to allow it to be nicely reused by BPF
static linker. Further, at least for BPF static linker, it's important to know
which attributes of a BPF map were defined explicitly, so provide a bit set
for each known portion of BTF map definition. This allows BPF static linker to
do a simple check when dealing with extern map declarations.

The same capabilities make it possible to distinguish attributes explicitly set
to zero (e.g., __uint(max_entries, 0)) vs the case of not specifying it at all (no
max_entries attribute at all). Libbpf is currently not utilizing that, but it
could be useful for backwards compatibility reasons later.
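
A minimal sketch of the "bit set of explicitly defined attributes" idea,
assuming made-up names (enum map_attr_bits, struct map_def_sketch and the
helpers are hypothetical and do not reproduce libbpf's definitions). It shows
how an attribute explicitly set to zero stays distinguishable from one that
was never specified:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names: one bit per attribute that was spelled out. */
enum map_attr_bits {
	ATTR_MAX_ENTRIES = 1u << 0,
	ATTR_MAP_FLAGS   = 1u << 1,
};

struct map_def_sketch {
	uint32_t explicit_attrs;	/* OR of enum map_attr_bits */
	uint32_t max_entries;
	uint32_t map_flags;
};

/* Record both the value and the fact that it was given explicitly. */
void set_max_entries(struct map_def_sketch *def, uint32_t n)
{
	assert(def != NULL);
	def->max_entries = n;
	def->explicit_attrs |= ATTR_MAX_ENTRIES;
}

/* True if the attribute was present in the definition, whatever its value. */
bool has_attr(const struct map_def_sketch *def, uint32_t bit)
{
	assert(def != NULL);
	return (def->explicit_attrs & bit) != 0;
}
```

With this layout, two definitions can hold the same max_entries value of 0
and still differ in has_attr(def, ATTR_MAX_ENTRIES).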

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-7-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko
6245947c1b libbpf: Allow gaps in BPF program sections to support overridden weak functions
Currently libbpf is very strict about parsing BPF program instruction
sections. No gaps are allowed between sequential BPF programs within a given
ELF section. Libbpf enforced that by keeping track of the next section offset
that should start a new BPF (sub)program and cross-checks that by searching
for a corresponding STT_FUNC ELF symbol.

But this is too restrictive once weak BPF programs are allowed and two or more
BPF object files are linked together. In such a case, some weak BPF programs
might be "overridden" by either non-weak BPF program with the same name and
signature, or even by another weak BPF program that just happened to be linked
first. That, in turn, leaves BPF instructions of the "lost" BPF (sub)program
intact, but there is no corresponding ELF symbol, because no one is going to
be referencing it.

Libbpf already correctly handles such cases in the sense that it won't append
such dead code to actual BPF programs loaded into kernel. So the only change
that needs to be done is to relax the logic of parsing BPF instruction
sections. Instead of assuming next BPF (sub)program section offset, iterate
available STT_FUNC ELF symbols to discover all available BPF subprograms and
programs.
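
The difference between the strict and the relaxed parsing can be sketched as
below. This is a simplified model under stated assumptions: struct func_sym
and both helpers are hypothetical, symbols are assumed sorted by offset, and
out-of-bounds symbols are merely skipped here, which is a simplification of
this sketch rather than a description of libbpf's behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an STT_FUNC symbol within one section. */
struct func_sym {
	size_t off;	/* byte offset of the function inside the section */
	size_t size;	/* byte size of the function */
};

/* Strict check: symbols (sorted by offset) must tile the whole section. */
bool tiles_section(const struct func_sym *syms, size_t n, size_t sec_size)
{
	size_t next = 0;

	assert(syms != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (syms[i].off != next)
			return false;	/* gap or overlap before this function */
		next = syms[i].off + syms[i].size;
	}
	return next == sec_size;
}

/* Relaxed discovery: count every symbol that fits inside the section;
 * bytes not covered by any symbol are ignored as dead code. */
size_t count_progs(const struct func_sym *syms, size_t n, size_t sec_size)
{
	size_t found = 0;

	assert(syms != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (syms[i].size > sec_size || syms[i].off > sec_size - syms[i].size)
			continue;	/* does not fit: skipped in this sketch */
		found++;
	}
	return found;
}
```

A section with functions at offsets 0 and 48 and an unreferenced region in
between fails the strict check but still yields two programs under the
symbol-driven approach.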

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-6-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko
aea28a602f libbpf: Mark BPF subprogs with hidden visibility as static for BPF verifier
Define __hidden helper macro in bpf_helpers.h, which is a short-hand for
__attribute__((visibility("hidden"))). Add libbpf support to mark BPF
subprograms marked with __hidden as static in BTF information to enforce BPF
verifier's static function validation algorithm, which takes more information
(caller's context) into account during a subprogram validation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-5-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko
0fec7a3cee libbpf: Suppress compiler warning when using SEC() macro with externs
When used on externs, the SEC() macro triggers a compilation warning about
inapplicable `__attribute__((used))`. That's expected for extern declarations,
so suppress it with the corresponding _Pragma.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-4-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko
5b438f01d7 bpftool: Dump more info about DATASEC members
Dump succinct information for each member of DATASEC: its kind and name. This
is extremely helpful to see at a quick glance what is inside each DATASEC of
a given BTF. Without this, one has to jump around BTF data to just find out
the name of a VAR or FUNC. DATASEC's var_secinfo member is special in that
regard because it doesn't itself contain the name of the member, delegating
that to the referenced VAR and FUNC kinds. Other kinds, like
STRUCT/UNION/FUNC/ENUM, encode member names directly and thus are clearly
identifiable in BTF dump.

The new output looks like this:

[35] DATASEC '.bss' size=0 vlen=6
        type_id=8 offset=0 size=4 (VAR 'input_bss1')
        type_id=13 offset=0 size=4 (VAR 'input_bss_weak')
        type_id=16 offset=0 size=4 (VAR 'output_bss1')
        type_id=17 offset=0 size=4 (VAR 'output_data1')
        type_id=18 offset=0 size=4 (VAR 'output_rodata1')
        type_id=20 offset=0 size=8 (VAR 'output_sink1')
[36] DATASEC '.data' size=0 vlen=2
        type_id=9 offset=0 size=4 (VAR 'input_data1')
        type_id=14 offset=0 size=4 (VAR 'input_data_weak')
[37] DATASEC '.kconfig' size=0 vlen=2
        type_id=25 offset=0 size=4 (VAR 'LINUX_KERNEL_VERSION')
        type_id=28 offset=0 size=1 (VAR 'CONFIG_BPF_SYSCALL')
[38] DATASEC '.ksyms' size=0 vlen=1
        type_id=30 offset=0 size=1 (VAR 'bpf_link_fops')
[39] DATASEC '.rodata' size=0 vlen=2
        type_id=12 offset=0 size=4 (VAR 'input_rodata1')
        type_id=15 offset=0 size=4 (VAR 'input_rodata_weak')
[40] DATASEC 'license' size=0 vlen=1
        type_id=24 offset=0 size=4 (VAR 'LICENSE')

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-3-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko
0dd7e456bb bpftool: Support dumping BTF VAR's "extern" linkage
Add dumping of "extern" linkage for BTF VAR kind. Also shorten
"global-allocated" to "global" to be in line with FUNC's "global".

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-2-andrii@kernel.org
2021-04-23 14:05:25 -07:00
Petr Machata
1233898ab7 selftests: mlxsw: Fix mausezahn invocation in ERSPAN scale test
The mirror_gre_scale test creates as many ERSPAN sessions as the underlying
chip supports, and tests that they all work. In order to determine that it
issues a stream of ICMP packets and checks if they are mirrored as
expected.

However, the mausezahn invocation missed the -6 flag to identify the use of
IPv6 protocol, and was sending ICMP messages over IPv6, as opposed to
ICMP6. It also didn't pass an explicit source IP address, which apparently
worked at some point in the past, but does not anymore.

To fix these issues, extend the function mirror_test() in mirror_lib by
detecting the IPv6 protocol addresses, and using a different ICMP scheme.
Fix __mirror_gre_test() in the selftest itself to pass a source IP address.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 14:01:28 -07:00
Petr Machata
dda7f4fa55 selftests: mlxsw: Increase the tolerance of backlog buildup
The intention behind this test is to make sure that qdisc limit is
correctly projected to the HW. However, first, due to rounding in the
qdisc, and then in the driver, the number cannot actually be accurate. And
second, the approach to testing this is to oversubscribe the port with
traffic generated on the same switch. The actual backlog size therefore
fluctuates.

In practice, this test proved to be noisier than the rest, and spuriously
fails every now and then. Increase the tolerance to 10 % to avoid these
issues.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 14:01:28 -07:00
Danielle Ratson
059b18e21c selftests: mlxsw: Return correct error code in resource scale tests
Currently, the resource scale test checks several cases, but the error code
is reset between the cases. So, for example, if one case fails and the
following case passes, the final error code reflects only the last case
and is 0.

Save a new return code that holds the 'or' of the return codes of all the
cases, so that the final return code takes every case into account.
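
The pattern can be illustrated with a small sketch. It is written in C purely
for illustration and does not reproduce the selftest's own code; both function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy pattern: the status is overwritten, so only the last case counts. */
int last_status(const int *codes, size_t n)
{
	int ret = 0;

	assert(codes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		ret = codes[i];
	return ret;
}

/* Fixed pattern: OR every status into an accumulator, so any non-zero
 * case keeps the overall result non-zero. */
int combined_status(const int *codes, size_t n)
{
	int ret = 0;

	assert(codes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		ret |= codes[i];
	return ret;
}
```

For the sequence "fail, pass" the first helper reports success while the
second still reports a failure.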

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 14:01:28 -07:00
Danielle Ratson
1f1c92139e selftests: mlxsw: Remove a redundant if statement in tc_flower_scale test
Currently, the error return code of the failure condition is lost after
using an if statement, so the test doesn't fail when it should.

Remove the if statement that separates the condition and the error code
check, so the test won't always pass.

Fixes: abfce9e062 ("selftests: mlxsw: Reduce running time using offload indication")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 14:01:28 -07:00
Danielle Ratson
b6fc2f2121 selftests: mlxsw: Remove a redundant if statement in port_scale test
Currently, the error return code of the failure condition is lost after
using an if statement, so the test doesn't fail when it should.

Remove the if statement that separates the condition and the error code
check, so the test won't always pass.

Fixes: 5154b1b826 ("selftests: mlxsw: Add a scale test for physical ports")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 14:01:28 -07:00
Petr Machata
c8d0260cdd selftests: net: mirror_gre_vlan_bridge_1q: Make an FDB entry static
The FDB roaming test installs a destination MAC address on the wrong
interface of an FDB database and tests whether the mirroring fails, because
packets are sent to the wrong port. The test by mistake installs the FDB
entry as local. This worked previously, because drivers were notified of
local FDB entries in the same way as of static entries. However that has
been fixed in the commit 6ab4c3117a ("net: bridge: don't notify switchdev
for local FDB addresses"), and local entries are not notified anymore. As a
result, the HW is not reconfigured for the FDB roam, and mirroring keeps
working, failing the test.

To fix the issue, mark the FDB entry as static.

Fixes: 9c7c8a8244 ("selftests: forwarding: mirror_gre_vlan_bridge_1q: Add more tests")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 14:01:28 -07:00
Zhen Lei
c6f8714125 perf map: Fix error return code in maps__clone()
Although 'err' has been initialized to -ENOMEM, it is reassigned by the
"err = unwind__prepare_access(...)" statement in the for loop. As a result,
the value of 'err' is unknown when map__clone() fails.
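
A reduced sketch of the failure mode, assuming hypothetical helpers
(prepare_ok() stands in for the per-map step that overwrites 'err', and
fail_at simulates the iteration at which cloning fails). The "fixed" variant
shows one possible way to address it, not necessarily the change made here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-map step that succeeds and thereby overwrites 'err'. */
int prepare_ok(void)
{
	return 0;
}

/* Buggy pattern: 'err' starts as -ENOMEM but is overwritten inside the loop,
 * so a later clone failure returns whatever the last step stored. */
int clone_buggy(size_t n, size_t fail_at)
{
	int err = -ENOMEM;

	for (size_t i = 0; i < n; i++) {
		if (i == fail_at)	/* simulated clone failure */
			goto out;
		err = prepare_ok();
	}
	err = 0;
out:
	return err;
}

/* One possible fix: set the error code at the point of failure. */
int clone_fixed(size_t n, size_t fail_at)
{
	int err = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == fail_at) {	/* simulated clone failure */
			err = -ENOMEM;
			goto out;
		}
		err = prepare_ok();
		if (err)
			goto out;
	}
out:
	return err;
}
```

A failure on the first iteration is reported correctly by both variants; only
a failure after at least one successful iteration exposes the difference.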

Fixes: 6c50258443 ("perf unwind: Call unwind__prepare_access for forked thread")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: zhen lei <thunder.leizhen@huawei.com>
Link: http://lore.kernel.org/lkml/20210415092744.3793-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-23 16:03:09 -03:00
Thomas Richter
671b60cb6a perf ftrace: Fix access to pid in array when setting a pid filter
Command 'perf ftrace -v -- ls' fails in s390 (at least 5.12.0rc6).

The root cause is a missing pointer dereference which causes an
array element address to be used as PID.

Fix this by extracting the PID.
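
The class of bug can be sketched as follows. The struct and function names
are hypothetical and only model "an array element whose PID is one of its
fields"; they are not the perf data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type: the PID is one field of each array entry. */
struct pid_entry {
	int pid;
	const char *comm;
};

/* Buggy pattern: the entry's address, truncated to int, is used as the PID. */
int pid_buggy(const struct pid_entry *map, int i)
{
	assert(map != NULL && i >= 0);
	return (int)(intptr_t)&map[i];
}

/* Fixed pattern: read the PID stored in the entry. */
int pid_fixed(const struct pid_entry *map, int i)
{
	assert(map != NULL && i >= 0);
	return map[i].pid;
}
```

A truncated address can easily come out as a large negative number, which
matches the shape of the rejected value in the output below.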

Output before:
  # ./perf ftrace -v -- ls
  function_graph tracer is used
  write '-263732416' to tracing/set_ftrace_pid failed: Invalid argument
  failed to set ftrace pid
  #

Output after:
   ./perf ftrace -v -- ls
   function_graph tracer is used
   # tracer: function_graph
   #
   # CPU  DURATION                  FUNCTION CALLS
   # |     |   |                     |   |   |   |
   4)               |  rcu_read_lock_sched_held() {
   4)   0.552 us    |    rcu_lockdep_current_cpu_online();
   4)   6.124 us    |  }

Reported-by: Alexander Schmidt <alexschm@de.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20210421120400.2126433-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-23 15:58:10 -03:00
Leo Yan
b14585d9f1 perf auxtrace: Fix potential NULL pointer dereference
In the function auxtrace_parse_snapshot_options(), the callback pointer
"itr->parse_snapshot_options" can be NULL if it has not been set during
the AUX record initialization.  This can cause the tool to crash if the
callback pointer "itr->parse_snapshot_options" is dereferenced without
performing a NULL check.

Add a NULL check for the pointer "itr->parse_snapshot_options" before
invoking the callback.
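
A minimal sketch of guarding an optional callback before invoking it. The
struct is a reduced, hypothetical stand-in for the real record object, the
callback signature is simplified, and -EINVAL is an assumed error code for
the "no handler set" case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the record object and its callback. */
struct itr_sketch {
	int (*parse_snapshot_options)(struct itr_sketch *itr, const char *str);
};

/* Example callback that accepts any option string. */
int accept_all(struct itr_sketch *itr, const char *str)
{
	assert(itr != NULL);
	(void)str;
	return 0;
}

/* Invoke the callback only if both the object and the pointer are set. */
int parse_snapshot_options_checked(struct itr_sketch *itr, const char *str)
{
	if (!itr || !itr->parse_snapshot_options)
		return -EINVAL;	/* assumed error code for "no handler set" */
	return itr->parse_snapshot_options(itr, str);
}
```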

Fixes: d20031bb63 ("perf tools: Add AUX area tracing Snapshot Mode")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: http://lore.kernel.org/lkml/20210420151554.2031768-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-23 15:34:32 -03:00
Paolo Bonzini
c4f71901d5 KVM/arm64 updates for Linux 5.13
Merge tag 'kvmarm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.13

New features:

- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)

Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocation in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
  oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
2021-04-23 07:41:17 -04:00
Marco Elver
3ddb3fd8cd signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures
The alignment of a structure is that of its largest member. On
architectures like 32-bit Arm (but not e.g. 32-bit x86) 64-bit integers
require 64-bit alignment rather than the alignment of the natural word size.

This means that there is no portable way to add 64-bit integers to
siginfo_t on 32-bit architectures without breaking the ABI, because
siginfo_t does not yet (and therefore likely never will) contain 64-bit
fields on 32-bit architectures. Adding a 64-bit integer could change the
alignment of the union after the 3 initial int si_signo, si_errno,
si_code, thus introducing 4 bytes of padding shifting the entire union,
which would break the ABI.
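
The padding effect can be observed with a reduced model of the layout. The
two structs below are hypothetical stand-ins, not siginfo_t itself: a 32-bit
member models a union whose widest field is word-sized on a 32-bit
architecture, and a 64-bit member models the same union after a u64 is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(int) == 4, "sketch assumes 4-byte int");

/* Three leading ints followed by a union with a 32-bit member. */
struct sig_with_u32 {
	int si_signo, si_errno, si_code;
	union { uint32_t val; } fields;
};

/* Same layout, but the union now contains a 64-bit member. */
struct sig_with_u64 {
	int si_signo, si_errno, si_code;
	union { uint64_t val; } fields;
};

/* Bytes of padding the compiler inserts before the union. */
size_t pad_u32(void)
{
	return offsetof(struct sig_with_u32, fields) - 3 * sizeof(int);
}

size_t pad_u64(void)
{
	return offsetof(struct sig_with_u64, fields) - 3 * sizeof(int);
}
```

On ABIs where uint64_t is 8-byte aligned, pad_u64() reports 4 bytes of
padding while pad_u32() reports none; where uint64_t is only 4-byte aligned
(e.g. 32-bit x86), both report none.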

One alternative would be to use the __packed attribute, however, it is
non-standard C. Given siginfo_t has definitions outside the Linux kernel
in various standard libraries that can be compiled with any number of
different compilers (not just those we rely on), using non-standard
attributes on siginfo_t should be avoided to ensure portability.

In the case of the si_perf field, word size is sufficient since there is
no exact requirement on size, given the data it contains is user-defined
via perf_event_attr::sig_data. On 32-bit architectures, any excess bits
of perf_event_attr::sig_data will therefore be truncated when copying
into si_perf.

Since si_perf is intended to disambiguate events (e.g. encoding relevant
information if there are more events of the same type), 32 bits should
provide enough entropy to do so on 32-bit architectures.

For 64-bit architectures, no change is intended.

Fixes: fb6cc127e0 ("signal: Introduce TRAP_PERF si_code and si_perf to siginfo")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lkml.kernel.org/r/20210422191823.79012-1-elver@google.com
2021-04-23 09:03:16 +02:00
Mickaël Salaün
3532b0b435 landlock: Enable user space to infer supported features
Add a new flag LANDLOCK_CREATE_RULESET_VERSION to
landlock_create_ruleset(2).  This makes it possible to retrieve a Landlock ABI
version that is useful to efficiently follow a best-effort security
approach.  Indeed, it would be a missed opportunity to abort the whole
sandbox building, because some features are unavailable, instead of
protecting users as much as possible with the subset of features
provided by the running kernel.

This new flag enables user space to identify the minimum set of Landlock
features supported by the running kernel without relying on a filesystem
interface (e.g. /proc/version, which might be inaccessible) nor testing
multiple syscall argument combinations (i.e. syscall bisection).  New
Landlock features will be documented and tied to a minimum version
number (greater than 1).  The current version will be incremented for
each new kernel release supporting new Landlock features.  User space
libraries can leverage this information to seamlessly restrict processes
as much as possible while being compatible with newer APIs.
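
A sketch of how user space might query the ABI version with this flag. The
fallback #defines are assumptions for building against headers that predate
Landlock: the flag value follows the upstream UAPI header and the syscall
number follows the generic syscall table (some architectures number syscalls
differently). landlock_abi_version() is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; see the assumptions stated above. */
#ifndef LANDLOCK_CREATE_RULESET_VERSION
#define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
#endif
#ifndef __NR_landlock_create_ruleset
#define __NR_landlock_create_ruleset 444
#endif

/* Returns the Landlock ABI version (>= 1), or -1 with errno set when
 * Landlock is not available on the running kernel. */
long landlock_abi_version(void)
{
	return syscall(__NR_landlock_create_ruleset, NULL, 0,
		       LANDLOCK_CREATE_RULESET_VERSION);
}
```

A caller following the best-effort approach would treat -1 as "no Landlock"
and otherwise enable only the features tied to versions up to the returned
number.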

This is a much lighter approach than the previous
landlock_get_features(2): the complexity is pushed to user space
libraries.  This flag meets similar needs as securityfs versions:
selinux/policyvers, apparmor/features/*/version* and tomoyo/version.

Supporting this flag now will be convenient for backward compatibility.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210422154123.13086-14-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
2021-04-22 12:22:11 -07:00
Mickaël Salaün
e1199815b4 selftests/landlock: Add user space tests
Test all Landlock system calls, ptrace hook semantics, and filesystem
access-control with multiple layouts.

Test coverage for security/landlock/ is 93.6% of lines.  The code not
covered only deals with internal kernel errors (e.g. memory allocation)
and race conditions.

Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-11-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
2021-04-22 12:22:11 -07:00
Ray Kinsella
a4b0fccfbd perf tools: Update topdown documentation to permit rdpmc calls
Update Topdown documentation to permit calls to rdpmc, and describe
interaction with system calls.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: http://lore.kernel.org/lkml/20210421091009.1711565-1-mdr@ashroe.eu
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-22 16:09:39 -03:00
Paolo Bonzini
fd49e8ee70 Merge branch 'kvm-sev-cgroup' into HEAD 2021-04-22 13:19:01 -04:00
Yang Li
0db1146167 selftests/powerpc: remove unneeded semicolon
Eliminate the following coccicheck warning:
./tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c:327:4-5: Unneeded
semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1612780870-95890-1-git-send-email-yang.lee@linux.alibaba.com
2021-04-23 01:38:04 +10:00
Ravi Bangoria
290f7d8ce2 powerpc/selftests: Add selftest to test concurrent perf/ptrace events
ptrace and perf watchpoints can't co-exist if their address ranges
overlap. See commit 29da4f91c0 ("powerpc/watchpoint: Don't allow
concurrent perf and ptrace events") for more detail. Add a selftest
for this.

Sample o/p:
  # ./ptrace-perf-hwbreak
  test: ptrace-perf-hwbreak
  tags: git_version:powerpc-5.8-7-118-g937fa174a15d-dirty
  perf cpu event -> ptrace thread event (Overlapping): Ok
  perf cpu event -> ptrace thread event (Non-overlapping): Ok
  perf thread event -> ptrace same thread event (Overlapping): Ok
  perf thread event -> ptrace same thread event (Non-overlapping): Ok
  perf thread event -> ptrace other thread event: Ok
  ptrace thread event -> perf kernel event: Ok
  ptrace thread event -> perf same thread event (Overlapping): Ok
  ptrace thread event -> perf same thread event (Non-overlapping): Ok
  ptrace thread event -> perf other thread event: Ok
  ptrace thread event -> perf cpu event (Overlapping): Ok
  ptrace thread event -> perf cpu event (Non-overlapping): Ok
  ptrace thread event -> perf same thread & cpu event (Overlapping): Ok
  ptrace thread event -> perf same thread & cpu event (Non-overlapping): Ok
  ptrace thread event -> perf other thread & cpu event: Ok
  success: ptrace-perf-hwbreak

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412112218.128183-5-ravi.bangoria@linux.ibm.com
2021-04-23 01:38:03 +10:00
Ravi Bangoria
c65c64cc7b powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR
Extend perf-hwbreak.c selftest to test multiple DAWRs. Also add
testcase for testing 512 byte boundary removal.

Sample o/p:
  # ./perf-hwbreak
  ...
  TESTED: Process specific, Two events, diff addr
  TESTED: Process specific, Two events, same addr
  TESTED: Process specific, Two events, diff addr, one is RO, other is WO
  TESTED: Process specific, Two events, same addr, one is RO, other is WO
  TESTED: Systemwide, Two events, diff addr
  TESTED: Systemwide, Two events, same addr
  TESTED: Systemwide, Two events, diff addr, one is RO, other is WO
  TESTED: Systemwide, Two events, same addr, one is RO, other is WO
  TESTED: Process specific, 512 bytes, unaligned
  success: perf_hwbreak

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412112218.128183-4-ravi.bangoria@linux.ibm.com
2021-04-23 01:38:03 +10:00
Ravi Bangoria
c9cb0afb4e powerpc/selftests/perf-hwbreak: Coalesce event creation code
perf-hwbreak selftest opens hw-breakpoint event at multiple places for
which it has same code repeated. Coalesce that code into a function.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412112218.128183-3-ravi.bangoria@linux.ibm.com
2021-04-23 01:38:03 +10:00
Ravi Bangoria
dae4ff8031 powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR
Message-ID: <20210412112218.128183-2-ravi.bangoria@linux.ibm.com>

Add selftests to test multiple active DAWRs with ptrace interface.

Sample o/p:
  $ ./ptrace-hwbreak
  ...
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO, len: 6: Ok

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
[mpe: Fix build on older distros]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-04-23 01:38:03 +10:00
Thadeu Lima de Souza Cascardo
da650ada10 selftests/powerpc: Add uaccess flush test
Also based on the RFI and entry flush tests, it counts the L1D misses
by doing a syscall that does user access: uname, in this case.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
[dja: forward port, rename function]
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210225061949.1213404-1-dja@axtens.net
2021-04-23 01:38:03 +10:00
Quanyang Wang
bc2e9578ba spi: tools: make a symbolic link to the header file spi.h
The header file spi.h in include/uapi/linux/spi is needed for spidev.h,
so we also need to make a symbolic link to it to eliminate the error
message below:

In file included from spidev_test.c:24:
include/linux/spi/spidev.h:28:10: fatal error: linux/spi/spi.h: No such file or directory
   28 | #include <linux/spi/spi.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.

Fixes: f7005142da ("spi: uapi: unify SPI modes into a single spi.h")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Link: https://lore.kernel.org/r/20210422102604.3034217-1-quanyang.wang@windriver.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-04-22 16:30:39 +01:00
Paolo Bonzini
bf1e15a82e KVM: selftests: Always run vCPU thread with blocked SIG_IPI
The main thread could start to send SIG_IPI at any time, even before the
signal is blocked on the vcpu thread.  Therefore, start the vcpu thread with
the signal blocked.

Without this patch, on very busy cores the dirty_log_test could fail directly
on receiving a SIGUSR1 without a handler (when vcpu runs far slower than main).
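
The ordering described above can be illustrated outside the selftest. What
follows is a minimal Python sketch, not the selftest code: it assumes a POSIX
system, uses SIGUSR1 as a stand-in for SIG_IPI, and the helper names
(start_worker_with_blocked_signal, worker_body) are hypothetical. It relies on
the POSIX rule that a new thread inherits the signal mask of its creator, so
blocking the signal before creating the thread leaves no window in which the
thread runs with the signal unblocked.

```python
import signal
import threading


def start_worker_with_blocked_signal(signum, target):
    """Start `target` in a thread that has `signum` blocked from the start."""
    # Block the signal in the creating thread; the new thread inherits this mask.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        worker = threading.Thread(target=target)
        worker.start()
    finally:
        # Restore the creator's mask; the worker keeps its inherited copy.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return worker


observed = {}


def worker_body():
    # Blocking an empty set changes nothing and returns the current mask.
    observed["mask"] = signal.pthread_sigmask(signal.SIG_BLOCK, set())


worker = start_worker_with_blocked_signal(signal.SIGUSR1, worker_body)
worker.join()
print(signal.SIGUSR1 in observed["mask"])
```

Blocking inside the worker itself would reproduce the race this patch removes,
because the sender could deliver the signal before the worker gets to run its
first statement.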

Reported-by: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:02 -04:00
Peter Xu
016ff1a442 KVM: selftests: Sync data verify of dirty logging with guest sync
This fixes a bug that can trigger with e.g. "taskset -c 0 ./dirty_log_test" or
when the testing host is very busy.

A similar fix was attempted previously [1], but it was not enough; the reason
is stated in the reply [2].

As a summary (partly quoting from [2]):

The problem is I think one guest memory write operation (of this specific test)
contains a few micro-steps when the page is under kvm dirty tracking (here I'm
only considering write-protect rather than pml but pml should be similar at
least when the log buffer is full):

  (1) Guest read 'iteration' number into register, prepare to write, page fault
  (2) Set dirty bit in either dirty bitmap or dirty ring
  (3) Return to guest, data written

When we verify the data, we assumed that all these steps are "atomic", say,
when (1) happened for this page, we assume (2) & (3) must have happened.  We
had a trick to work around the "un-atomicity" of the above three steps: the
previous version of this patch tried to fix the atomicity of steps (2)+(3) by
explicitly letting the main thread wait for at least one vmenter of the vcpu
thread, which should work.  However, what I overlooked is that we still have a
race when (1) and (2) can be interrupted.

One example calltrace of how it could happen that we read an old iteration and
got interrupted before even setting the dirty bit and flushing data:

    __schedule+1742
    __cond_resched+52
    __get_user_pages+530
    get_user_pages_unlocked+197
    hva_to_pfn+206
    try_async_pf+132
    direct_page_fault+320
    kvm_mmu_page_fault+103
    vmx_handle_exit+288
    vcpu_enter_guest+2460
    kvm_arch_vcpu_ioctl_run+325
    kvm_vcpu_ioctl+526
    __x64_sys_ioctl+131
    do_syscall_64+51
    entry_SYSCALL_64_after_hwframe+68

It means the iteration number cached in the vcpu register can be very old by
the time the dirty bit is set and the data is flushed.

So far I don't see an easy way to guarantee the atomicity of all of steps 1-3
other than to sync at the GUEST_SYNC() point of guest code when we verify the
dirty bits, which is what this patch does.

[1] https://lore.kernel.org/lkml/20210413213641.23742-1-peterx@redhat.com/
[2] https://lore.kernel.org/lkml/20210417140956.GV4440@xz-x1/

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210417143602.215059-2-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:02 -04:00
Christophe Leroy
f56607e85e selftests/timens: Fix gettime_perf to work on powerpc
On powerpc:
- VDSO library is named linux-vdso32.so.1 or linux-vdso64.so.1
- clock_gettime is named __kernel_clock_gettime()

Ensure gettime_perf tries these names before giving up.
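
The fallback amounts to trying a list of candidate names in order and keeping
the first one that resolves. A minimal Python sketch of that lookup order
follows; it is not the selftest code. The powerpc names come from the list
above, while the generic names ("linux-vdso.so.1", "__vdso_clock_gettime") and
the helper first_available() are assumptions made for illustration.

```python
# Powerpc names are taken from the commit message; the generic first entries
# are an assumption and are not quoted from the patch.
VDSO_LIBRARY_NAMES = ["linux-vdso.so.1", "linux-vdso32.so.1", "linux-vdso64.so.1"]
CLOCK_GETTIME_SYMBOLS = ["__vdso_clock_gettime", "__kernel_clock_gettime"]


def first_available(candidates, probe):
    """Return the first candidate for which probe(name) is true, else None."""
    for name in candidates:
        if probe(name):
            return name
    return None


# Hypothetical 64-bit powerpc environment, modelled with plain sets instead of
# real dlopen()/dlsym() calls.
available_libs = {"linux-vdso64.so.1"}
available_syms = {"__kernel_clock_gettime"}

lib = first_available(VDSO_LIBRARY_NAMES, lambda name: name in available_libs)
sym = first_available(CLOCK_GETTIME_SYMBOLS, lambda name: name in available_syms)
print(lib, sym)
```

Returning None only after every candidate has been tried corresponds to
"giving up" in the description above.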

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/469f37ab91984309eb68c0fb47e8438cdf5b6463.1617198956.git.christophe.leroy@csgroup.eu
2021-04-21 22:52:32 +10:00
Petr Machata
0a4d0cb1a3 selftests: mlxsw: sch_red_ets: Test proper counter cleaning in ETS
There was a bug introduced during the rework which caused a non-zero backlog
to get stuck at ETS. Introduce a selftest that would have caught the issue
earlier.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:43:13 -07:00
Jiri Olsa
d044d9fc13 selftests/bpf: Add docs target as all dependency
Currently the docs target is a make dependency of TEST_GEN_FILES,
which causes the tests to be rebuilt every time you run make.

Add docs as a dependency of the all target instead, so that running make
on top of already built selftests shows just:

  $ make
  make[1]: Nothing to be done for 'docs'.

After cleaning docs, only docs is rebuilt:

  $ make docs-clean
  CLEAN    eBPF_helpers-manpage
  CLEAN    eBPF_syscall-manpage
  $ make
  GEN      ...selftests/bpf/bpf-helpers.rst
  GEN      ...selftests/bpf/bpf-helpers.7
  GEN      ...selftests/bpf/bpf-syscall.rst
  GEN      ...selftests/bpf/bpf-syscall.2
  $ make
  make[1]: Nothing to be done for 'docs'.

Fixes: a01d935b2e ("tools/bpf: Remove bpf-helpers from bpftool docs")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210420132428.15710-1-jolsa@kernel.org
2021-04-20 15:01:59 -07:00
Zhen Lei
f2211881e7 perf data: Fix error return code in perf_data__create_dir()
Although 'ret' is initialized to -1, it is reassigned by
the "ret = open(...)" statement in the for loop, so the value of
'ret' is unknown when asprintf() fails.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210415083417.3740-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 14:46:48 -03:00
Arnaldo Carvalho de Melo
bb7db8699b perf tools: Add a build-test variant to use in builds from a tarball
Automated tests inside containers build from a tarball generated
by 'make perf-tar-src-pkg*', where testing building from a tarball
is obviously not needed, so add a 'build-test-tarball' variant for that case.

And don't build with gtk2 as this complicates things for cross builds
where we don't always have all the libraries a full perf build requires
available for the target arch, ditto for static builds.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:43:58 -03:00
Zhen Lei
59a1a843b0 perf data: Fix error return code in perf_data__create_dir()
Although 'ret' is initialized to -1, it is reassigned by
the "ret = open(...)" statement in the for loop, so the value of
'ret' is unknown when asprintf() fails.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210415083417.3740-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:40:20 -03:00
Ian Rogers
b96da02bd6 perf arm64: Fix off-by-one directory paths.
Relative path include works in the regular build due to -I paths but may
break in other situations.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20210416214113.552252-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:40:20 -03:00
Martin Liška
f89a82a82b perf annotate: Add line number like in TUI and source location at EOL
The patch changes the output format in 2 ways:
- line number is displayed for all source lines (matching TUI mode)
- source locations for the hottest lines are printed
   at the line end in order to preserve layout

Before:

     0.00 :   405ef1: inc    %r15
          :            tmpsd * (TD + tmpsd * TDD)));
     0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3        # 4318b0 <_IO_stdin_used+0x8b0>
          :            tmpsd * (TC +
  eff.c:1811    0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3        # 4318b8 <_IO_stdin_used+0x8b8>
          :            TA + tmpsd * (TB +
     0.35 :   405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3        # 4318c0 <_IO_stdin_used+0x8c0>
          :            dumbo =
  eff.c:1809    1.41 :   405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3        # 4318c8 <_IO_stdin_used+0x8c8>
          :            sumi -= sj * tmpsd * dij2i * dumbo;
  eff.c:1813    2.58 :   405f18: vmulsd %xmm3,%xmm0,%xmm0
     2.81 :   405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0
     3.78 :   405f23: vmovsd %xmm0,0x30(%rsp)
          :            for (k = 0; k < lpears[i] + upears[i]; k++) {
  eff.c:1761    0.90 :   405f29: cmp    %r15d,%r12d

After:

     0.00 :   405ef1: inc    %r15
          : 1812   tmpsd * (TD + tmpsd * TDD)));
     0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3        # 4318b0 <_IO_stdin_used+0x8b0>
          : 1811   tmpsd * (TC +
     0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3        # 4318b8 <_IO_stdin_used+0x8b8> // eff.c:1811
          : 1810   TA + tmpsd * (TB +
     0.35 :   405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3        # 4318c0 <_IO_stdin_used+0x8c0>
          : 1809   dumbo =
     1.41 :   405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3        # 4318c8 <_IO_stdin_used+0x8c8> // eff.c:1809
          : 1813   sumi -= sj * tmpsd * dij2i * dumbo;
     2.58 :   405f18: vmulsd %xmm3,%xmm0,%xmm0 // eff.c:1813
     2.81 :   405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0
     3.78 :   405f23: vmovsd %xmm0,0x30(%rsp)
          : 1761   for (k = 0; k < lpears[i] + upears[i]; k++) {

Where e.g. '// eff.c:1811' shares the same color as the percentage
at the line beginning.

Signed-off-by: Martin Liška <mliska@suse.cz>
Link: http://lore.kernel.org/lkml/a0d53f31-f633-5013-c386-a4452391b081@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:40:20 -03:00
Alexander Antonov
537f1e38f3 perf: Update .gitignore file
After a "make -C tools/perf", git reports the following untracked file:
perf-iostat

Add this generated file to perf's .gitignore file.

Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210419094147.15909-5-alexander.antonov@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:40:20 -03:00
Alexander Antonov
f9ed693e8b perf stat: Enable iostat mode for x86 platforms
This functionality is based on recently introduced sysfs attributes for
Intel® Xeon® Scalable processor family (code name Skylake-SP):

Commit bb42b3d397 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping")

Mode is intended to provide four I/O performance metrics in MB per each
PCIe root port:

 - Inbound Read: I/O devices below root port read from the host memory
 - Inbound Write: I/O devices below root port write to the host memory
 - Outbound Read: CPU reads from I/O devices below root port
 - Outbound Write: CPU writes to I/O devices below root port

Each metric requires only one uncore event, which increments on every 4B
transfer in the corresponding direction. The formula to compute the metrics
is generic:
    #EventCount * 4B / (1024 * 1024)
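
As a worked instance of the formula, the short Python sketch below converts a
raw event count to MB. The function and constant names are hypothetical and
the event count is an example value, not measured data.

```python
BYTES_PER_EVENT = 4          # each event increment stands for one 4B transfer
BYTES_PER_MB = 1024 * 1024


def event_count_to_mb(event_count):
    """Apply #EventCount * 4B / (1024 * 1024)."""
    return event_count * BYTES_PER_EVENT / BYTES_PER_MB


# 262144 events * 4 bytes = 1048576 bytes = 1 MB
print(event_count_to_mb(262144))  # 1.0
```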

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210419094147.15909-4-alexander.antonov@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:40:20 -03:00
Alexander Antonov
19776d3ced perf stat: Helper functions for PCIe root ports list in iostat mode
Introduce helper functions to control PCIe root ports list.
These helpers will be used in the follow-up patch.

Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210419094147.15909-3-alexander.antonov@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:40:20 -03:00
Alexander Antonov
f07952b179 perf stat: Basic support for iostat in perf
Add basic flow for a new iostat mode in perf. Mode is intended to
provide four I/O performance metrics per each PCIe root port: Inbound Read,
Inbound Write, Outbound Read, Outbound Write.

The actual code to compute the metrics and attribute it to
root port is in follow-on patches.

Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210419094147.15909-2-alexander.antonov@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:40:20 -03:00
Kajol Jain
32daa5d789 perf vendor events: Initial JSON/events list for power10 platform
Patch adds initial JSON/events for POWER10.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Paul Clarke <pc@us.ibm.com>
Tested-by: Paul Clarke <pc@us.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20210419112001.71466-1-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:40:20 -03:00
Rob Herring
818869489b libperf xyarray: Add bounds checks to xyarray__entry()
xyarray__entry() is missing any bounds checking yet often the x and y
parameters come from external callers. Add bounds checks and an
unchecked __xyarray__entry().

Committer notes:

Make the 'x' and 'y' arguments to the new xyarray__entry() that does
bounds check to be of type 'size_t', so that we cover also the case
where 'x' and 'y' could be negative, which is needed anyway as having
them as 'int' breaks the build with:

  /home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h: In function ‘xyarray__entry’:
  /home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h:28:8: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
     28 |  if (x >= xy->max_x || y >= xy->max_y)
        |        ^~
  /home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h:28:26: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
     28 |  if (x >= xy->max_x || y >= xy->max_y)
        |                          ^~
  cc1: all warnings being treated as errors
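
To see why unsigned 'x' and 'y' also cover the negative case, here is a small
Python model of the checked and unchecked accessors. It is a sketch only: the
class, the row-major index layout and the 64-bit size_t width are assumptions
made for illustration, not the libperf implementation.

```python
SIZE_T_BITS = 64  # assumption: 64-bit size_t ('long unsigned int' in the error above)


def as_size_t(value):
    """Mimic C's conversion of a possibly negative int to size_t."""
    return value % (1 << SIZE_T_BITS)


class XYArray:
    """Hypothetical model of a 2D array with max_x rows and max_y columns."""

    def __init__(self, max_x, max_y):
        self.max_x = max_x
        self.max_y = max_y

    def _entry_index(self, x, y):
        # Unchecked variant: the caller guarantees that x and y are in range.
        return x * self.max_y + y

    def entry_index(self, x, y):
        # Checked variant: after the unsigned conversion a negative argument
        # becomes a huge value, so one '>=' test rejects it as well.
        x, y = as_size_t(x), as_size_t(y)
        if x >= self.max_x or y >= self.max_y:
            return None
        return self._entry_index(x, y)


xy = XYArray(2, 3)
print(xy.entry_index(1, 2), xy.entry_index(-1, 0), xy.entry_index(2, 0))
```

In this model the None return plays the role of a failed lookup; callers that
already validated their indices can use the unchecked variant.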

Signed-off-by: Rob Herring <robh@kernel.org>
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210414195758.4078803-1-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:11:33 -03:00
Rob Herring
47d01e7b99 libperf: Add support for user space counter access
x86 and arm64 can both support direct access of event counters in
userspace. The access sequence is less than trivial and currently exists
in perf test code (tools/perf/arch/x86/tests/rdpmc.c) with copies in
projects such as PAPI and libpfm4.

In order to support userspace access, an event must be mmapped first
with perf_evsel__mmap(). Then subsequent calls to perf_evsel__read()
will use the fast path (assuming the arch supports it).

Committer notes:

Added a '__maybe_unused' attribute to the read_perf_counter() argument
to fix the build on arches other than x86_64 and arm.

Committer testing:

  Building and running the libperf tests in verbose mode (V=1) now shows
  those "loop = N, count = N" extra lines, testing user space counter
  access.

  # make V=1 -C tools/lib/perf tests
  make: Entering directory '/home/acme/git/perf/tools/lib/perf'
  make -f /home/acme/git/perf/tools/build/Makefile.build dir=. obj=libperf
  make -C /home/acme/git/perf/tools/lib/api/ O= libapi.a
  make -f /home/acme/git/perf/tools/build/Makefile.build dir=./fd obj=libapi
  make -f /home/acme/git/perf/tools/build/Makefile.build dir=./fs obj=libapi
  make -C tests
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o test-cpumap-a test-cpumap.c ../libperf.a /home/acme/git/perf/tools/lib/api/libapi.a
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o test-threadmap-a test-threadmap.c ../libperf.a /home/acme/git/perf/tools/lib/api/libapi.a
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o test-evlist-a test-evlist.c ../libperf.a /home/acme/git/perf/tools/lib/api/libapi.a
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o test-evsel-a test-evsel.c ../libperf.a /home/acme/git/perf/tools/lib/api/libapi.a
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -L.. -o test-cpumap-so test-cpumap.c /home/acme/git/perf/tools/lib/api/libapi.a -lperf
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -L.. -o test-threadmap-so test-threadmap.c /home/acme/git/perf/tools/lib/api/libapi.a -lperf
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -L.. -o test-evlist-so test-evlist.c /home/acme/git/perf/tools/lib/api/libapi.a -lperf
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -L.. -o test-evsel-so test-evsel.c /home/acme/git/perf/tools/lib/api/libapi.a -lperf
  make -C tests run
  running static:
  - running test-cpumap.c...OK
  - running test-threadmap.c...OK
  - running test-evlist.c...OK
  - running test-evsel.c...
  	loop = 65536, count = 333926
  	loop = 131072, count = 655781
  	loop = 262144, count = 1311141
  	loop = 524288, count = 2630126
  	loop = 1048576, count = 5256955
  	loop = 65536, count = 524594
  	loop = 131072, count = 1058916
  	loop = 262144, count = 2097458
  	loop = 524288, count = 4205429
  	loop = 1048576, count = 8406606
  OK
  running dynamic:
  - running test-cpumap.c...OK
  - running test-threadmap.c...OK
  - running test-evlist.c...OK
  - running test-evsel.c...
  	loop = 65536, count = 328102
  	loop = 131072, count = 655782
  	loop = 262144, count = 1317494
  	loop = 524288, count = 2627851
  	loop = 1048576, count = 5255187
  	loop = 65536, count = 524601
  	loop = 131072, count = 1048923
  	loop = 262144, count = 2107917
  	loop = 524288, count = 4194606
  	loop = 1048576, count = 8409322
  OK
  make: Leaving directory '/home/acme/git/perf/tools/lib/perf'
  #

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Itaru Kitayama <itaru.kitayama@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20210414155412.3697605-4-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:10:45 -03:00
Yanan Wang
b9c2bd50ec KVM: selftests: Add a test for kvm page table code
This test serves as a performance tester and a bug reproducer for
kvm page table code (GPA->HPA mappings), so it gives guidance for
people trying to make some improvement for kvm.

The function guest_code() can cover the conditions where a single vcpu or
multiple vcpus access guest pages within the same memory region, in three
VM stages (before dirty logging, during dirty logging, after dirty logging).
Besides, the backing src memory type (ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means users can choose whether
normal page mappings or block mappings are created in the test.

If ANONYMOUS memory is specified, kvm will create normal page mappings
for the tested memory region before dirty logging, and update attributes
of the page mappings from RO to RW during dirty logging. If THP/HUGETLB
memory is specified, kvm will create block mappings for the tested memory
region before dirty logging, and split the block mappings into normal page
mappings during dirty logging, and coalesce the page mappings back into
block mappings after dirty logging is stopped.

So in summary, as a performance tester, this test can present the
performance of kvm creating/updating normal page mappings, or the
performance of kvm creating/splitting/recovering block mappings,
through execution time.

When we need to coalesce the page mappings back into block mappings after
dirty logging is stopped, we first have to invalidate *all* the TLB
entries for the page mappings right before installing the block entry,
because a TLB conflict abort could occur if the TLB entries are not fully
invalidated. We have hit this TLB conflict twice in the aarch64 software
implementation and fixed it. As this test can simulate a VM with block
mappings going from dirty logging enabled to dirty logging stopped,
it can also reproduce this TLB conflict abort caused by inadequate TLB
invalidation when coalescing tables.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20210330080856.14940-11-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:53 -04:00
Yanan Wang
a4b3c8b583 KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers
With VM_MEM_SRC_ANONYMOUS_THP specified in vm_userspace_mem_region_add(),
we have to get the transparent hugepage size for HVA alignment. With the
new helpers, we can use get_backing_src_pagesz() to check whether THP is
configured and then get the exact configured hugepage size.

As different architectures may have different THP page sizes configured,
this can get the accurate THP page sizes on any platform.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20210330080856.14940-10-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:53 -04:00
Yanan Wang
623653b7d4 KVM: selftests: List all hugetlb src types specified with page sizes
With VM_MEM_SRC_ANONYMOUS_HUGETLB, we currently can only use system
default hugetlb pages to back the testing guest memory. In order to
add flexibility, now list all the known hugetlb backing src types with
different page sizes, so that we can specify use of hugetlb pages of the
exact granularity that we want. And as all the known hugetlb page sizes
are listed, it's appropriate for all architectures.

Besides, the helper get_backing_src_pagesz() is added to get the
granularity of different backing src types (anonymous, thp, hugetlb).

Suggested-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20210330080856.14940-9-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:53 -04:00
Yanan Wang
5579fa682a KVM: selftests: Add a helper to get system default hugetlb page size
If HUGETLB is configured in the host kernel, the system default hugetlb
page size is reported in /proc/meminfo. If it is not configured,
/proc/meminfo contains no hugetlb information. So add a helper to
determine whether HUGETLB is configured and then get the default page
size by reading /proc/meminfo.

This helper can be useful when a program wants to use the default hugetlb
pages of the system and doesn't know the default page size.
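
A minimal sketch of such a lookup is shown below. It assumes the default size appears on a line of the form "Hugepagesize: N kB"; both function names are hypothetical and this is not the selftest's actual helper.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical parser: find the "Hugepagesize:" line in meminfo-style text
 * and return the size in bytes, or 0 if the line is absent (which is taken
 * here to mean that HUGETLB is not configured).
 */
size_t sketch_hugepagesize_from_meminfo(const char *text)
{
	const char *p = strstr(text, "Hugepagesize:");
	unsigned long kb;

	if (!p || sscanf(p, "Hugepagesize: %lu kB", &kb) != 1)
		return 0;
	return (size_t)kb * 1024;
}

/* Convenience wrapper that reads the real /proc/meminfo into a buffer. */
size_t sketch_default_hugetlb_pagesz(void)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return 0;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return sketch_hugepagesize_from_meminfo(buf);
}
```

Splitting the parser from the file access keeps the parsing logic testable on hosts where HUGETLB is not configured.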

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20210330080856.14940-8-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:52 -04:00
Yanan Wang
3b70c4d128 KVM: selftests: Add a helper to get system configured THP page size
Tests that exercise transparent hugepages need to know the system
configured THP size, which can be used for various kinds of alignment or
for guest memory accesses by vcpus.
So it makes sense to add a helper to get the transparent hugepage size.

With VM_MEM_SRC_ANONYMOUS_THP specified in vm_userspace_mem_region_add(),
we now stat /sys/kernel/mm/transparent_hugepage to check whether THP is
configured in the host kernel before madvise(). Based on this, we can also
read file /sys/kernel/mm/transparent_hugepage/hpage_pmd_size to get THP
hugepage size.
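
The two steps described above (stat the THP directory, then read hpage_pmd_size) can be sketched as follows. The function name and the directory parameter are assumptions made for illustration; the real helper may be structured differently.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return the THP size in bytes, or 0 if thp_dir does
 * not exist or hpage_pmd_size cannot be read. The directory is a parameter
 * only so the sketch can be exercised against a path that does not exist;
 * on a real host it would be "/sys/kernel/mm/transparent_hugepage".
 */
size_t sketch_read_thp_pagesz(const char *thp_dir)
{
	char path[512];
	struct stat st;
	unsigned long size = 0;
	FILE *f;

	/* A missing directory is taken to mean THP is not configured. */
	if (stat(thp_dir, &st) != 0)
		return 0;

	snprintf(path, sizeof(path), "%s/hpage_pmd_size", thp_dir);
	f = fopen(path, "r");
	if (!f)
		return 0;
	if (fscanf(f, "%lu", &size) != 1)
		size = 0;
	fclose(f);
	return size;
}
```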

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20210330080856.14940-7-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:52 -04:00
Yanan Wang
6436430e29 KVM: selftests: Make a generic helper to get vm guest mode strings
For generality and conciseness, make an API which can be used in all
kvm libs and selftests to get vm guest mode strings. The API also checks
the index i to guard against possible faults.
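
The idea of a bounds-checked string lookup can be sketched like this. The enum, the strings, and the NULL return for an invalid index are all assumptions for illustration; the real API may assert on a bad index instead.

```c
#include <stddef.h>

/* Hypothetical mode list; the real guest modes are not reproduced here. */
enum sketch_guest_mode {
	SKETCH_MODE_A,
	SKETCH_MODE_B,
	SKETCH_NUM_MODES,
};

/* Return the string for mode i, or NULL if i is out of range. */
const char *sketch_guest_mode_string(unsigned int i)
{
	static const char * const strings[] = {
		[SKETCH_MODE_A] = "mode-a",
		[SKETCH_MODE_B] = "mode-b",
	};

	if (i >= SKETCH_NUM_MODES)
		return NULL;
	return strings[i];
}
```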

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20210330080856.14940-6-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:52 -04:00
Yanan Wang
c412d6ac28 KVM: selftests: Print the errno besides error-string in TEST_ASSERT
Printing the errno alongside the error string in TEST_ASSERT, in the
format "errno=%d - %s", explicitly indicates that the string is error
information. Besides, the errno is easier to use for debugging
than the error string.
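
The format itself is easy to reproduce outside of TEST_ASSERT. The sketch below only demonstrates the "errno=%d - %s" layout; `sketch_format_errno` is a hypothetical name.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter: write "errno=<number> - <description>" into buf.
 * Returns what snprintf() returns, i.e. the length the full text needs.
 */
int sketch_format_errno(char *buf, size_t len, int err)
{
	return snprintf(buf, len, "errno=%d - %s", err, strerror(err));
}
```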

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20210330080856.14940-5-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:52 -04:00
Yanan Wang
fa76c775be tools/headers: sync headers of asm-generic/hugetlb_encode.h
This patch syncs contents of tools/include/asm-generic/hugetlb_encode.h
and include/uapi/asm-generic/hugetlb_encode.h. Arch powerpc supports 16KB
hugepages and ARM64 supports 32MB/512MB hugepages. The corresponding mmap
flags have already been added in include/uapi/asm-generic/hugetlb_encode.h,
but not tools/include/asm-generic/hugetlb_encode.h.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210330080856.14940-2-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:51 -04:00
Dave Marchevsky
c77cec5c20 bpf/selftests: Add bpf_get_task_stack retval bounds test_prog
Add a libbpf test prog which feeds bpf_get_task_stack's return value
into seq_write after confirming it's positive. No attempt to bound the
value from above is made.

Load will fail if verifier does not refine retval range based on buf sz
input to bpf_get_task_stack.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210416204704.2816874-4-davemarchevsky@fb.com
2021-04-19 18:23:33 -07:00
Dave Marchevsky
bdc4e36945 bpf/selftests: Add bpf_get_task_stack retval bounds verifier test
Add a bpf_iter test which feeds bpf_get_task_stack's return value into
seq_write after confirming it's positive. No attempt to bound the value
from above is made.

Load will fail if verifier does not refine retval range based on
buf sz input to bpf_get_task_stack.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210416204704.2816874-3-davemarchevsky@fb.com
2021-04-19 18:23:33 -07:00
David S. Miller
6dd06ec7c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Add vlan match and pop actions to the flowtable offload,
   patches from wenxu.

2) Reduce size of the netns_ct structure, which itself is
   embedded in struct net. Make netns_ct a read-mostly structure.
   Patches from Florian Westphal.

3) Add FLOW_OFFLOAD_XMIT_UNSPEC to skip dst check from garbage
   collector path, as required by the tc CT action. From Roi Dayan.

4) VLAN offload fixes for nftables: Allow for matching on both s-vlan
   and c-vlan selectors. Fix match of VLAN id due to incorrect
   byteorder. Add a new routine to properly populate flow dissector
   ethertypes.

5) Missing keys in ip{6}_route_me_harder() results in incorrect
   routes. This includes an update for selftest infra. Patches
   from Ido Schimmel.

6) Add counter hardware offload support through FLOW_CLS_STATS.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:49:50 -07:00
Florent Revest
c2e39c6bdc selftests/bpf: Add a series of tests for bpf_snprintf
The "positive" part tests all format specifiers when things go well.

The "negative" part makes sure that incorrect format strings fail at
load time.

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-7-revest@chromium.org
2021-04-19 15:27:37 -07:00
Florent Revest
58c2b1f5e0 libbpf: Introduce a BPF_SNPRINTF helper macro
Similarly to BPF_SEQ_PRINTF, this macro turns variadic arguments into an
array of u64, making it more natural to call the bpf_snprintf helper.
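
A userspace analogue of this pattern is sketched below. It is not the libbpf macro: `SKETCH_SNPRINTF` and `sketch_helper` are hypothetical, and the sketch only handles integer arguments because it does not cast pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an array-taking helper: reports how many u64 slots it got. */
size_t sketch_helper(const char *fmt, const uint64_t *data, size_t data_len)
{
	(void)fmt;
	(void)data;
	return data_len / sizeof(uint64_t);
}

/*
 * Pack the variadic arguments into a temporary u64 array (a C99 compound
 * literal) and pass the array together with its size in bytes. The sizeof
 * operand is not evaluated, so the arguments are evaluated only once.
 */
#define SKETCH_SNPRINTF(fmt, ...)                              \
	sketch_helper(fmt, (uint64_t[]){ __VA_ARGS__ },        \
		      sizeof((uint64_t[]){ __VA_ARGS__ }))
```

The caller writes a printf-like call, while the helper still receives a single array pointer and a length.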

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-6-revest@chromium.org
2021-04-19 15:27:37 -07:00
Florent Revest
83cd92b464 libbpf: Initialize the bpf_seq_printf parameters array field by field
When initializing the __param array with a one liner, if all args are
const, the initial array value will be placed in the rodata section but
because libbpf does not support relocation in the rodata section, any
pointer in this array will stay NULL.

Fixes: c09add2fbc ("tools/libbpf: Add bpf_iter support")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-5-revest@chromium.org
2021-04-19 15:27:37 -07:00
Florent Revest
7b15523a98 bpf: Add a bpf_snprintf helper
The implementation takes inspiration from the existing bpf_trace_printk
helper but there are a few differences:

To allow for a large number of format-specifiers, parameters are
provided in an array, like in bpf_seq_printf.

Because the output string takes two arguments and the array of
parameters also takes two arguments, the format string needs to fit in
one argument. Thankfully, ARG_PTR_TO_CONST_STR is guaranteed to point to
a zero-terminated read-only map so we don't need a format string length
arg.

Because the format-string is known at verification time, we also do
a first pass of format string validation in the verifier logic. This
makes debugging easier.
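
To give a feel for what a first validation pass over a constant format string can look like, here is a deliberately simplified sketch. It only counts conversion specifiers and rejects a dangling '%'; the function name is hypothetical and the verifier's real checks are considerably stricter.

```c
#include <stddef.h>

/*
 * Hypothetical checker: return the number of conversion specifiers in fmt,
 * treating "%%" as a literal percent sign, or -1 if fmt ends in a lone '%'.
 */
int sketch_count_specifiers(const char *fmt)
{
	int n = 0;
	size_t i;

	for (i = 0; fmt[i]; i++) {
		if (fmt[i] != '%')
			continue;
		if (fmt[i + 1] == '\0')
			return -1;	/* dangling '%' at end of string */
		if (fmt[i + 1] == '%') {
			i++;		/* skip the escaped percent sign */
			continue;
		}
		n++;
	}
	return n;
}
```

A caller could compare the returned count with the number of entries in the parameter array and reject the program early on a mismatch.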

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-4-revest@chromium.org
2021-04-19 15:27:36 -07:00
Ido Schimmel
bf5eb67dc8 selftests: fib_nexthops: Test large scale nexthop flushing
Test that all the nexthops are flushed when a multi-part nexthop dump is
required for the flushing.

Without previous patch:

 # ./fib_nexthops.sh
 TEST: Large scale nexthop flushing                                  [FAIL]

With previous patch:

 # ./fib_nexthops.sh
 TEST: Large scale nexthop flushing                                  [ OK ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:20:34 -07:00
Josh Poimboeuf
99033461e6 objtool: Support asm jump tables
Objtool detection of asm jump tables would normally just work, except
for the fact that asm retpolines use alternatives.  Objtool thinks the
alternative code path (a jump to the retpoline) is a sibling call.

Don't treat alternative indirect branches as sibling calls when the
original instruction has a jump table.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/460cf4dc675d64e1124146562cabd2c05aa322e8.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:32 -05:00
Ido Schimmel
8826218215 selftests: fib_tests: Add test cases for interaction with mangling
Test that packets are correctly routed when netfilter mangling rules are
present.

Without previous patch:

 # ./fib_tests.sh -t ipv4_mangle

 IPv4 mangling tests
     TEST:     Connection with correct parameters                        [ OK ]
     TEST:     Connection with incorrect parameters                      [ OK ]
     TEST:     Connection with correct parameters - mangling             [FAIL]
     TEST:     Connection with correct parameters - no mangling          [ OK ]
     TEST:     Connection check - server side                            [FAIL]

 Tests passed:   3
 Tests failed:   2

 # ./fib_tests.sh -t ipv6_mangle

 IPv6 mangling tests
     TEST:     Connection with correct parameters                        [ OK ]
     TEST:     Connection with incorrect parameters                      [ OK ]
     TEST:     Connection with correct parameters - mangling             [FAIL]
     TEST:     Connection with correct parameters - no mangling          [ OK ]
     TEST:     Connection check - server side                            [FAIL]

 Tests passed:   3
 Tests failed:   2

With previous patch:

 # ./fib_tests.sh -t ipv4_mangle

 IPv4 mangling tests
     TEST:     Connection with correct parameters                        [ OK ]
     TEST:     Connection with incorrect parameters                      [ OK ]
     TEST:     Connection with correct parameters - mangling             [ OK ]
     TEST:     Connection with correct parameters - no mangling          [ OK ]
     TEST:     Connection check - server side                            [ OK ]

 Tests passed:   5
 Tests failed:   0

 # ./fib_tests.sh -t ipv6_mangle

 IPv6 mangling tests
     TEST:     Connection with correct parameters                        [ OK ]
     TEST:     Connection with incorrect parameters                      [ OK ]
     TEST:     Connection with correct parameters - mangling             [ OK ]
     TEST:     Connection with correct parameters - no mangling          [ OK ]
     TEST:     Connection check - server side                            [ OK ]

 Tests passed:   5
 Tests failed:   0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-18 22:04:16 +02:00
Jakub Kicinski
8203c7ce4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
 - keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
 - fix build after move to net_generic

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-17 11:08:07 -07:00
Linus Torvalds
88a5af9439 Merge tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.12-rc8, including fixes from netfilter, and
  bpf. BPF verifier changes stand out, otherwise things have slowed
  down.

  Current release - regressions:

   - gro: ensure frag0 meets IP header alignment

   - Revert "net: stmmac: re-init rx buffers when mac resume back"

   - ethernet: macb: fix the restore of cmp registers

  Previous releases - regressions:

   - ixgbe: Fix NULL pointer dereference in ethtool loopback test

   - ixgbe: fix unbalanced device enable/disable in suspend/resume

   - phy: marvell: fix detection of PHY on Topaz switches

   - make tcp_allowed_congestion_control readonly in non-init netns

   - xen-netback: Check for hotplug-status existence before watching

  Previous releases - always broken:

   - bpf: mitigate a speculative oob read of up to map value size by
     tightening the masking window

   - sctp: fix race condition in sctp_destroy_sock

   - sit, ip6_tunnel: Unregister catch-all devices

   - netfilter: nftables: clone set element expression template

   - netfilter: flowtable: fix NAT IPv6 offload mangling

   - net: geneve: check skb is large enough for IPv4/IPv6 header

   - netlink: don't call ->netlink_bind with table lock held"

* tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
  netlink: don't call ->netlink_bind with table lock held
  MAINTAINERS: update my email
  bpf: Update selftests to reflect new error states
  bpf: Tighten speculative pointer arithmetic mask
  bpf: Move sanitize_val_alu out of op switch
  bpf: Refactor and streamline bounds check into helper
  bpf: Improve verifier error messages for users
  bpf: Rework ptr_limit into alu_limit and add common error path
  bpf: Ensure off_reg has no mixed signed bounds for all types
  bpf: Move off_reg into sanitize_ptr_alu
  bpf: Use correct permission flag for mixed signed bounds arithmetic
  ch_ktls: do not send snd_una update to TCB in middle
  ch_ktls: tcb close causes tls connection failure
  ch_ktls: fix device connection close
  ch_ktls: Fix kernel panic
  i40e: fix the panic when running bpf in xdpdrv mode
  net/mlx5e: fix ingress_ifindex check in mlx5e_flower_parse_meta
  net/mlx5e: Fix setting of RS FEC mode
  net/mlx5: Fix setting of devlink traps in switchdev mode
  Revert "net: stmmac: re-init rx buffers when mac resume back"
  ...
2021-04-17 09:57:15 -07:00
Yang Yingliang
b9c36fdedd KVM: selftests: remove redundant semi-colon
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Message-Id: <20210401142514.1688199-1-yangyingliang@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:01 -04:00
John Paul Adrian Glaubitz
f4bf09dc3a ia64: tools: remove duplicate definition of ia64_mf() on ia64
The ia64_mf() macro defined in tools/arch/ia64/include/asm/barrier.h is
already defined in <asm/gcc_intrin.h> on ia64, which causes the libbpf
build to fail:

    CC       /usr/src/linux/tools/bpf/bpftool//libbpf/staticobjs/libbpf.o
  In file included from /usr/src/linux/tools/include/asm/barrier.h:24,
                   from /usr/src/linux/tools/include/linux/ring_buffer.h:4,
                   from libbpf.c:37:
  /usr/src/linux/tools/include/asm/../../arch/ia64/include/asm/barrier.h:43: error: "ia64_mf" redefined [-Werror]
     43 | #define ia64_mf()       asm volatile ("mf" ::: "memory")
        |
  In file included from /usr/include/ia64-linux-gnu/asm/intrinsics.h:20,
                   from /usr/include/ia64-linux-gnu/asm/swab.h:11,
                   from /usr/include/linux/swab.h:8,
                   from /usr/include/linux/byteorder/little_endian.h:13,
                   from /usr/include/ia64-linux-gnu/asm/byteorder.h:5,
                   from /usr/src/linux/tools/include/uapi/linux/perf_event.h:20,
                   from libbpf.c:36:
  /usr/include/ia64-linux-gnu/asm/gcc_intrin.h:382: note: this is the location of the previous definition
    382 | #define ia64_mf() __asm__ volatile ("mf" ::: "memory")
        |
  cc1: all warnings being treated as errors

Thus, remove the definition from tools/arch/ia64/include/asm/barrier.h.

Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-16 16:10:37 -07:00
John Paul Adrian Glaubitz
17786fea41 ia64: tools: remove inclusion of ia64-specific version of errno.h header
There is no longer an ia64-specific version of the errno.h header below
arch/ia64/include/uapi/asm/, so trying to build tools/bpf fails with:

    CC       /usr/src/linux/tools/bpf/bpftool/btf_dumper.o
  In file included from /usr/src/linux/tools/include/linux/err.h:8,
                   from btf_dumper.c:11:
  /usr/src/linux/tools/include/uapi/asm/errno.h:13:10: fatal error: ../../../arch/ia64/include/uapi/asm/errno.h: No such file or directory
     13 | #include "../../../arch/ia64/include/uapi/asm/errno.h"
        |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  compilation terminated.

Thus, just remove the inclusion of the ia64-specific errno.h so that the
build will use the generic errno.h header on this target which was used
there anyway as the ia64-specific errno.h was just a wrapper for the
generic header.

Fixes: c25f867ddd ("ia64: remove unneeded uapi asm-generic wrappers")
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-16 16:10:37 -07:00
David S. Miller
b022654296 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-04-17

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 9 day(s) which contain
a total of 8 files changed, 175 insertions(+), 111 deletions(-).

The main changes are:

1) Fix a potential NULL pointer dereference in libbpf's xsk
   umem handling, from Ciara Loftus.

2) Mitigate a speculative oob read of up to map value size by
   tightening the masking window, from Daniel Borkmann.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 15:48:08 -07:00
Florian Westphal
dc65fe82fb selftests: mptcp: add packet mark test case
Extend mptcp_connect tool with SO_MARK support (-M <value>) and
add a test case that checks that the packet mark gets copied to all
subflows.

This is done by only allowing packets with either skb->mark 1 or 2
via iptables.

The DROP rule packet counter is checked; if it is not zero, print an error
message and fail the test case.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 15:23:10 -07:00
Daniel Borkmann
d7a5091351 bpf: Update selftests to reflect new error states
Update various selftest error messages:

 * The 'Rx tried to sub from different maps, paths, or prohibited types'
   is reworked into more specific/differentiated error messages for better
   guidance.

 * The change into 'value -4294967168 makes map_value pointer be out of
   bounds' is due to moving the mixed bounds check into the speculation
   handling, so that it now occurs slightly later than the above-mentioned
   sanity check.

 * The change into 'math between map_value pointer and register with
   unbounded min value' is similarly due to register sanity check coming
   before the mixed bounds check.

 * The case of 'map access: known scalar += value_ptr from different maps'
   now loads fine given masks are the same from the different paths (despite
   max map value size being different).

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-04-16 23:52:15 +02:00
Marco Elver
6216798bf9 selftests/perf_events: Add kselftest for remove_on_exec
Add kselftest to test that remove_on_exec removes inherited events from
child tasks.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210408103605.1676875-9-elver@google.com
2021-04-16 16:32:42 +02:00
Marco Elver
f2c3c32f45 selftests/perf_events: Add kselftest for process-wide sigtrap handling
Add a kselftest for testing process-wide perf events with synchronous
SIGTRAP on events (using breakpoints). In particular, we want to test
that changes to the event propagate to all children, and the SIGTRAPs
are in fact synchronously sent to the thread where the event occurred.

Note: The "signal_stress" test case is also added later in the series to
perf tool's built-in tests. The test here is more elaborate in that
respect, which on one hand avoids bloating the perf tool unnecessarily,
but we also benefit from structured tests with TAP-compliant output that
the kselftest framework provides.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210408103605.1676875-8-elver@google.com
2021-04-16 16:32:42 +02:00
Alexander Shishkin
874fc35cdd perf intel-pt: Use aux_watermark
Turns out, the default setting of attr.aux_watermark to half of the total
buffer size is not very useful, especially with smaller buffers. The
problem is that, after half of the buffer is filled up, the kernel updates
->aux_head and sets up the next "transaction", while observing that
->aux_tail is still zero (as userspace hasn't had the chance to update
it), meaning that the trace will have to stop at the end of this second
"transaction". This means, for example, that the second PERF_RECORD_AUX in
every trace comes with TRUNCATED flag set.

Setting attr.aux_watermark to a quarter of the buffer gives enough space for
the ->aux_tail update to be observed and prevents the data loss.
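
As a worked example of the arithmetic, the sketch below computes a quarter-buffer watermark from a page count. The function name is hypothetical and the 4 KiB page size is an assumption; how perf derives the value internally may differ.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: watermark in bytes for an AUX buffer of aux_pages
 * pages, set to one quarter of the total buffer size.
 */
uint32_t sketch_aux_watermark(size_t aux_pages, size_t page_size)
{
	return (uint32_t)(aux_pages * page_size / 4);
}
```

For an 8-page AUX buffer with 4096-byte pages, the buffer is 32768 bytes and the watermark is 8192 bytes.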

The obligatory before/after showcase:

> # perf_before record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 6 times to write data ]
> Warning:
> AUX data lost 4 times out of 10!
>
> [ perf record: Captured and wrote 0.099 MB perf.data ]
> # perf record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 4 times to write data ]
> [ perf record: Captured and wrote 0.039 MB perf.data ]

The effect is still visible with large workloads and large buffers,
although less pronounced.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210414154955.49603-3-alexander.shishkin@linux.intel.com
2021-04-16 16:32:39 +02:00
Yonghong Song
8af5014276 bpftool: Fix a clang compilation warning
With clang compiler:
  make -j60 LLVM=1 LLVM_IAS=1  <=== compile kernel
  # build selftests/bpf or bpftool
  make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1
  make -j60 -C tools/bpf/bpftool LLVM=1 LLVM_IAS=1
the following compilation warning showed up,
  net.c:160:37: warning: comparison of integers of different signs: '__u32' (aka 'unsigned int') and 'int' [-Wsign-compare]
                for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
                                                  ^~~~~~~~~~~~~~~~~
  .../tools/include/uapi/linux/netlink.h:99:24: note: expanded from macro 'NLMSG_OK'
                           (nlh)->nlmsg_len <= (len))
                           ~~~~~~~~~~~~~~~~ ^   ~~~

In this particular case, "len" is defined as "int" and (nlh)->nlmsg_len is "unsigned int".
The macro NLMSG_OK is defined as below in uapi/linux/netlink.h.
  #define NLMSG_OK(nlh,len) ((len) >= (int)sizeof(struct nlmsghdr) && \
                             (nlh)->nlmsg_len >= sizeof(struct nlmsghdr) && \
                             (nlh)->nlmsg_len <= (len))

The clang compiler complains about the comparison "(nlh)->nlmsg_len <= (len)",
but in bpftool/net.c, it is already ensured that "len > 0" must be true.
So theoretically the compiler could deduce that the comparison of
"(nlh)->nlmsg_len" and "len" is okay, but this really depends on compiler
internals. Let us add an explicit type conversion (from "int" to "unsigned int")
for "len" in NLMSG_OK to silence this warning for now.
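
The effect of the cast can be sketched without the uapi header. The struct and macro below are local mirrors of the netlink header layout and of the NLMSG_OK definition quoted above; the `sketch_` names are hypothetical, and the explicit "len > 0" guard stands in for the check that bpftool already performs.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the 16-byte netlink message header. */
struct sketch_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

/* Same shape as the NLMSG_OK definition quoted in the commit message. */
#define SKETCH_NLMSG_OK(nlh, len)                                  \
	((len) >= (int)sizeof(struct sketch_nlmsghdr) &&           \
	 (nlh)->nlmsg_len >= sizeof(struct sketch_nlmsghdr) &&     \
	 (nlh)->nlmsg_len <= (len))

/*
 * Check whether nh is a complete message within len bytes. The guard makes
 * sure len is positive before it is converted to unsigned int, so that both
 * sides of the final comparison have the same signedness.
 */
int sketch_first_msg_ok(const struct sketch_nlmsghdr *nh, int len)
{
	if (len <= 0)
		return 0;
	return SKETCH_NLMSG_OK(nh, (unsigned int)len);
}
```

Without the guard, a negative length converted to unsigned int would become a very large value and the bounds check would pass incorrectly.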

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413153435.3029635-1-yhs@fb.com
2021-04-15 16:50:22 -07:00
Yonghong Song
ef9985893c selftests/bpf: Silence clang compilation warnings
With clang compiler:
  make -j60 LLVM=1 LLVM_IAS=1  <=== compile kernel
  make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1
Some linker flags are not used/effective for some binaries and
we have warnings like:
  warning: -lelf: 'linker' input unused [-Wunused-command-line-argument]

We also have warnings like:
  .../selftests/bpf/prog_tests/ns_current_pid_tgid.c:74:57: note: treat the string as an argument to avoid this
        if (CHECK(waitpid(cpid, &wstatus, 0) == -1, "waitpid", strerror(errno)))
                                                               ^
                                                               "%s",
  .../selftests/bpf/test_progs.h:129:35: note: expanded from macro 'CHECK'
        _CHECK(condition, tag, duration, format)
                                         ^
  .../selftests/bpf/test_progs.h:108:21: note: expanded from macro '_CHECK'
                fprintf(stdout, ##format);                              \
                                  ^
The first warning can be silenced with the clang option -Wno-unused-command-line-argument.
For the second warning, the source code is modified as suggested by the compiler
to silence the warning. Since gcc does not support the option
-Wno-unused-command-line-argument and the warning only happens with the clang
compiler, the option -Wno-unused-command-line-argument is enabled only when
the clang compiler is used.
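
The second fix follows a general rule: a runtime string should be passed as an argument to "%s" rather than used as the format itself. The sketch below illustrates this with a hypothetical helper; it is not the CHECK macro from test_progs.h.

```c
#include <stdio.h>

/*
 * Hypothetical reporter: msg is treated as data, never as a format string,
 * so any '%' characters it contains are copied through verbatim.
 */
int sketch_format_failure(char *out, size_t len, const char *tag,
			  const char *msg)
{
	return snprintf(out, len, "%s: %s", tag, msg);
}
```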

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413153429.3029377-1-yhs@fb.com
2021-04-15 16:50:22 -07:00
Yonghong Song
a22c0c81da selftests/bpf: Fix test_cpp compilation failure with clang
With clang compiler:
  make -j60 LLVM=1 LLVM_IAS=1  <=== compile kernel
  make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1
the test_cpp build failed due to the failure:
  warning: treating 'c-header' input as 'c++-header' when in C++ mode, this behavior is deprecated [-Wdeprecated]
  clang-13: error: cannot specify -o when generating multiple output files

test_cpp compilation flag looks like:
  clang++ -g -Og -rdynamic -Wall -I<...> ... \
  -Dbpf_prog_load=bpf_prog_test_load -Dbpf_load_program=bpf_test_load_program \
  test_cpp.cpp <...>/test_core_extern.skel.h <...>/libbpf.a <...>/test_stub.o \
  -lcap -lelf -lz -lrt -lpthread -o <...>/test_cpp

The clang++ compiler complains about the header file on the command line
and fails the compilation because of it.
Let us remove the header file from the command line, where it was never
intended to be anyway; this fixes the compilation problem.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413153424.3028986-1-yhs@fb.com
2021-04-15 16:50:22 -07:00
Yonghong Song
f62700ce63 tools: Allow proper CC/CXX/... override with LLVM=1 in Makefile.include
selftests/bpf/Makefile includes tools/scripts/Makefile.include.
With the following command
  make -j60 LLVM=1 LLVM_IAS=1  <=== compile kernel
  make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch fixes that case:
when CC/AR/LD/CXX/STRIP are allowed to be overridden, they are now set
to clang/llvm-ar/... instead of the gcc binaries. The definition of
CC_NO_CLANG is also moved to after the point where CC is defined.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210413153419.3028165-1-yhs@fb.com
2021-04-15 16:50:21 -07:00
Yonghong Song
26e6dd1072 selftests: Set CC to clang in lib.mk if LLVM is set
selftests/bpf/Makefile includes lib.mk. With the following command
  make -j60 LLVM=1 LLVM_IAS=1  <=== compile kernel
  make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch fixes the lib.mk
issue where CC is set to gcc in all cases.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413153413.3027426-1-yhs@fb.com
2021-04-15 16:50:21 -07:00
Alexei Starovoitov
d3d93e34bd libbpf: Remove unused field.
relo->processed is set, but not used. Remove it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210415141817.53136-1-alexei.starovoitov@gmail.com
2021-04-15 15:34:16 -07:00
Rob Herring
d3003d9e68 libperf tests: Add support for verbose printing
Add __T_VERBOSE() so tests can add verbose output. The verbose output is
enabled with the '-v' command line option. Running 'make tests V=1' will
enable the '-v' option when running the tests.

It'll be used in the next patch, for a user space counter access test.
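
For illustration only, a minimal sketch of such a verbose-print macro. The
names sketch_tests_verbose and SKETCH_T_VERBOSE are hypothetical stand-ins,
not the identifiers added by this patch; the flag is assumed to be set by
the test runner when it sees '-v'.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag: a test runner would set it when '-v' is passed. */
static int sketch_tests_verbose;

/*
 * Print to stderr only in verbose mode. The arguments are not
 * evaluated at all when verbose output is disabled.
 */
#define SKETCH_T_VERBOSE(...)					\
do {								\
	if (sketch_tests_verbose)				\
		fprintf(stderr, __VA_ARGS__);			\
} while (0)
```

Wrapping the body in do { } while (0) lets the macro be used like a single
statement, including inside an unbraced if/else.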

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Itaru Kitayama <itaru.kitayama@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20210414155412.3697605-3-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-15 16:40:15 -03:00
Rob Herring
6cd70754f2 libperf: Add evsel mmap support
In order to support userspace access, an event must be mmapped. While
there's already mmap support for evlist, the usecase is a bit different
than the self monitoring with userspace access. So let's add new
perf_evsel__mmap()/perf_evsel__munmap() functions to mmap/munmap an
evsel. This allows implementing userspace access as a fastpath for
perf_evsel__read().

The mmapped address is returned by perf_evsel__mmap_base(), which is
primarily for users/tests to check if userspace access is enabled.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Itaru Kitayama <itaru.kitayama@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20210414155412.3697605-2-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-15 16:38:51 -03:00
Rob Herring
2fc83c2cd7 tools include: Add an initial math64.h
Add an initial math64.h similar to linux/math64.h with functions
mul_u64_u64_div64() and mul_u64_u32_shr(). This isn't a direct copy of
include/linux/math64.h as that doesn't define mul_u64_u64_div64().

Implementation was written by Peter Zijlstra based on linux/math64.h
and div64.h[1]. The original implementation was not optimal on arm64, as
__int128 division there calls out to __udivti3, so I dropped the
__int128 variant of mul_u64_u64_div64().

[1] https://lore.kernel.org/lkml/20200322101848.GF2452@worktop.programming.kicks-ass.net/
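
As a rough sketch of how the two helpers can be written without __int128
(this is an assumption about the general technique, not the code added by
this patch; the sketch_ names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* (a * mul) >> shift using 32-bit halves; this sketch needs shift <= 32. */
static uint64_t sketch_mul_u64_u32_shr(uint64_t a, uint32_t mul,
				       unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t ret;

	assert(shift <= 32);

	ret = ((uint64_t)al * mul) >> shift;
	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);

	return ret;
}

/*
 * a * b / c via a quotient/remainder split of a. Exact as long as
 * quot * b and rem * b both fit in 64 bits; c must be non-zero.
 */
static uint64_t sketch_mul_u64_u64_div64(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t quot, rem;

	assert(c != 0);

	quot = a / c;
	rem = a % c;

	return quot * b + (rem * b) / c;
}
```

The split works because a = quot * c + rem, so a * b / c equals
quot * b + (rem * b) / c, and only the second term needs rounding down.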

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Itaru Kitayama <itaru.kitayama@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20210414155412.3697605-2-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-15 16:38:02 -03:00
Vitaly Chikunov
2e1daee14e perf beauty: Fix fsconfig generator
After gnulib update sed stopped matching `[[:space:]]*+' as before,
causing the following compilation error:

  In file included from builtin-trace.c:719:
  trace/beauty/generated/fsconfig_arrays.c:2:3: error: expected expression before ']' token
      2 |  [] = "",
	|   ^
  trace/beauty/generated/fsconfig_arrays.c:2:3: error: array index in initializer not of integer type
  trace/beauty/generated/fsconfig_arrays.c:2:3: note: (near initialization for 'fsconfig_cmds')

Fix this by correcting the regular expression used in the generator.
Also, clean up the script by removing redundant egrep, xargs, and printf
invocations.

Committer testing:

Continues to work:

  $ cat tools/perf/trace/beauty/fsconfig.sh
  #!/bin/sh
  # SPDX-License-Identifier: LGPL-2.1

  if [ $# -ne 1 ] ; then
  	linux_header_dir=tools/include/uapi/linux
  else
  	linux_header_dir=$1
  fi

  linux_mount=${linux_header_dir}/mount.h

  printf "static const char *fsconfig_cmds[] = {\n"
  ms='[[:space:]]*'
  sed -nr "s/^${ms}FSCONFIG_([[:alnum:]_]+)${ms}=${ms}([[:digit:]]+)${ms},.*/\t[\2] = \"\1\",/p" \
  	${linux_mount}
  printf "};\n"
  $ tools/perf/trace/beauty/fsconfig.sh
  static const char *fsconfig_cmds[] = {
  	[0] = "SET_FLAG",
  	[1] = "SET_STRING",
  	[2] = "SET_BINARY",
  	[3] = "SET_PATH",
  	[4] = "SET_PATH_EMPTY",
  	[5] = "SET_FD",
  	[6] = "CMD_CREATE",
  	[7] = "CMD_RECONFIGURE",
  };
  $
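
For readers who prefer C to sed, the same extraction can be sketched with
POSIX <regex.h>. This is only an illustration of what the expression above
matches; sketch_fsconfig_line() is a hypothetical helper and is not part of
the generator.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Turn one "FSCONFIG_<NAME> = <N>," enum line into the initializer
 * "\t[<N>] = \"<NAME>\",". Returns 1 on a match, 0 otherwise.
 */
static int sketch_fsconfig_line(const char *line, char *out, size_t outsz)
{
	/* Same POSIX ERE as the sed expression, with ${ms} expanded. */
	static const char pattern[] =
		"^[[:space:]]*FSCONFIG_([[:alnum:]_]+)[[:space:]]*="
		"[[:space:]]*([[:digit:]]+)[[:space:]]*,";
	regmatch_t m[3];
	regex_t re;
	int matched;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return 0;
	matched = regexec(&re, line, 3, m, 0) == 0;
	regfree(&re);
	if (!matched)
		return 0;

	/* Group 2 is the numeric value, group 1 the name after FSCONFIG_. */
	snprintf(out, outsz, "\t[%.*s] = \"%.*s\",",
		 (int)(m[2].rm_eo - m[2].rm_so), line + m[2].rm_so,
		 (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
	return 1;
}
```

Lines that do not look like an FSCONFIG_ enumerator (for example the
"enum fsconfig_command {" line) are simply skipped, which mirrors sed -n
with the 'p' flag.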

Fixes: d35293004a ("perf beauty: Add generator for fsconfig's 'cmd' arg values")
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Co-authored-by: Dmitry V. Levin <ldv@altlinux.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/20210414182723.1670663-1-vt@altlinux.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-15 16:34:11 -03:00
Arnaldo Carvalho de Melo
3535a6967c perf record: Improve 'Workload failed' message printing events + what was exec'ed
Before:

  # perf record -a cycles,instructions,cache-misses
  Workload failed: No such file or directory
  #

After:

  # perf record -a cycles,instructions,cache-misses
  Failed to collect 'cycles' for the 'cycles,instructions,cache-misses' workload: No such file or directory
  #

Helps disambiguate other error scenarios:

  # perf record -a -e cycles,instructions,cache-misses bla
  Failed to collect 'cycles,instructions,cache-misses' for the 'bla' workload: No such file or directory
  # perf record -a cycles,instructions,cache-misses sleep 1
  Failed to collect 'cycles' for the 'cycles,instructions,cache-misses' workload: No such file or directory
  #

When all goes well we're back to the usual:

  # perf record -a -e cycles,instructions,cache-misses sleep 1
  [ perf record: Woken up 3 times to write data ]
  [ perf record: Captured and wrote 3.151 MB perf.data (21242 samples) ]
  #

Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20210414131628.2064862-3-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-15 16:34:05 -03:00
Arnaldo Carvalho de Melo
9865ea8ab3 perf evlist: Add a method to return the list of evsels as a string
Add a 'scnprintf' method to obtain the list of evsels in an evlist as a
string, excluding the "dummy" event used for things like receiving
metadata events (PERF_RECORD_FORK, MMAP, etc) when synthesizing
preexisting threads.

Will be used to improve the error message for workload failure in 'perf
record'.
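
A minimal sketch of the idea, using hypothetical types and names (struct
sketch_evsel, sketch_evlist_scnprintf()) rather than perf's real evlist
structures; whether an event is the "dummy" one is modelled here as a
plain boolean, which is an assumption made for brevity:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an evsel: just a name and a "dummy" marker. */
struct sketch_evsel {
	const char *name;
	bool is_dummy;
};

/* Like snprintf(), but returns the number of chars actually stored. */
static size_t sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	if (n < 0)
		return 0;
	return (size_t)n < size ? (size_t)n : size - 1;
}

/* Join the names of all non-dummy events with commas. */
static size_t sketch_evlist_scnprintf(const struct sketch_evsel *evsels,
				      size_t nr, char *buf, size_t size)
{
	size_t printed = 0;
	bool first = true;

	if (size)
		buf[0] = '\0';

	for (size_t i = 0; i < nr; i++) {
		if (evsels[i].is_dummy)
			continue;
		printed += sketch_scnprintf(buf + printed, size - printed,
					    "%s%s", first ? "" : ",",
					    evsels[i].name);
		first = false;
	}

	return printed;
}
```

Returning the stored length rather than the would-be length keeps the
running offset inside the buffer even when the output is truncated.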

Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20210414131628.2064862-2-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-15 16:33:57 -03:00
zuoqilin
069904ce31 tools/testing: Remove unused variable
Remove unused variable "ret2".

Signed-off-by: zuoqilin <zuoqilin@yulong.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414141639.1446-1-zuoqilin1@163.com
2021-04-14 17:28:58 -07:00
Florent Revest
1969b3c60d selftests/bpf: Fix the ASSERT_ERR_PTR macro
It is just missing a ';'. This macro is not used by any test yet.
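
The reason this went unnoticed is that a macro body is only parsed when
the macro is expanded. A simplified sketch of a macro of this shape is
shown below; the SKETCH_ and sketch_ names are hypothetical, the error
pointer check is an assumed re-implementation, and the code relies on
GNU statement expressions (gcc/clang):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MAX_ERRNO 4095

/* Assumed check: error pointers live in the last 4095 addresses. */
static bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/*
 * Evaluates to true when ptr encodes an error. Every statement inside
 * the ({ ... }) block needs its own ';', but a missing one only becomes
 * a compile error once some caller actually expands the macro.
 */
#define SKETCH_ASSERT_ERR_PTR(ptr, name) ({				\
	const void *___res = (ptr);					\
	bool ___ok = sketch_is_err(___res);				\
	if (!___ok)							\
		fprintf(stderr, "%s: unexpected pointer: %p\n",	\
			(name), ___res);				\
	___ok;								\
})
```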

Fixes: 22ba363516 ("selftests/bpf: Move and extend ASSERT_xxx() testing macros")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210414155632.737866-1-revest@chromium.org
2021-04-14 17:24:22 -07:00
Greg Kroah-Hartman
a00fcbc115 Linux 5.12-rc7

Merge tag 'v5.12-rc7' into driver-core-next

We need the driver core fix in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14 19:53:39 +02:00
Yang Jihong
5676dba708 perf annotate: Fix sample events lost in stdio mode
In hists__find_annotations(), since different 'struct hist_entry' entries
may point to the same symbol, stdio mode frees notes->src to mark the
symbol as already processed; when annotating, an entry is skipped if
notes->src is NULL, to avoid repeated output.

However, this causes a problem. For example, run the following command:

 # perf record -e branch-misses -e branch-instructions -a sleep 1

The resulting perf.data file contains different types of sample events.

If a sample with the same IP exists in both branch-misses and
branch-instructions, both samples resolve to the same symbol. When the
branch-misses events are annotated, notes->src for that symbol is set to
NULL; as a result, when the branch-instructions events are annotated, the
entry is skipped and no annotation is output.

The solution in this patch is to remove the zfree in
hists__find_annotations() and change the sort order to "dso,symbol" to
avoid duplicate output when different processes correspond to the same
symbol.

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: zhangjinhao2@huawei.com
Link: http://lore.kernel.org/lkml/20210319123527.173883-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-14 09:23:54 -03:00
Toke Høiland-Jørgensen
463c2149ed selftests/bpf: Add tests for target information in bpf_link info queries
Extend the fexit_bpf2bpf test to check that the info for the bpf_link
returned by the kernel matches the expected values.

While we're updating the test, change existing uses of CHECK() to use the
much easier to read ASSERT_*() macros.

v2:
- Convert last CHECK() call and get rid of 'duration' var
- Split ASSERT_OK_PTR() checks to two separate if statements

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413091607.58945-2-toke@redhat.com
2021-04-13 18:18:57 -07:00
Toke Høiland-Jørgensen
441e8c66b2 bpf: Return target info when a tracing bpf_link is queried
There is currently no way to discover the target of a tracing program
attachment after the fact. Add this information to bpf_link_info and return
it when querying the bpf_link fd.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413091607.58945-1-toke@redhat.com
2021-04-13 18:18:57 -07:00
Arnaldo Carvalho de Melo
473b2922c7 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes from perf/urgent that got into upstream.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-13 15:14:37 -03:00
Pedro Tammela
5c50732900 libbpf: Clarify flags in ringbuf helpers
In 'bpf_ringbuf_reserve()' we require the flag to be '0' at the moment.

For 'bpf_ringbuf_{discard,submit,output}' a flag of '0' might send a
notification to the process if needed.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210412192434.944343-1-pctammela@mojatatu.com
2021-04-12 21:28:33 -07:00
Daniel Borkmann
cbaa683bb3 bpf: Sync bpf headers in tooling infrastucture
Synchronize tools/include/uapi/linux/bpf.h which was missing changes
from various commits:

  - f3c45326ee ("bpf: Document PROG_TEST_RUN limitations")
  - e5e35e754c ("bpf: BPF-helper for MTU checking add length input")

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2021-04-12 17:31:09 +02:00
Greg Kroah-Hartman
14d34d2dbb Merge 5.12-rc7 into usb-next
We need the USB fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-12 08:15:27 +02:00
Paolo Abeni
1c3cadbe02 self-tests: add veth tests
Add some basic veth tests that verify the expected flags and
aggregation with different setups (default, xdp, etc.).

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:39:28 -07:00
Ingo Molnar
c9450f728c Merge branch 'for-mingo-lkmm' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/core
Pull LKMM changes from Paul E. McKenney: misc documentation updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-04-11 14:36:23 +02:00
Ingo Molnar
120b566d1d Merge branch 'for-mingo-rcu' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

 - Bitmap support for "N" as alias for last bit

 - kvfree_rcu updates

 - mm_dump_obj() updates.  (One of these is to mm, but was suggested by Andrew Morton.)

 - RCU callback offloading update

 - Polling RCU grace-period interfaces

 - Realtime-related RCU updates

 - Tasks-RCU updates

 - Torture-test updates

 - Torture-test scripting updates

 - Miscellaneous fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-04-11 14:31:43 +02:00
Jakub Kicinski
8859a44ea0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

MAINTAINERS
 - keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
 - simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
 - trivial
include/linux/ethtool.h
 - trivial, fix kdoc while at it
include/linux/skmsg.h
 - move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
 - add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
 - trivial

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 20:48:35 -07:00
Linus Torvalds
9288e1f7d3 perf tools fixes for v5.12: 3rd batch

Merge tag 'perf-tools-fixes-for-v5.12-2020-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool fixes from Arnaldo Carvalho de Melo:

 - Fix wrong LBR block sorting in 'perf report'

 - Fix 'perf inject' repipe usage when consuming perf.data files

 - Avoid potential buffer overrun when decoding ARM SPE hardware tracing
   packets, bug found using a fuzzer

* tag 'perf-tools-fixes-for-v5.12-2020-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf arm-spe: Avoid potential buffer overrun
  perf report: Fix wrong LBR block sorting
  perf inject: Fix repipe usage
2021-04-09 17:12:31 -07:00
Linus Torvalds
4e04e7513b Networking fixes for 5.12-rc7, including fixes from can, ipsec,
mac80211, wireless, and bpf trees.

Merge tag 'net-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.12-rc7, including fixes from can, ipsec,
  mac80211, wireless, and bpf trees.

  No scary regressions here or in the works, but small fixes for 5.12
  changes keep coming.

  Current release - regressions:

   - virtio: do not pull payload in skb->head

   - virtio: ensure mac header is set in virtio_net_hdr_to_skb()

   - Revert "net: correct sk_acceptq_is_full()"

   - mptcp: revert "mptcp: provide subflow aware release function"

   - ethernet: lan743x: fix ethernet frame cutoff issue

   - dsa: fix type was not set for devlink port

   - ethtool: remove link_mode param and derive link params from driver

   - sched: htb: fix null pointer dereference on a null new_q

   - wireless: iwlwifi: Fix softirq/hardirq disabling in
     iwl_pcie_enqueue_hcmd()

   - wireless: iwlwifi: fw: fix notification wait locking

   - wireless: brcmfmac: p2p: Fix deadlock introduced by avoiding the
     rtnl dependency

  Current release - new code bugs:

   - napi: fix hangup on napi_disable for threaded napi

   - bpf: take module reference for trampoline in module

   - wireless: mt76: mt7921: fix airtime reporting and related tx hangs

   - wireless: iwlwifi: mvm: rfi: don't lock mvm->mutex when sending
     config command

  Previous releases - regressions:

   - rfkill: revert back to old userspace API by default

   - nfc: fix infinite loop, refcount & memory leaks in LLCP sockets

   - let skb_orphan_partial wake-up waiters

   - xfrm/compat: Cleanup WARN()s that can be user-triggered

   - vxlan, geneve: do not modify the shared tunnel info when PMTU
     triggers an ICMP reply

   - can: fix msg_namelen values depending on CAN_REQUIRED_SIZE

   - can: uapi: mark union inside struct can_frame packed

   - sched: cls: fix action overwrite reference counting

   - sched: cls: fix err handler in tcf_action_init()

   - ethernet: mlxsw: fix ECN marking in tunnel decapsulation

   - ethernet: nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx

   - ethernet: i40e: fix receiving of single packets in xsk zero-copy
     mode

   - ethernet: cxgb4: avoid collecting SGE_QBASE regs during traffic

  Previous releases - always broken:

   - bpf: Refuse non-O_RDWR flags in BPF_OBJ_GET

   - bpf: Refcount task stack in bpf_get_task_stack

   - bpf, x86: Validate computation of branch displacements

   - ieee802154: fix many similar syzbot-found bugs
       - fix NULL dereferences in netlink attribute handling
       - reject unsupported operations on monitor interfaces
       - fix error handling in llsec_key_alloc()

   - xfrm: make ipv4 pmtu check honor ip header df

   - xfrm: make hash generation lock per network namespace

   - xfrm: esp: delete NETIF_F_SCTP_CRC bit from features for esp
     offload

   - ethtool: fix incorrect datatype in set_eee ops

   - xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory
     model

   - openvswitch: fix send of uninitialized stack memory in ct limit
     reply

  Misc:

   - udp: add get handling for UDP_GRO sockopt"

* tag 'net-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (182 commits)
  net: fix hangup on napi_disable for threaded napi
  net: hns3: Trivial spell fix in hns3 driver
  lan743x: fix ethernet frame cutoff issue
  net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
  net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits
  net: dsa: lantiq_gswip: Don't use PHY auto polling
  net: sched: sch_teql: fix null-pointer dereference
  ipv6: report errors for iftoken via netlink extack
  net: sched: fix err handler in tcf_action_init()
  net: sched: fix action overwrite reference counting
  Revert "net: sched: bump refcount for new action in ACT replace mode"
  ice: fix memory leak of aRFS after resuming from suspend
  i40e: Fix sparse warning: missing error code 'err'
  i40e: Fix sparse error: 'vsi->netdev' could be null
  i40e: Fix sparse error: uninitialized symbol 'ring'
  i40e: Fix sparse errors in i40e_txrx.c
  i40e: Fix parameters in aq_get_phy_register()
  nl80211: fix beacon head validation
  bpf, x86: Validate computation of branch displacements for x86-32
  bpf, x86: Validate computation of branch displacements for x86-64
  ...
2021-04-09 15:26:51 -07:00
Yauheni Kaliuta
cfc0889ceb selftests/bpf: ringbuf_multi: Test bpf_map__set_inner_map_fd
Test bpf_map__set_inner_map_fd() interaction with map-in-map
initialization. Use a hashmap of maps just to make it different from the
existing array of maps.

Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-9-yauheni.kaliuta@redhat.com
2021-04-08 23:54:48 -07:00
Yauheni Kaliuta
f3f4c23e12 selftests/bpf: ringbuf_multi: Use runtime page size
Set bpf table sizes dynamically according to the runtime page size
value.

Do not switch to ASSERT macros, keep CHECK, for consistency with the
rest of the test. Can be a separate cleanup patch.

Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-8-yauheni.kaliuta@redhat.com
2021-04-08 23:54:48 -07:00
Andrii Nakryiko
b3278099b2 libbpf: Add bpf_map__inner_map API
The API gives access to the inner map for map-in-map types (array or
hash of maps). It will be used to dynamically set max_entries in it.

Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-7-yauheni.kaliuta@redhat.com
2021-04-08 23:54:48 -07:00
Yauheni Kaliuta
23a6576606 selftests/bpf: ringbuf: Use runtime page size
Replace hardcoded 4096 with runtime value in the userspace part of
the test and set bpf table sizes dynamically according to the value.
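
On the userspace side, querying the page size at runtime can look roughly
like the sketch below. The helper names and the 4096 fallback are
assumptions for illustration, and the power-of-two page count reflects
the assumption that a BPF ring buffer size must be a power-of-two
multiple of the page size:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Runtime page size; falls back to 4096 only if sysconf() fails. */
static size_t sketch_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	return sz > 0 ? (size_t)sz : 4096;
}

/* Size in bytes of a buffer spanning nr_pages pages (power of two). */
static size_t sketch_ringbuf_bytes(size_t nr_pages)
{
	assert(nr_pages > 0 && (nr_pages & (nr_pages - 1)) == 0);

	return nr_pages * sketch_page_size();
}
```

A value computed this way is what a test would hand to the map before
load instead of a hardcoded 4096.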

Do not switch to ASSERT macros, keep CHECK, for consistency with the
rest of the test. Can be a separate cleanup patch.

Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-6-yauheni.kaliuta@redhat.com
2021-04-08 23:54:48 -07:00
Yauheni Kaliuta
34090aaf25 selftests/bpf: mmap: Use runtime page size
Replace hardcoded 4096 with runtime value in the userspace part of
the test and set bpf table sizes dynamically according to the value.

Do not switch to ASSERT macros, keep CHECK, for consistency with the
rest of the test. Can be a separate cleanup patch.

Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-5-yauheni.kaliuta@redhat.com
2021-04-08 23:54:48 -07:00
Yauheni Kaliuta
7a85e4dfa7 selftests/bpf: Pass page size from userspace in map_ptr
Use ASSERT macros to check results, but keep CHECK where a format string
is used to report the error.

Use bpf_map__set_max_entries() to set map size dynamically from
userspace according to page size.

Zero-initialize the variable in bpf prog, otherwise it will cause
problems on some versions of Clang.

Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-4-yauheni.kaliuta@redhat.com
2021-04-08 23:54:47 -07:00
Yauheni Kaliuta
361d32028c selftests/bpf: Pass page size from userspace in sockopt_sk
Since there is no convenient way for a BPF program to get PAGE_SIZE
from inside the kernel, pass the value from userspace.

Zero-initialize the variable in bpf prog, otherwise it will cause
problems on some versions of Clang.

Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-3-yauheni.kaliuta@redhat.com
2021-04-08 23:54:47 -07:00
Yauheni Kaliuta
cad99cce13 selftests/bpf: test_progs/sockopt_sk: Convert to use BPF skeleton
Switch the test to use a BPF skeleton to save some boilerplate and
make it easy to access the BPF program's bss segment.

The latter will be used to pass PAGE_SIZE from userspace, since there
is no convenient way for a BPF program to get it from inside the
kernel.

Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-2-yauheni.kaliuta@redhat.com
2021-04-08 23:54:47 -07:00
Yauheni Kaliuta
ff182bc572 selftests/bpf: test_progs/sockopt_sk: Remove version
As pointed out by Andrii Nakryiko, _version is useless now; remove it.

Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-1-yauheni.kaliuta@redhat.com
2021-04-08 23:54:47 -07:00
Ciara Loftus
afd0be7299 libbpf: Fix potential NULL pointer dereference
Wait until after the UMEM is checked for null to dereference it.
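
The pattern, reduced to a sketch with hypothetical names (struct
sketch_umem and sketch_socket_create() are illustrations, not the libbpf
code), is to validate the pointer first and only then read through it:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the UMEM object. */
struct sketch_umem {
	void *fill_save;
};

static int sketch_socket_create(struct sketch_umem *umem, void *fill,
				int *unmap)
{
	/*
	 * Reading umem->fill_save up here, before the check below,
	 * would dereference a NULL pointer when umem is NULL.
	 */
	if (!umem || !unmap)
		return -EFAULT;

	/* Safe: umem is known to be non-NULL from this point on. */
	*unmap = umem->fill_save != fill;
	return 0;
}
```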

Fixes: 43f1bc1eff ("libbpf: Restore umem state after socket create failure")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210408052009.7844-1-ciara.loftus@intel.com
2021-04-08 23:36:46 +02:00
Vlad Buslov
652e3124c3 tc-testing: add simple action test to verify batch change cleanup
Verify cleanup of a failed action batch change where the second action in
the batch fails after successful init of the first action.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:50:14 -07:00
Vlad Buslov
79749ae19d tc-testing: add simple action test to verify batch add cleanup
Verify cleanup of a failed action batch add where the second action in
the batch fails after successful init of the first action.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:50:14 -07:00
Masanari Iida
84de8154c5 tools: Fix a typo in kernel-chktaint
This patch fixes a spelling typo in kernel-chktaint.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210402070514.336376-1-standby24x7@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-04-08 11:41:22 -06:00
Smita Koralahalli
da66658638 perf vendor events amd: Add Zen3 events
Add PMU events for AMD Zen3 processors as documented in the AMD Processor
Programming Reference for Family 19h and Model 01h [1].

Below are the events which are new on Zen3:

  PMCx041 ls_mab_alloc.{all_allocations|hardware_prefetcher_allocations|load_store_allocations}
  PMCx043 ls_dmnd_fills_from_sys.ext_cache_local
  PMCx044 ls_any_fills_from_sys.{mem_io_remote|ext_cache_remote|mem_io_local|ext_cache_local|int_cache|lcl_l2}
  PMCx047 ls_misal_loads.{ma4k|ma64}
  PMCx059 ls_sw_pf_dc_fills.ext_cache_local
  PMCx05a ls_hw_pf_dc_fills.ext_cache_local
  PMCx05f ls_alloc_mab_count
  PMCx085 bp_l1_tlb_miss_l2_tlb_miss.coalesced_4k
  PMCx0ab de_dis_cops_from_decoder.disp_op_type.{any_integer_dispatch|any_fp_dispatch}
  PMCx0cc ex_ret_ind_brch_instr
  PMCx18e ic_tag_hit_miss.{all_instruction_cache_accesses|instruction_cache_miss|instruction_cache_hit}
  PMCx1c7 ex_ret_msprd_brnch_instr_dir_msmtch
  PMCx28f op_cache_hit_miss.{all_op_cache_accesses|op_cache_miss|op_cache_hit}

Section 2.1.17.2 "Performance Measurement" of "PPR for AMD Family 19h,
Model 01h, Revision B1 Processors - 55898 Rev 0.35 - Feb 5, 2021." lists
new metrics. Add them.

Preserve events taken from the Zen2 directory if they are measurable and
non-zero on Zen3, even if the Zen3 PPR [1] omits them. Those events
are the following:

  PMCx000 fpu_pipe_assignment.{total|total0|total1|total2|total3}
  PMCx004 fp_num_mov_elim_scal_op.{optimized|opt_potential|sse_mov_ops_elim|sse_mov_ops}
  PMCx02D ls_rdtsc
  PMCx040 ls_dc_accesses
  PMCx046 ls_tablewalker.{iside|ic_type1|ic_type0|dside|dc_type1|dc_type0}
  PMCx061 l2_request_g2.{group1|ls_rd_sized|ls_rd_sized_nc|ic_rd_sized|ic_rd_sized_nc|smc_inval|bus_lock_originator|bus_locks_responses}
  PMCx062 l2_latency.l2_cycles_waiting_on_fills
  PMCx063 l2_wcb_req.{wcb_write|wcb_close|zero_byte_store|cl_zero}
  PMCx06d l2_fill_pending.l2_fill_busy
  PMCx080 ic_fw32
  PMCx081 ic_fw32_miss
  PMCx086 bp_snp_re_sync
  PMCx087 ic_fetch_stall.{ic_stall_any|ic_stall_dq_empty|ic_stall_back_pressure}
  PMCx08a bp_l1_btb_correct
  PMCx08c ic_cache_inval.{l2_invalidating_probe|fill_invalidated}
  PMCx099 bp_tlb_rel
  PMCx0a9 de_dis_uop_queue_empty_di0
  PMCx0c7 ex_ret_brn_resync
  PMCx28a ic_oc_mode_switch.{oc_ic_mode_switch|ic_oc_mode_switch}
  L3PMCx01 l3_request_g1.caching_l3_cache_accesses
  L3PMCx06 l3_comb_clstr_state.{other_l3_miss_typs|request_miss}

[1] Processor Programming Reference (PPR) for AMD Family 19h, Model 01h,
Revision B1 Processors - 55898 Rev 0.35 - Feb 5, 2021.

[2] Processor Programming Reference (PPR) for AMD Family 17h Model 71h,
Revision B0 Processors, 56176 Rev 3.06 - Jul 17, 2019.

[3] Processor Programming Reference (PPR) for AMD Family 17h Models
01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019.

All of the PPRs can be found at:

https://bugzilla.kernel.org/show_bug.cgi?id=206537

Reviewed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20210406215944.113332-5-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:24:39 -03:00
Smita Koralahalli
e5f2b4e1b8 perf vendor events amd: Use 0x%02x format for event code and umask
Use 0x%02x format for all event codes and umasks as this helps in tracking
changes of automatically generated event tables.
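
As an illustration of what the 0x%02x format produces, here is a small sketch using Python's printf-style formatting, which follows the same conversion rules as C for this case. The helper name `format_event_field` is hypothetical:

```python
def format_event_field(value):
    """Render an event code or umask as lowercase hex, at least two digits wide."""
    return "0x%02x" % value


# Values below 0x10 are zero-padded; wider values keep all their digits.
for value in (0x4, 0x2D, 0x18E):
    print(format_event_field(value))
```

A fixed minimum width keeps regenerated tables textually stable, so a diff only shows entries whose values really changed.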

Reviewed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20210406215944.113332-4-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:24:38 -03:00
Smita Koralahalli
ff64c98195 perf vendor events amd: Use lowercases for all the eventcodes and umasks
The values of event codes and umasks are inconsistent with letter cases.
Enforce a unique style and default everything to lower case as this
helps in tracking changes of automatically generated event tables.
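
A minimal sketch of this kind of normalization, assuming an event entry is a dictionary with the `EventCode` and `UMask` keys used by the pmu-events JSON files. The helper `lowercase_codes` is hypothetical and is not the tool that was used for the commit:

```python
def lowercase_codes(entry):
    """Return a copy of an event entry with hex fields forced to lower case."""
    fixed = dict(entry)
    for key in ("EventCode", "UMask"):
        if key in fixed:
            fixed[key] = fixed[key].lower()
    return fixed


entry = {"EventName": "ls_rdtsc", "EventCode": "0x2D"}
print(lowercase_codes(entry))
```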

Reviewed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20210406215944.113332-3-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:24:38 -03:00
Smita Koralahalli
86c2bc3da7 perf vendor events amd: Fix broken L2 Cache Hits from L2 HWPF metric
Commit 08ed77e414 ("perf vendor events amd: Add recommended events")
added the hits event "L2 Cache Hits from L2 HWPF" with the same metric
expression as the accesses event "L2 Cache Accesses from L2 HWPF":

$ perf list --details
...
  l2_cache_accesses_from_l2_hwpf
     [L2 Cache Accesses from L2 HWPF]
     [l2_pf_hit_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
  l2_cache_hits_from_l2_hwpf
     [L2 Cache Hits from L2 HWPF]
     [l2_pf_hit_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
...

This was wrong and led to counting hits the same as accesses. Section
2.1.15.2 "Performance Measurement" of "PPR for AMD Family 17h Model 31h
B0 - 55803 Rev 0.54 - Sep 12, 2019", documents the hits event with
EventCode 0x70 which is the same as l2_pf_hit_l2.

Fix this, and massage the description for l2_pf_hit_l2 as the hits event
is now the duplicate of l2_pf_hit_l2. AMD recommends using the recommended
event over other events if the duplicate exists and maintain both for
consistency. Hence, l2_cache_hits_from_l2_hwpf should override
l2_pf_hit_l2.

Before:

 # perf stat -M l2_cache_accesses_from_l2_hwpf,l2_cache_hits_from_l2_hwpf sleep 1

 Performance counter stats for 'sleep 1':

             1,436      l2_pf_miss_l2_l3          # 11114.00 l2_cache_accesses_from_l2_hwpf
                                                  # 11114.00 l2_cache_hits_from_l2_hwpf
             4,482      l2_pf_hit_l2
             5,196      l2_pf_miss_l2_hit_l3

       1.001765339 seconds time elapsed

After:

 # perf stat -M l2_cache_accesses_from_l2_hwpf sleep 1

 Performance counter stats for 'sleep 1':

             1,477      l2_pf_miss_l2_l3          # 10442.00 l2_cache_accesses_from_l2_hwpf
             3,978      l2_pf_hit_l2
             4,987      l2_pf_miss_l2_hit_l3

       1.001491186 seconds time elapsed

 # perf stat -e l2_cache_hits_from_l2_hwpf sleep 1

 Performance counter stats for 'sleep 1':

             3,983      l2_cache_hits_from_l2_hwpf

       1.001329970 seconds time elapsed

Note the difference in performance counter values for the accesses
versus the hits after the fix, and the hits event now counting the same
as l2_pf_hit_l2.
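
The arithmetic behind the two outputs can be checked with a short sketch. The counter values are copied from the runs above; the function names are illustrative only:

```python
def accesses(counts):
    """Metric expression for 'L2 Cache Accesses from L2 HWPF'."""
    return (counts["l2_pf_hit_l2"]
            + counts["l2_pf_miss_l2_hit_l3"]
            + counts["l2_pf_miss_l2_l3"])


def hits(counts):
    """After the fix, hits are the same count as l2_pf_hit_l2 alone."""
    return counts["l2_pf_hit_l2"]


before = {"l2_pf_miss_l2_l3": 1436, "l2_pf_hit_l2": 4482, "l2_pf_miss_l2_hit_l3": 5196}
after = {"l2_pf_miss_l2_l3": 1477, "l2_pf_hit_l2": 3978, "l2_pf_miss_l2_hit_l3": 4987}

print(accesses(before))  # the value both metrics reported before the fix
print(accesses(after), hits(after))
```

Before the fix both metrics evaluated the `accesses` expression, which is why they printed the same 11114.00.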

Fixes: 08ed77e414 ("perf vendor events amd: Add recommended events")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Reviewed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> # On a 3900X
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20210406215944.113332-2-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:24:38 -03:00
John Garry
0cc177cfc9 perf vendor events arm64: Add Hisi hip08 L3 metrics
Add L3 metrics.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/1617791570-165223-7-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:24:31 -03:00
John Garry
0383717348 perf vendor events arm64: Add Hisi hip08 L2 metrics
Add L2 metrics.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/1617791570-165223-6-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:24:25 -03:00
John Garry
c4e1dc4a94 perf vendor events arm64: Add Hisi hip08 L1 metrics
Add L1 metrics. Formula is as consistent as possible with MAN pages
description for these metrics.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/1617791570-165223-5-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:24:19 -03:00
John Garry
e126bef55f perf pmu: Add pmu_events_map__find() function to find the common PMU map for the system
Add a function to find the common PMU map for the system.

For arm64, a special variant is added. This is because arm64 supports
heterogeneous CPU systems. As such, it cannot be guaranteed that the
cpumap is the same for all CPUs. So in case of heterogeneous systems, don't
return a cpumap.
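
The rule described above can be sketched as follows. This is not the perf implementation: `find_common_map`, the CPU ID strings, and the table names are all hypothetical, and the real code works on C structures rather than dictionaries:

```python
def find_common_map(cpuids, maps):
    """Return the events map shared by every CPU, or None if CPUs differ.

    cpuids: one identifier string per CPU in the system.
    maps:   dictionary from identifier to its events map.
    """
    unique = set(cpuids)
    if len(unique) != 1:
        # Heterogeneous (or empty) system: there is no single common map.
        return None
    return maps.get(unique.pop())


maps = {"cpu-a": "table-a", "cpu-b": "table-b"}
print(find_common_map(["cpu-a", "cpu-a"], maps))
print(find_common_map(["cpu-a", "cpu-b"], maps))
```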

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Paul A. Clarke <pc@us.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/1617791570-165223-4-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:24:14 -03:00
John Garry
a48a995edc perf test: Handle metric reuse in pmu-events parsing test
The pmu-events parsing test does not handle metric reuse at all.

Introduce some simple handling to resolve metrics that reference other
metrics.
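
One way to picture metric reuse is textual substitution: any identifier in an expression that names another metric is replaced by that metric's own expression. The sketch below assumes metrics are stored as a name-to-expression dictionary; `resolve_metric` and the sample metrics are hypothetical and do not mirror the test's actual code:

```python
import re


def resolve_metric(name, metrics, _stack=()):
    """Expand references to other metrics inside the expression for `name`."""
    if name in _stack:
        raise ValueError("metric %r references itself" % name)

    def substitute(match):
        token = match.group(0)
        if token in metrics:
            # Parenthesize so operator precedence in the parent is preserved.
            return "(" + resolve_metric(token, metrics, _stack + (name,)) + ")"
        return token  # a plain event name or function: leave untouched

    return re.sub(r"[A-Za-z_][A-Za-z0-9_.]*", substitute, metrics[name])


metrics = {
    "ipc": "instructions / cycles",
    "ipc_pct": "100 * ipc",
}
print(resolve_metric("ipc_pct", metrics))
```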

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Paul A. Clarke <pc@us.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linuxarm@huawei.com
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/1617791570-165223-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:24:00 -03:00
John Garry
dedb76d359 perf metricgroup: Make find_metric() public with name change
Function find_metric() is required for the metric processing in the
pmu-events testcase, so make it public. Also change the name to include
"metricgroup".

Tested-by: Paul A. Clarke <pc@us.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linuxarm@huawei.com
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/1617791570-165223-2-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:23:39 -03:00
Kees Cook
68ef8735d2 lkdtm: Add REPORT_STACK for checking stack offsets
For validating the stack offset behavior, report the offset from a given
process's first seen stack address. Add a script to calculate the results
to the LKDTM kselftests.
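
One plausible way to summarize a series of reported offsets is to count the distinct values and express that count in bits. The sketch below is an assumption for illustration and is not necessarily how the added script computes its result; `offset_entropy_bits` is a hypothetical name:

```python
import math


def offset_entropy_bits(offsets):
    """Estimate how many bits of variation a list of stack offsets shows."""
    distinct = len(set(offsets))
    if distinct == 0:
        return 0.0
    return math.log2(distinct)


# Four distinct offsets seen across eight reports: two bits of variation.
print(offset_entropy_bits([0, 16, 32, 48, 16, 0, 48, 32]))
```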

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210401232347.2791257-7-keescook@chromium.org
2021-04-08 14:05:20 +02:00
Jordan Niethe
812aa68ef7 selftests/powerpc: Suggest memtrace instead of /dev/mem for ci memory
The suggested alternative for getting cache-inhibited memory with 'mem='
and /dev/mem is pretty hacky. Also, PAPR guests do not allow system
memory to be mapped cache-inhibited so despite /dev/mem being available
this will not work which can cause confusion.  Instead recommend using
the memtrace buffers. memtrace is only available on powernv so there
will not be any chance of trying to do this in a guest.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210225032108.1458352-2-jniethe5@gmail.com
2021-04-08 21:17:44 +10:00
Jordan Niethe
29e3ea8cbd selftests/powerpc: Test for spurious kernel memory faults on radix
Previously when mapping kernel memory on radix, no ptesync was
included which would periodically lead to unhandled spurious faults.
Mapping kernel memory is used when code patching with Strict RWX
enabled. As suggested by Chris Riedl, turning ftrace on and off does a
large amount of code patching so is a convenient way to see this kind
of fault.

Add a selftest to try and trigger this kind of a spurious fault. It
tests for 30 seconds which is usually long enough for the issue to
show up.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[mpe: Rename it to better reflect what it does, rather than the symptom]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210208032957.1232102-2-jniethe5@gmail.com
2021-04-08 21:17:42 +10:00
Fenghua Yu
e75074781f selftests/resctrl: Change a few printed messages
Change a few printed messages to report test progress more clearly.

Add a missing "\n" at the end of one printed message.

Suggested-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-07 16:37:49 -06:00
Geliang Tang
c3eaa5f667 selftests: mptcp: add the net device name testcase
This patch added a new testcase for setting the net device name. In it,
pass the net device name to pm_nl_ctl to set the ifindex field of struct
mptcp_pm_addr_entry.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 14:09:40 -07:00
Ian Rogers
92f1e8adf7 perf arm-spe: Avoid potential buffer overrun
SPE extended headers are > 1 byte so ensure the buffer contains at least
this before reading. This issue was detected by fuzzing.
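
The shape of such a check can be sketched as below. The mask and marker values are assumptions chosen for illustration, and `read_header` is a hypothetical helper, not the decoder's function:

```python
EXTENDED_MASK = 0xFC    # assumed for illustration
EXTENDED_VALUE = 0x20   # assumed for illustration


def read_header(buf):
    """Return the header bytes at the start of buf, or None if buf is too short."""
    if len(buf) < 1:
        return None
    if (buf[0] & EXTENDED_MASK) == EXTENDED_VALUE:
        # An extended header occupies two bytes: check before touching buf[1].
        if len(buf) < 2:
            return None
        return bytes(buf[:2])
    return bytes(buf[:1])


print(read_header(bytes([0x20])))        # truncated extended header
print(read_header(bytes([0x20, 0x01])))  # complete extended header
```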

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20210407153955.317215-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-07 16:23:20 -03:00
Jin Yao
f2013278ae perf report: Fix wrong LBR block sorting
When '--total-cycles' is specified, it supports sorting for all blocks
by 'Sampled Cycles%'. This is useful to concentrate on the globally
hottest blocks.

'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles

But in current code, it doesn't use the cycles aggregation. Part of
'cycles' counting is possibly dropped for some overlap jumps. But for
identifying the hot block, we always need the full cycles.

  # perf record -b ./triad_loop
  # perf report --total-cycles --stdio

Before:

  #
  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                          [Program Block Range]      Shared Object
  # ...............  ..............  ...........  ..........  .............................................................  .................
  #
              0.81%             793        4.32%         793                           [setup-vdso.h:34 -> setup-vdso.h:40]         ld-2.27.so
              0.49%             480        0.87%         160                    [native_write_msr+0 -> native_write_msr+16]  [kernel.kallsyms]
              0.48%             476        0.52%          95                      [native_read_msr+0 -> native_read_msr+29]  [kernel.kallsyms]
              0.31%             303        1.65%         303                              [nmi_restore+0 -> nmi_restore+37]  [kernel.kallsyms]
              0.26%             255        1.39%         255      [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162]  [kernel.kallsyms]
              0.24%             234        1.28%         234                       [end_repeat_nmi+67 -> end_repeat_nmi+83]  [kernel.kallsyms]
              0.23%             227        1.24%         227            [__irqentry_text_end+96 -> __irqentry_text_end+126]  [kernel.kallsyms]
              0.20%             194        1.06%         194             [native_set_debugreg+52 -> native_set_debugreg+56]  [kernel.kallsyms]
              0.11%             106        0.14%          26                [native_sched_clock+0 -> native_sched_clock+98]  [kernel.kallsyms]
              0.10%              97        0.53%          97            [trigger_load_balance+0 -> trigger_load_balance+67]  [kernel.kallsyms]
              0.09%              85        0.46%          85             [get-dynamic-info.h:102 -> get-dynamic-info.h:111]         ld-2.27.so
  ...
              0.00%           92.7K        0.02%           4                           [triad_loop.c:64 -> triad_loop.c:65]         triad_loop

The hottest block '[triad_loop.c:64 -> triad_loop.c:65]' is not at
the top of output.

After:

  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                           [Program Block Range]      Shared Object
  # ...............  ..............  ...........  ..........  ..............................................................  .................
  #
             94.35%           92.7K        0.02%           4                            [triad_loop.c:64 -> triad_loop.c:65]         triad_loop
              0.81%             793        4.32%         793                            [setup-vdso.h:34 -> setup-vdso.h:40]         ld-2.27.so
              0.49%             480        0.87%         160                     [native_write_msr+0 -> native_write_msr+16]  [kernel.kallsyms]
              0.48%             476        0.52%          95                       [native_read_msr+0 -> native_read_msr+29]  [kernel.kallsyms]
              0.31%             303        1.65%         303                               [nmi_restore+0 -> nmi_restore+37]  [kernel.kallsyms]
              0.26%             255        1.39%         255       [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162]  [kernel.kallsyms]
              0.24%             234        1.28%         234                        [end_repeat_nmi+67 -> end_repeat_nmi+83]  [kernel.kallsyms]
              0.23%             227        1.24%         227             [__irqentry_text_end+96 -> __irqentry_text_end+126]  [kernel.kallsyms]
              0.20%             194        1.06%         194              [native_set_debugreg+52 -> native_set_debugreg+56]  [kernel.kallsyms]
              0.11%             106        0.14%          26                 [native_sched_clock+0 -> native_sched_clock+98]  [kernel.kallsyms]
              0.10%              97        0.53%          97             [trigger_load_balance+0 -> trigger_load_balance+67]  [kernel.kallsyms]
              0.09%              85        0.46%          85              [get-dynamic-info.h:102 -> get-dynamic-info.h:111]         ld-2.27.so
              0.08%              82        0.06%          11  [intel_pmu_drain_pebs_nhm+580 -> intel_pmu_drain_pebs_nhm+627]  [kernel.kallsyms]
              0.08%              77        0.42%          77                  [lru_add_drain_cpu+0 -> lru_add_drain_cpu+133]  [kernel.kallsyms]
              0.08%              74        0.10%          18                [handle_pmi_common+271 -> handle_pmi_common+310]  [kernel.kallsyms]
              0.08%              74        0.40%          74              [get-dynamic-info.h:131 -> get-dynamic-info.h:157]         ld-2.27.so
              0.07%              69        0.09%          17  [intel_pmu_drain_pebs_nhm+432 -> intel_pmu_drain_pebs_nhm+468]  [kernel.kallsyms]

Now the hottest block is reported at the top of output.
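
The intended ordering can be sketched with three rows taken from the output above. The field name `cycles_aggr` and the helper are illustrative, and because the total here covers only these three rows, the percentages differ from the report:

```python
def sort_by_sampled_cycles(blocks):
    """Order blocks by their share of all sampled cycles, hottest first."""
    total = sum(block["cycles_aggr"] for block in blocks)
    ranked = []
    for block in blocks:
        share = block["cycles_aggr"] / total if total else 0.0
        ranked.append(dict(block, sampled_pct=100.0 * share))
    return sorted(ranked, key=lambda block: block["sampled_pct"], reverse=True)


blocks = [
    {"range": "setup-vdso.h:34 -> setup-vdso.h:40", "cycles_aggr": 793},
    {"range": "native_write_msr+0 -> native_write_msr+16", "cycles_aggr": 480},
    {"range": "triad_loop.c:64 -> triad_loop.c:65", "cycles_aggr": 92700},
]
for block in sort_by_sampled_cycles(blocks):
    print("%6.2f%%  %s" % (block["sampled_pct"], block["range"]))
```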

Fixes: b65a7d372b ("perf hist: Support block formats with compare/sort/display")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210407024452.29988-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-07 16:18:49 -03:00
Srinivas Pandruvada
61ce18ff01 tools/power/x86/intel-speed-select: v1.9 release
This release adds following changes:
- Support increased number of CPUs
- Return error when mailbox command fails to enable core-power
- Option to online all CPUs
- Removes build date and time print

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-07 19:47:22 +02:00
Antonio Terceiro
17de9a5654 tools/power/x86/intel-speed-select: Drop __DATE__ and __TIME__ macros
These macros introduce nondeterminism in builds, and break reproducible
builds.

Signed-off-by: Antonio Terceiro <antonio.terceiro@linaro.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-07 19:47:22 +02:00
Srinivas Pandruvada
0d3dfd7570 tools/power/x86/intel-speed-select: Add options to force online
It is possible that users manually offlined CPUs via sysfs interface
and then started this utility. In this case we will not be able to get
package and die id of those CPUs. So add an option to force online
if required for some commands.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-07 19:47:22 +02:00
Srinivas Pandruvada
b84733a1c5 tools/power/x86/intel-speed-select: Process mailbox read error for core-power
Some older kernels don't support reading core-power status. In that case
mailbox command fails. So, display core-power status as "unknown" instead
of supported.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-07 19:47:22 +02:00
Srinivas Pandruvada
2e70b710f3 tools/power/x86/intel-speed-select: Increase string size
The current string size to print cpulist can accommodate up to 80
logical CPUs per package. But this limit is not enough. So increase
the string size. Also prevent buffer overflow, if the string size
reaches limit.
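
The bound check can be pictured as follows. Python strings cannot overflow, so this sketch only mirrors the logic a fixed-size C buffer needs; `build_cpulist` and its `max_len` parameter are hypothetical:

```python
def build_cpulist(cpus, max_len):
    """Join CPU numbers with commas, stopping before max_len would be exceeded."""
    out = ""
    for cpu in cpus:
        piece = str(cpu) if not out else "," + str(cpu)
        if len(out) + len(piece) > max_len:
            # Appending would pass the limit: stop instead of writing past it.
            break
        out += piece
    return out


print(build_cpulist(range(8), 100))
print(build_cpulist(range(8), 5))
```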

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-04-07 19:47:22 +02:00
Colin Ian King
e527db8f39 ACPICA: Tree-wide: fix various typos and spelling mistakes
This commit squashes the following:
ACPICA commit bc8939e2d902653e71bb1601b129a993c37fcfad
ACPICA commit 2d9e5e98e23f2a569e5691e6bed183146e25798d
ACPICA commit 937358156631ea7a0eef3569c213c82a031097d5

Fix more spelling issues found using the codespell checker and found
without tools.

Link: https://github.com/acpica/acpica/commit/bc8939e2
Link: https://github.com/acpica/acpica/commit/2d9e5e98
Link: https://github.com/acpica/acpica/commit/93735815

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Erik Kaneda <erik.kaneda@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-07 19:09:00 +02:00
Eric Auger
4cffb2df42 KVM: selftests: vgic_init kvm selftests fixup
Bring some improvements/rationalization over the first version
of the vgic_init selftests:

- ucall_init is moved in run_cpu()
- vcpu_args_set is not called as not needed
- whenever a helper is supposed to succeed, call the non "_" version
- helpers do not return -errno, instead errno is checked by the caller
- vm_gic struct is used whenever possible, as well as vm_gic_destroy
- _kvm_create_device takes an additional fd parameter

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210407135937.533141-1-eric.auger@redhat.com
2021-04-07 15:50:50 +01:00
Wan Jiabing
69baf1a2a4 perf mem-events: Remove unnecessary 'struct mem_info' forward declaration
'struct mem_info' is defined at 22nd line.

The declaration here is unnecessary. Remove it.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kael_w@yeah.net
Link: http://lore.kernel.org/lkml/20210406105104.675879-1-wanjiabing@vivo.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-06 13:32:03 -03:00
Eric Auger
dc0e058eef KVM: selftests: aarch64/vgic-v3 init sequence tests
The tests exercise the VGIC_V3 device creation including the
associated KVM_DEV_ARM_VGIC_GRP_ADDR group attributes:

- KVM_VGIC_V3_ADDR_TYPE_DIST/REDIST
- KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION

Some other tests are dedicated to the KVM_DEV_ARM_VGIC_GRP_REDIST_REGS group
and especially the GICR_TYPER read. The goal was to test the case
recently fixed by commit 23bde34771
("KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace").

The API under test can be found at
Documentation/virt/kvm/devices/arm-vgic-v3.rst

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-10-eric.auger@redhat.com
2021-04-06 14:52:31 +01:00
Mauro Carvalho Chehab
2f3eb922cd doc: update rcu_dereference.rst reference
Changeset b00aedf978 ("doc: Convert to rcu_dereference.txt to rcu_dereference.rst")
renamed: Documentation/RCU/rcu_dereference.txt
to: Documentation/RCU/rcu_dereference.rst.

Update its cross-reference accordingly.

Fixes: b00aedf978 ("doc: Convert to rcu_dereference.txt to rcu_dereference.rst")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-04-06 14:45:04 +02:00
Sean Christopherson
8ca52cc38d x86/sgx: Expose SGX architectural definitions to the kernel
Expose SGX architectural structures, as KVM will use many of the
architectural constants and structs to virtualize SGX.

Name the new header file as asm/sgx.h, rather than asm/sgx_arch.h, to
have a single header to provide SGX facilities to share with other kernel
components. Also update MAINTAINERS to include asm/sgx.h.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/6bf47acd91ab4d709e66ad1692c7803e4c9063a0.1616136308.git.kai.huang@intel.com
2021-04-06 09:43:41 +02:00
Hengqi Chen
1e1032b0c4 libbpf: Fix KERNEL_VERSION macro
Add missing ')' for KERNEL_VERSION macro.
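
For reference, the value such a macro computes can be sketched in Python. The formula below follows the convention of the kernel's own version macro, including clamping the sublevel at 255; treating libbpf's copy as identical is an assumption here:

```python
def kernel_version(a, b, c):
    """Pack major.minor.sublevel into one integer, clamping sublevel to 255."""
    return (a << 16) + (b << 8) + (255 if c > 255 else c)


print(hex(kernel_version(5, 12, 0)))
print(hex(kernel_version(4, 19, 300)))
```

In C, a macro body with an unbalanced parenthesis only fails when the macro is expanded, which is how such a slip can go unnoticed until someone uses it.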

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210405040119.802188-1-hengqi.chen@gmail.com
2021-04-05 07:14:13 -07:00
Adrian Hunter
026334a3bb perf inject: Fix repipe usage
Since commit 14d3d54052 ("perf session: Try to read pipe data from
file") 'perf inject' has started printing "PERFILE2h" when not processing
pipes.

The commit exposed perf to the possibility that the input is not a pipe
but the 'repipe' parameter gets used. That causes the printing because
perf inject sets 'repipe' to true always.

The 'repipe' parameter of perf_session__new() is used by 2 functions:

	- perf_file_header__read_pipe()
	- trace_report()

In both cases, the functions copy data to STDOUT_FILENO when 'repipe' is
true.

Fix by setting 'repipe' to true only if the output is a pipe.
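
Deciding whether an output descriptor is a pipe can be sketched with `os.fstat()` and `stat.S_ISFIFO()`. This illustrates the check only and is not the code from the fix; `output_is_pipe` is a hypothetical name:

```python
import os
import stat


def output_is_pipe(fd):
    """Return True when the file descriptor refers to a pipe (FIFO)."""
    return stat.S_ISFIFO(os.fstat(fd).st_mode)


if __name__ == "__main__":
    # Descriptor 1 is standard output; repipe only when it is a pipe.
    repipe = output_is_pipe(1)
    print("repipe:", repipe)
```

Run directly on a terminal this prints `repipe: False`; piped into another command (for example `| cat`) it prints `repipe: True`.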

Fixes: e558a5bd8b ("perf inject: Work with files")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andrew Vagin <avagin@openvz.org>
Link: http://lore.kernel.org/lkml/20210401103605.9000-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-05 10:15:51 -03:00
Greg Kroah-Hartman
e5c7bccc37 Merge 5.12-rc6 into staging-next
We need the staging fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 08:57:44 +02:00
Greg Kroah-Hartman
de800f290d Merge 5.12-rc6 into usb-next
We want the USB fixes in here as well and it resolves a merge issue with
xhci-mtk.c

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 08:56:10 +02:00
Greg Kroah-Hartman
b20e829390 Merge 5.12-rc6 into driver-core-next
We need the driver core fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 08:51:37 +02:00
Martin KaFai Lau
f73ea1eb4c bpf: selftests: Specify CONFIG_DYNAMIC_FTRACE in the testing config
The tracing test and the recent kfunc call test require
CONFIG_DYNAMIC_FTRACE.  This patch adds it to the config file.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210403002921.3419721-1-kafai@fb.com
2021-04-03 09:09:50 -07:00
Yang Yingliang
f07669df4c libbpf: Remove redundant semi-colon
Remove redundant semi-colon in finalize_btf_ext().

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210402012634.1965453-1-yangyingliang@huawei.com
2021-04-03 01:49:38 +02:00
John Fastabend
007bdc12d4 bpf, selftests: test_maps generating unrecognized data section
With a relatively recent clang master branch test_maps skips a section,

 libbpf: elf: skipping unrecognized data section(5) .rodata.str1.1

the cause is some pointless strings from bpf_printks in the BPF program
loaded during testing. After just removing the prints to fix above error
Daniel points out the program is a bit pointless and could be simply the
empty program returning SK_PASS.

Here we do just that and return simply SK_PASS. This program is used with
test_maps selftests to test insert/remove of a program into the sockmap
and sockhash maps. It's not testing actual functionality of the TCP
sockmap programs, these are tested from test_sockmap. So we shouldn't
lose in test coverage and fix above warnings. This original test was
added before test_sockmap existed and has been copied around ever since,
clean it up now.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/161731595664.74613.1603087410166945302.stgit@john-XPS-13-9370
2021-04-03 01:21:08 +02:00
Matthieu Baerts
c2a55e8fd8 selftests: mptcp: dump more info on mpjoin errors
Very occasionally, MPTCP selftests fail. Yeah, I saw that at least once!

Here we provide more details in case of errors with mptcp_join.sh script
like it was done with mptcp_connect.sh, see
commit 767389c8dd ("selftests: mptcp: dump more info on errors")

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02 14:21:51 -07:00
Matthieu Baerts
76e5e27ca9 selftests: mptcp: init nstat history
Not to be impacted by packets sent between sub-tests.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02 14:21:51 -07:00
Matthieu Baerts
5888a61cb4 selftests: mptcp: launch mptcp_connect with timeout
'mptcp_connect' already has a timeout for poll() but in some cases, it
is not enough.

With "timeout" tool, we will force the command to fail if it doesn't
finish on time. Thanks to that, the script will continue and display
details about the current state before marking the test as failed.
Displaying this state is very important to be able to understand the
issue. Better to have our CI report the issue than just "the test
hung".

Note that in mptcp_connect.sh, we were using a long timeout to validate
the fact we cannot create a socket if a sysctl is set. We don't need
this timeout.

In diag.sh, we want to send signals to mptcp_connect instances that have
been started in the netns. But we cannot send this signal to 'timeout'
otherwise that will stop the timeout and messages telling us SIGUSR1 has
been received will be printed. Instead of trying to find the right PID
and storing them in an array, we can simply use the output of
'ip netns pids' which is all the PIDs we want to send signal to.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/160
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02 14:21:50 -07:00
Daniel Latypov
9854781dba kunit: tool: make --kunitconfig accept dirs, add lib/kunit fragment
TL;DR
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit

Per suggestion from Ted [1], we can reduce the amount of typing by
assuming a convention that these files are named '.kunitconfig'.

In the case of [1], we now have
$ ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4

Also add in such a fragment for kunit itself so we can give that as an
example more close to home (and thus less likely to be accidentally
broken).

[1] https://lore.kernel.org/linux-ext4/YCNF4yP1dB97zzwD@mit.edu/

Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 14:14:36 -06:00
Fenghua Yu
4808bf209e selftests/resctrl: Create .gitignore to include resctrl_tests
Create .gitignore to hold the test file resctrl_tests generated after
compiling.

Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:58:42 -06:00
Fenghua Yu
1205b688c9 selftests/resctrl: Fix checking for < 0 for unsigned values
Dan reported following static checker warnings

tools/testing/selftests/resctrl/resctrl_val.c:545 measure_vals()
warn: 'bw_imc' unsigned <= 0

tools/testing/selftests/resctrl/resctrl_val.c:549 measure_vals()
warn: 'bw_resc_end' unsigned <= 0

These warnings are reported because
1. measure_vals() declares 'bw_imc' and 'bw_resc_end' as unsigned long
   variables
2. Return value of get_mem_bw_imc() and get_mem_bw_resctrl() are assigned
   to 'bw_imc' and 'bw_resc_end' respectively
3. The returned values are checked for <= 0 to see if the calls failed

Checking for < 0 for an unsigned value doesn't make any sense.

Fix this issue by changing the implementation of get_mem_bw_imc() and
get_mem_bw_resctrl() such that they now accept reference to a variable
and set the variable appropriately upon success and return 0, else return
< 0 on error.
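
The pattern can be sketched as below. read_bw() is a hypothetical
stand-in for get_mem_bw_imc()/get_mem_bw_resctrl(), and the bandwidth
value is made up; only the shape of the error handling is the point.

```c
#include <errno.h>

/* Hypothetical stand-in for a bandwidth reader: reports the value
 * through 'bw' and returns 0 on success or a negative errno. */
static int read_bw(int simulate_failure, unsigned long *bw)
{
	if (simulate_failure)
		return -EIO;
	*bw = 2000;	/* made-up MB/sec value */
	return 0;
}

/* The caller checks the signed return code, not the unsigned value. */
static int measure(int simulate_failure, unsigned long *out)
{
	unsigned long bw = 0;
	int ret = read_bw(simulate_failure, &bw);

	if (ret < 0)
		return ret;
	*out = bw;
	return 0;
}
```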

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:58:37 -06:00
Fenghua Yu
d81343b5ee selftests/resctrl: Fix incorrect parsing of iMC counters
iMC (Integrated Memory Controller) counters are usually at
"/sys/bus/event_source/devices/" and are named as "uncore_imc_<n>".
num_of_imcs() function tries to count number of such iMC counters so that
it could appropriately initialize required number of perf_attr structures
that could be used to read these iMC counters.

num_of_imcs() function assumes that all the directories under this path
that start with "uncore_imc" are iMC counters. But, on some systems there
could be directories named as "uncore_imc_free_running" which aren't iMC
counters. Trying to read from such directories will result in "not found
file" errors and MBM/MBA tests will fail.

Hence, fix the logic in num_of_imcs() such that it looks at the first
character after "uncore_imc_" to check if it's a numerical digit or not. If
it's a digit then the directory represents an iMC counter, else, skip the
directory.
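
A minimal sketch of that check, assuming it is applied to each directory
name in turn. is_imc_counter_dir() and IMC_PREFIX are illustrative names
and not necessarily those used in num_of_imcs().

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define IMC_PREFIX "uncore_imc_"

/* True if 'name' is "uncore_imc_" followed by a digit. */
static bool is_imc_counter_dir(const char *name)
{
	size_t len = strlen(IMC_PREFIX);

	if (strncmp(name, IMC_PREFIX, len) != 0)
		return false;
	/* name[len] is the first character after the prefix (or NUL). */
	return isdigit((unsigned char)name[len]) != 0;
}
```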

Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:58:31 -06:00
Fenghua Yu
4e5cb354c8 selftests/resctrl: Fix unmount resctrl FS
umount_resctrlfs() directly attempts to unmount resctrl file system without
checking if resctrl FS is already mounted or not. It returns 0 on success
and on failure it prints an error message and returns an error status.
Calling umount_resctrlfs() when resctrl FS isn't mounted will return an
error status.

There could be situations wherein the caller might not know if resctrl
FS is already mounted or not and the caller might still want to unmount
resctrl FS if it's already mounted (For example during teardown).

To support above use cases, change umount_resctrlfs() such that it now
first checks if resctrl FS is already mounted or not and unmounts resctrl
FS only if it's already mounted.

In some cases the test suite does not unmount resctrl FS upon exit. For
example, when running only the mba test on a Broadwell (BDW) machine
(MBA isn't supported on BDW CPU).

This happens because validate_resctrl_feature_request() would mount resctrl
FS to check if mba is enabled on the platform or not and finds that the H/W
doesn't support mba and hence will return false to run_mba_test(). This in
turn makes the main() function return without unmounting resctrl FS.

Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:58:26 -06:00
Fenghua Yu
f1dd71982d selftests/resctrl: Skip the test if requested resctrl feature is not supported
There could be two reasons why a resctrl feature might not be enabled on
the platform
1. H/W might not support the feature
2. Even if the H/W supports it, the user might have disabled the feature
   through kernel command line arguments

Hence, any resctrl unit test (like cmt, cat, mbm and mba) before starting
the test will first check if the feature is enabled on the platform or not.
If the feature isn't enabled, then the test returns with an error status.
For example, if MBA isn't supported on a platform and if the user tries to
run MBA, the output will look like this

ok mounting resctrl to "/sys/fs/resctrl"
not ok MBA: schemata change

But, not supporting a feature isn't a test failure. So, instead of treating
it as an error, use the SKIP directive of the TAP protocol. With the
change, the output will look as below

ok MBA # SKIP Hardware does not support MBA or MBA is disabled

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:58:20 -06:00
Fenghua Yu
c9fb4e7cee selftests/resctrl: Modularize resctrl test suite main() function
Resctrl test suite main() function does the following things
1. Parses command line arguments passed by user
2. Some setup checks
3. Logic that calls into each unit test
4. Print result and clean up after running each unit test

Introduce wrapper functions for steps 3 and 4 to modularize the main()
function. Adding these wrapper functions makes it easier to add any logic
to each individual test.

Please note that this is a preparatory patch for the next one and no
functional changes are intended.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:58:14 -06:00
Fenghua Yu
09a6793462 selftests/resctrl: Don't hard code value of "no_of_bits" variable
Cache related tests (like CAT and CMT) depend on a variable called
no_of_bits to run. no_of_bits defines the number of contiguous bits
that should be set in the CBM mask and a user can pass a value for
no_of_bits using -n command line argument. If a user hasn't passed any
value, it defaults to 5 (randomly chosen value).

Hard coding no_of_bits to 5 will make the cache tests fail to run on
systems that support maximum cbm mask that is less than or equal to 5 bits.
Hence, don't hard code no_of_bits value.

If a user passes a value for "no_of_bits" using -n option, use it.
Otherwise, no_of_bits is equal to half of the maximum number of bits in
the cbm mask.
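
The selection rule can be sketched as below. pick_no_of_bits() is a
hypothetical helper, and treating a value <= 0 as "not passed by the
user" is an assumption made for this example.

```c
/* Count the bits set in a capacity bitmask (CBM). */
static int count_bits(unsigned long mask)
{
	int count = 0;

	while (mask) {
		count += mask & 1;
		mask >>= 1;
	}
	return count;
}

/* Use the user's value if given (> 0), else half of the mask width. */
static int pick_no_of_bits(int user_bits, unsigned long max_cbm)
{
	if (user_bits > 0)
		return user_bits;
	return count_bits(max_cbm) / 2;
}
```

With an 11-bit mask (0x7ff) and no -n value this yields 5; with a 4-bit
mask (0xf) it yields 2, which is where a fixed value of 5 would not fit.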

Please note that CMT test is still hard coded to 5 bits. It will change in
subsequent patches that change CMT test.

Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:58:08 -06:00
Fenghua Yu
06bd03a57f selftests/resctrl: Fix MBA/MBM results reporting format
MBM unit test starts fill_buf (default built-in benchmark) in a new con_mon
group (c1, m1) and records resctrl reported mbm values and iMC (Integrated
Memory Controller) values every second. It does this for five seconds
(randomly chosen value) in total. It then calculates average of resctrl_mbm
values and imc_mbm values and if the difference is greater than 300 MB/sec
(randomly chosen value), the test treats it as a failure. MBA unit test is
similar to MBM but after every run it changes schemata.

Checking for a difference of 300 MB/sec doesn't look very meaningful when
the mbm values are changing over a wide range. For example, below are the
values running MBA test on SKL with different allocations

1. With 10% as schemata both iMC and resctrl mbm_values are around 2000
   MB/sec
2. With 100% as schemata both iMC and resctrl mbm_values are around 10000
   MB/sec

A 300 MB/sec difference between resctrl_mbm and imc_mbm values is
acceptable at 100% schemata but it isn't acceptable at 10% schemata because
that's a huge difference.

So, fix this by checking for percentage difference instead of absolute
difference i.e. check if the difference between resctrl_mbm value and
imc_mbm value is within 5% (randomly chosen value) of imc_mbm value. If the
difference is greater than 5% of imc_mbm value, treat it as a failure.
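
A minimal sketch of the percentage check, using integer arithmetic.
bw_diff_too_large(), MAX_DIFF_PERCENT and the parameter names are
illustrative and are not taken from the test sources.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_DIFF_PERCENT 5	/* threshold described above */

/* True if |resctrl - imc| exceeds MAX_DIFF_PERCENT of imc. */
static bool bw_diff_too_large(long avg_bw_resc, long avg_bw_imc)
{
	long diff = labs(avg_bw_resc - avg_bw_imc);

	/* Compare diff/imc > pct/100 without floating point. */
	return diff * 100 > (long)MAX_DIFF_PERCENT * avg_bw_imc;
}
```

With these numbers a 300 MB/sec difference passes at 10000 MB/sec (3%)
but fails at 2000 MB/sec (15%).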

Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:58:02 -06:00
Fenghua Yu
ee0415681e selftests/resctrl: Use resctrl/info for feature detection
Resctrl test suite before running any unit test (like cmt, cat, mbm and
mba) should first check if the feature is enabled (by kernel and not just
supported by H/W) on the platform or not.
validate_resctrl_feature_request() is supposed to do that. This function
intends to grep for relevant flags in /proc/cpuinfo but there are several
issues here

1. validate_resctrl_feature_request() calls fgrep() to get flags from
   /proc/cpuinfo. But, fgrep() can only return a string with maximum of 255
   characters and hence the complete cpu flags are never returned.
2. The substring search logic is also busted. If strstr() finds requested
   resctrl feature in the cpu flags, it returns pointer to the first
   occurrence. But, the logic negates the return value of strstr() and
   hence validate_resctrl_feature_request() returns false if the feature is
   present in the cpu flags and returns true if the feature is not present.
3. validate_resctrl_feature_request() checks if a resctrl feature is
   reported in /proc/cpuinfo flags or not. Having a cpu flag means that the
   H/W supports the feature, but it doesn't mean that the kernel enabled
   it. A user could selectively enable only a subset of resctrl features
   using kernel command line arguments. Hence, /proc/cpuinfo isn't a
   reliable source to check if a feature is enabled or not.
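
Issue 2 can be illustrated with the sketch below. Both function names
are hypothetical; the real code operates on the flags string read from
/proc/cpuinfo.

```c
#include <stdbool.h>
#include <string.h>

/* Buggy form described in issue 2: the negation inverts the answer. */
static bool has_flag_buggy(const char *flags, const char *feature)
{
	return !strstr(flags, feature);
}

/* Intended form: a non-NULL result means the substring was found. */
static bool has_flag(const char *flags, const char *feature)
{
	return strstr(flags, feature) != NULL;
}
```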

The 3rd issue being the major one and fixing it requires changing the way
validate_resctrl_feature_request() works. Since, /proc/cpuinfo isn't the
right place to check if a resctrl feature is enabled or not, a more
appropriate place is /sys/fs/resctrl/info directory. Change
validate_resctrl_feature_request() such that,

1. For cat, check if /sys/fs/resctrl/info/L3 directory is present or not
2. For mba, check if /sys/fs/resctrl/info/MB directory is present or not
3. For cmt, check if /sys/fs/resctrl/info/L3_MON directory is present and
   check if /sys/fs/resctrl/info/L3_MON/mon_features has llc_occupancy
4. For mbm, check if /sys/fs/resctrl/info/L3_MON directory is present and
   check if /sys/fs/resctrl/info/L3_MON/mon_features has
   mbm_<total/local>_bytes
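
The mon_features part of checks 3 and 4 could look roughly like the
sketch below, assuming the file contents were already read into a
string with one feature per line. mon_feature_listed() is a
hypothetical helper, not the function used by the selftest.

```c
#include <stdbool.h>
#include <string.h>

/* True if 'feature' appears as a complete line in 'contents',
 * e.g. the text read from .../L3_MON/mon_features. */
static bool mon_feature_listed(const char *contents, const char *feature)
{
	size_t flen = strlen(feature);
	const char *line = contents;

	while (*line) {
		size_t llen = strcspn(line, "\n");	/* length of this line */

		if (llen == flen && strncmp(line, feature, flen) == 0)
			return true;
		line += llen;
		if (*line == '\n')
			line++;
	}
	return false;
}
```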

Please note that only L3_CAT, L3_CMT, MBA and MBM are supported. CDP and L2
variants can be added later.

Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:57:57 -06:00
Fenghua Yu
a3611fbc6e selftests/resctrl: Check for resctrl mount point only if resctrl FS is supported
check_resctrlfs_support() does the following
1. Checks if the platform supports resctrl file system or not by looking
   for resctrl in /proc/filesystems
2. Calls opendir() on default resctrl file system path
   (i.e. /sys/fs/resctrl)
3. Checks if resctrl file system is mounted or not by looking at
   /proc/mounts

Steps 2 and 3 will fail if the platform does not support resctrl file
system. So, there is no need to check for them if step 1 fails.

Fix this by returning immediately if the platform does not support
resctrl file system.

Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:57:49 -06:00
Fenghua Yu
b67a7665a9 selftests/resctrl: Add config dependencies
Add the config file for test dependencies.

Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:57:42 -06:00
Reinette Chatre
f29838e6fa selftests/resctrl: Fix a printed message
Add a missing newline to the printed help text to improve readability.

Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:54:20 -06:00
Fenghua Yu
03216ed7bb selftests/resctrl: Share show_cache_info() by CAT and CMT tests
show_cache_info() is defined separately in the CAT and CMT tests, but
the two definitions are identical, so defining it twice is unnecessary.
Share one function between the tests.

Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:54:14 -06:00
Fenghua Yu
ca2f4214f9 selftests/resctrl: Call kselftest APIs to log test results
Call kselftest APIs instead of using printf() to log test results
for cleaner code and better future extension.

Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:54:08 -06:00
Fenghua Yu
2f320911d9 selftests/resctrl: Rename CQM test as CMT test
CMT (Cache Monitoring Technology) [1] is a H/W feature that reports cache
occupancy of a process. resctrl selftest suite has a unit test to test CMT
for LLC but the test is named as CQM (Cache Quality Monitoring).
Furthermore, the unit test source file is named as cqm_test.c and several
functions, variables, comments, preprocessors and statements widely use
"cqm" as either suffix or prefix. This rampant misuse of CQM for CMT
might confuse someone who is newly looking at resctrl selftests because
this feature is named CMT in the Intel Software Developer's Manual.

Hence, rename all the occurrences (unit test source file name, functions,
variables, comments and preprocessors) of cqm with cmt.

[1] Please see Intel SDM, Volume 3, chapter 17 and section 18 for more
    information on CMT: https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:53:54 -06:00
Fenghua Yu
d7af3d0d51 selftests/resctrl: Fix missing options "-n" and "-p"
resctrl test suite accepts command line arguments (like -b, -t, -n and -p)
as documented in the help. But passing -n and -p throws an invalid option
error. This happens because -n and -p are missing in the list of
characters that getopt() recognizes as valid arguments. Hence, they are
treated as invalid options.

Fix this by adding them to the list of characters that getopt() recognizes
as valid arguments. Please note that the main() function already has the
logic to deal with the values passed as part of these arguments and hence
no changes are needed there.

Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:53:02 -06:00
Reinette Chatre
f5f16ae4fa selftests/resctrl: Ensure sibling CPU is not same as original CPU
The resctrl tests can accept a CPU on which the tests are run and use
default of CPU #1 if it is not provided. In the CAT test a "sibling CPU"
is determined that is from the same package where another thread will be
run.

The current algorithm with which a "sibling CPU" is determined does not
take the provided/default CPU into account and when that CPU is the
first CPU in a package then the "sibling CPU" will be selected to be the
same CPU since it starts by picking the first CPU from core_siblings_list.

Fix the "sibling CPU" selection by taking the provided/default CPU into
account and ensuring a sibling that is a different CPU is selected.
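
One way to express the corrected selection is sketched below. It
assumes the sibling list is a string such as "0-3,8-11";
pick_sibling_cpu() is a hypothetical helper and not the code from the
patch.

```c
#include <stdlib.h>

/* Pick the first CPU in a list such as "0-3,8-11" that differs
 * from 'cpu_no'. Returns -1 if no other CPU is listed. */
static int pick_sibling_cpu(const char *list, int cpu_no)
{
	const char *p = list;

	while (*p) {
		char *end;
		long lo = strtol(p, &end, 10);
		long hi = lo;

		if (end == p)
			break;		/* not a number: stop parsing */
		if (*end == '-') {	/* range "lo-hi" */
			p = end + 1;
			hi = strtol(p, &end, 10);
			if (end == p)
				break;
		}
		for (long cpu = lo; cpu <= hi; cpu++)
			if (cpu != cpu_no)
				return (int)cpu;
		if (*end != ',')
			break;		/* end of list */
		p = end + 1;
	}
	return -1;
}
```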

Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:51:43 -06:00
Fenghua Yu
2428673638 selftests/resctrl: Clean up resctrl features check
The resctrl feature checks call strcmp() to compare feature strings
(e.g. "mba", "cat" etc). These checks are error prone and are not in
good coding style. Define the constant strings in macros and call
strncmp() to solve the potential issues.
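
A minimal sketch of the idea, with illustrative macro names. FEATURE_IS()
is hypothetical and relies on its second argument being a string
literal so that sizeof() covers the terminating NUL.

```c
#include <string.h>

/* Feature names kept in one place (names are illustrative). */
#define MBA_STR "mba"
#define CAT_STR "cat"

/* Bounded comparison; sizeof(str) includes the terminating NUL,
 * so only an exact match compares equal. */
#define FEATURE_IS(arg, str) (strncmp((arg), (str), sizeof(str)) == 0)
```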

Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:51:36 -06:00
Fenghua Yu
896016d2ad selftests/resctrl: Fix compilation issues for other global variables
Reinette reported following compilation issue on Fedora 32, gcc version
10.1.1

/usr/bin/ld: resctrl_tests.o:<src_dir>/resctrl.h:65: multiple definition
of `bm_pid'; cache.o:<src_dir>/resctrl.h:65: first defined here

Other variables are ppid, tests_run, llc_occup_path, is_amd. Compiler
isn't happy because these variables are defined globally in two .c files
but are not declared as extern.

To fix issues for the global variables, declare them as extern.
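
The declaration/definition split can be sketched in a single file as
below. In the real sources the extern line would live in the shared
header and the definition in exactly one .c file; record_test() is a
hypothetical helper. GCC 10 defaults to -fno-common, which is why the
duplicate definitions started to fail at link time.

```c
/* What the shared header would carry: a declaration only. */
extern int tests_run;

/* Exactly one .c file provides the definition. */
int tests_run;

/* Any file that includes the header can use the variable. */
static void record_test(void)
{
	tests_run++;
}
```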

Change Log:
- Split this patch from v4's patch 1 (Shuah).

Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:51:30 -06:00
Fenghua Yu
8236c51d85 selftests/resctrl: Fix compilation issues for global variables
Reinette reported following compilation issue on Fedora 32, gcc version
10.1.1

/usr/bin/ld: cqm_test.o:<src_dir>/cqm_test.c:22: multiple definition of
`cache_size'; cat_test.o:<src_dir>/cat_test.c:23: first defined here

The same issue is reported for long_mask, cbm_mask, count_of_bits etc
variables as well. Compiler isn't happy because these variables are
defined globally in two .c files namely cqm_test.c and cat_test.c and
the compiler during compilation finds that the variable is already
defined (multiple definition error).

Taking a closer look at the usage of these variables reveals that these
variables are used only locally in functions such as cqm_resctrl_val()
(defined in cqm_test.c) and cat_perf_miss_val() (defined in cat_test.c).
These variables are not shared between those functions. So, there is no
need for these variables to be global. Hence, fix this issue by making
them static variables.

Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:51:24 -06:00
Fenghua Yu
a9d26a302d selftests/resctrl: Enable gcc checks to detect buffer overflows
David reported a buffer overflow error in the check_results() function of
the cmt unit test and he suggested enabling _FORTIFY_SOURCE gcc compiler
option to automatically detect any such errors.

Feature Test Macros man page describes _FORTIFY_SOURCE as below

"Defining this macro causes some lightweight checks to be performed to
detect some buffer overflow errors when employing various string and memory
manipulation functions (for example, memcpy, memset, stpcpy, strcpy,
strncpy, strcat, strncat, sprintf, snprintf, vsprintf, vsnprintf, gets, and
wide character variants thereof). For some functions, argument consistency
is checked; for example, a check is made that open has been supplied with a
mode argument when the specified flags include O_CREAT. Not all problems
are detected, just some common cases.

If _FORTIFY_SOURCE is set to 1, with compiler optimization level 1 (gcc
-O1) and above, checks that shouldn't change the behavior of conforming
programs are performed.

With _FORTIFY_SOURCE set to 2, some more checking is added, but some
conforming programs might fail.

Some of the checks can be performed at compile time (via macros logic
implemented in header files), and result in compiler warnings; other checks
take place at run time, and result in a run-time error if the check fails.

Use of this macro requires compiler support, available with gcc since
version 4.0."

Fix the buffer overflow error in the check_results() function of the cmt
unit test and enable _FORTIFY_SOURCE gcc check to catch any future buffer
overflow errors.

Reported-by: David Binderman <dcb314@hotmail.com>
Suggested-by: David Binderman <dcb314@hotmail.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-04-02 13:51:16 -06:00
David S. Miller
c2bcb4cf02 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-04-01

The following pull-request contains BPF updates for your *net-next* tree.

We've added 68 non-merge commits during the last 7 day(s) which contain
a total of 70 files changed, 2944 insertions(+), 1139 deletions(-).

The main changes are:

1) UDP support for sockmap, from Cong.

2) Verifier merge conflict resolution fix, from Daniel.

3) xsk selftests enhancements, from Maciej.

4) Unstable helpers aka kernel func calling, from Martin.

5) Batches ops for LPM map, from Pedro.

6) Fix race in bpf_get_local_storage, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02 11:03:07 -07:00
David S. Miller
29684d802e Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2021-04-01

The following pull-request contains BPF updates for your *net* tree.

We've added 11 non-merge commits during the last 8 day(s) which contain
a total of 10 files changed, 151 insertions(+), 26 deletions(-).

The main changes are:

1) xsk creation fixes, from Ciara.

2) bpf_get_task_stack fix, from Dave.

3) trampoline in modules fix, from Jiri.

4) bpf_obj_get fix for links and progs, from Lorenz.

5) struct_ops progs must be gpl compatible fix, from Toke.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02 11:00:46 -07:00
Xiang Chen
ca947482b0 dma-mapping: benchmark: Add support for multi-pages map/unmap
Currently the dma-map benchmark only supports mapping/unmapping one page
at a time, but there are other scenarios which need multi-page
map/unmap: for multi-page interfaces such as dma_alloc_coherent() and
dma_map_sg(), the time spent on a multi-page map/unmap is not the time
of a single page * npages (not linear), as it may use block description
instead of page description when the size is suitable, such as 2M/1G,
and it can also send a single TLB invalidation command to invalidate
multiple pages instead of sending it multiple times when RIL is enabled
(which will shorten the time of unmap). So it is necessary to add
support for multi-page map/unmap.

Add a parameter "-g" to support multi-pages map/unmap.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Acked-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 16:41:08 +02:00
Hao Fang
42e4eefb08 dma-mapping: benchmark: use the correct HiSilicon copyright
s/Hisilicon/HiSilicon/g.
It should use capital S, according to
https://www.hisilicon.com/en/terms-of-use.

Signed-off-by: Hao Fang <fanghao11@huawei.com>
Acked-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 16:41:08 +02:00
Wan Jiabing
fd6103cb67 perf evsel: Remove duplicate 'struct target' forward declaration
'struct target' is declared twice; it is already declared at line 21.
Remove the duplicate.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kael_w@yeah.net
Link: http://lore.kernel.org/lkml/20210401062424.991737-1-wanjiabing@vivo.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-02 10:44:32 -03:00
Peter Zijlstra
9bc0bb5072 objtool/x86: Rewrite retpoline thunk calls
When the compiler emits: "CALL __x86_indirect_thunk_\reg" for an
indirect call, have objtool rewrite it to:

	ALTERNATIVE "call __x86_indirect_thunk_\reg",
		    "call *%reg", ALT_NOT(X86_FEATURE_RETPOLINE)

Additionally, in order to not emit endless identical
.altinst_replacement chunks, use a global symbol for them, see
__x86_indirect_alt_*.

This also saves objtool from having to do code generation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151300.320177914@infradead.org
2021-04-02 12:47:28 +02:00
Peter Zijlstra
50e7b4a1a1 objtool: Skip magical retpoline .altinstr_replacement
When the .altinstr_replacement is a retpoline, skip the alternative.
We already special case retpolines anyway.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151300.259429287@infradead.org
2021-04-02 12:46:57 +02:00
Peter Zijlstra
7bd2a600f3 objtool: Cache instruction relocs
Track the reloc of instructions in the new instruction->reloc field
to avoid having to look them up again later.

( Technically x86 instructions can have two relocations, but not jumps
  and calls, for which we're using this. )

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151300.195441549@infradead.org
2021-04-02 12:46:15 +02:00
Peter Zijlstra
43d5430ad7 objtool: Keep track of retpoline call sites
Provide infrastructure for architectures to rewrite/augment compiler
generated retpoline calls. Similar to what we do for static_call()s,
keep track of the instructions that are retpoline calls.

Use the same list_head, since a retpoline call cannot also be a
static_call.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151300.130805730@infradead.org
2021-04-02 12:45:27 +02:00
Peter Zijlstra
2f2f7e47f0 objtool: Add elf_create_undef_symbol()
Allow objtool to create undefined symbols; this allows creating
relocations to symbols not currently in the symbol table.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151300.064743095@infradead.org
2021-04-02 12:45:05 +02:00
Peter Zijlstra
9a7827b778 objtool: Extract elf_symbol_add()
Create a common helper to add symbols.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151300.003468981@infradead.org
2021-04-02 12:45:01 +02:00
Peter Zijlstra
417a4dc91e objtool: Extract elf_strtab_concat()
Create a common helper to append strings to a strtab.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151259.941474004@infradead.org
2021-04-02 12:44:56 +02:00
Peter Zijlstra
d0c5c4cc73 objtool: Create reloc sections implicitly
Have elf_add_reloc() create the relocation section implicitly.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151259.880174448@infradead.org
2021-04-02 12:44:37 +02:00
Peter Zijlstra
ef47cc01cb objtool: Add elf_create_reloc() helper
We have 4 instances of adding a relocation. Create a common helper
to avoid growing even more.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151259.817438847@infradead.org
2021-04-02 12:44:18 +02:00
Peter Zijlstra
3a647607b5 objtool: Rework the elf_rebuild_reloc_section() logic
Instead of manually calling elf_rebuild_reloc_section() on sections
we've called elf_add_reloc() on, have elf_write() DTRT.

This makes it easier to add random relocations in places without
carefully tracking when we're done and need to flush what section.
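
A minimal sketch of the "flush on write" idea described above, written in Python rather than objtool's C. The class and field names (Section, Elf, changed, rebuilds) are hypothetical and only model the bookkeeping; they are not objtool's actual data structures:

```python
class Section:
    """Hypothetical stand-in for an ELF section that carries relocations."""

    def __init__(self, name):
        self.name = name
        self.relocs = []
        self.changed = False  # set when relocs were added since the last write
        self.rebuilds = 0     # counts how often the reloc section was rebuilt


class Elf:
    """Hypothetical container; only models the dirty-flag bookkeeping."""

    def __init__(self):
        self.sections = {}

    def add_reloc(self, sec_name, reloc):
        # Callers just add relocations; nobody tracks when to flush.
        sec = self.sections.setdefault(sec_name, Section(sec_name))
        sec.relocs.append(reloc)
        sec.changed = True

    def write(self):
        # The write step rebuilds every section that was touched.
        rebuilt = []
        for sec in self.sections.values():
            if sec.changed:
                sec.rebuilds += 1  # stand-in for rebuilding the reloc section
                sec.changed = False
                rebuilt.append(sec.name)
        return rebuilt


elf = Elf()
elf.add_reloc(".text", "reloc-a")
elf.add_reloc(".text", "reloc-b")
elf.add_reloc(".data", "reloc-c")
first_write = elf.write()   # rebuilds both touched sections once
second_write = elf.write()  # nothing changed since, so nothing to rebuild
```

With this shape, relocations can be added from anywhere without the caller deciding when a section is "done".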

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151259.754213408@infradead.org
2021-04-02 12:43:32 +02:00
Peter Zijlstra
a958c4fea7 objtool: Fix static_call list generation
Currently, objtool generates tail call entries in add_jump_destination()
but waits until validate_branch() to generate the regular call entries.
Move these to add_call_destination() for consistency.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151259.691529901@infradead.org
2021-04-02 12:43:19 +02:00
Peter Zijlstra
530b4ddd9d objtool: Handle per arch retpoline naming
The __x86_indirect_ naming is obviously not generic. Shorten to allow
matching some additional magic names later.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151259.630296706@infradead.org
2021-04-02 12:43:02 +02:00
Peter Zijlstra
bcb1b6ff39 objtool: Correctly handle retpoline thunk calls
Just like JMP handling, convert a direct CALL to a retpoline thunk
into a retpoline safe indirect CALL.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151259.567568238@infradead.org
2021-04-02 12:42:54 +02:00
Peter Zijlstra
119251855f x86/retpoline: Simplify retpolines
Due to:

  c9c324dc22 ("objtool: Support stack layout changes in alternatives")

it is now possible to simplify the retpolines.

Currently our retpolines consist of 2 symbols:

 - __x86_indirect_thunk_\reg: the compiler target
 - __x86_retpoline_\reg:  the actual retpoline.

Both are consecutive in code and aligned such that for any one register
they both live in the same cacheline:

  0000000000000000 <__x86_indirect_thunk_rax>:
   0:   ff e0                   jmpq   *%rax
   2:   90                      nop
   3:   90                      nop
   4:   90                      nop

  0000000000000005 <__x86_retpoline_rax>:
   5:   e8 07 00 00 00          callq  11 <__x86_retpoline_rax+0xc>
   a:   f3 90                   pause
   c:   0f ae e8                lfence
   f:   eb f9                   jmp    a <__x86_retpoline_rax+0x5>
  11:   48 89 04 24             mov    %rax,(%rsp)
  15:   c3                      retq
  16:   66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)

The thunk is an alternative_2, where one option is a JMP to the
retpoline. This was done so that objtool didn't need to deal with
alternatives with stack ops. But that problem has been solved, so now
it is possible to fold the entire retpoline into the alternative to
simplify and consolidate unused bytes:

  0000000000000000 <__x86_indirect_thunk_rax>:
   0:   ff e0                   jmpq   *%rax
   2:   90                      nop
   3:   90                      nop
   4:   90                      nop
   5:   90                      nop
   6:   90                      nop
   7:   90                      nop
   8:   90                      nop
   9:   90                      nop
   a:   90                      nop
   b:   90                      nop
   c:   90                      nop
   d:   90                      nop
   e:   90                      nop
   f:   90                      nop
  10:   90                      nop
  11:   66 66 2e 0f 1f 84 00 00 00 00 00        data16 nopw %cs:0x0(%rax,%rax,1)
  1c:   0f 1f 40 00             nopl   0x0(%rax)

Notice that since the longest alternative sequence is now:

   0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
   5:   f3 90                   pause
   7:   0f ae e8                lfence
   a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
   c:   48 89 04 24             mov    %rax,(%rsp)
  10:   c3                      retq

17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW, if
we can shrink the retpoline by 1 byte we can pack it more densely).
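
The byte counts above can be checked with a short Python sketch. The instruction lengths are read off the disassembly in this message; the power-of-two slot rule in slot_size() is an assumption made for illustration, chosen because it reproduces the 32 byte slot and the remark about shrinking by 1 byte:

```python
# Instruction lengths in bytes, taken from the disassembly listing above.
RETPOLINE_INSNS = [
    ("callq", 5),   # e8 07 00 00 00
    ("pause", 2),   # f3 90
    ("lfence", 3),  # 0f ae e8
    ("jmp", 2),     # eb f9
    ("mov", 4),     # 48 89 04 24
    ("retq", 1),    # c3
]


def slot_size(nbytes):
    """Smallest power-of-two slot holding nbytes (assumed alignment rule)."""
    size = 1
    while size < nbytes:
        size *= 2
    return size


retpoline_len = sum(length for _, length in RETPOLINE_INSNS)  # 17
padding = slot_size(retpoline_len) - retpoline_len            # 32 - 17 = 15
one_byte_shorter = slot_size(retpoline_len - 1)               # 16
```

Under that assumption, a 16 byte retpoline would fit a 16 byte slot with no padding, which is the denser packing the message hints at.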

 [ bp: Massage commit message. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210326151259.506071949@infradead.org
2021-04-02 12:42:04 +02:00
Peter Zijlstra
23c1ad538f x86/alternatives: Optimize optimize_nops()
Currently, optimize_nops() scans to see if the alternative starts with
NOPs. However, the emit pattern is:

  141:	\oldinstr
  142:	.skip (len-(142b-141b)), 0x90

That is, when 'oldinstr' is short, the tail is padded with NOPs. This case
never gets optimized.

Rewrite optimize_nops() to replace any trailing string of NOPs inside
the alternative to larger NOPs. Also run it irrespective of patching,
replacing NOPs in both the original and replaced code.
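
A rough Python sketch of the trailing-NOP idea, not the kernel's implementation. It makes two simplifying assumptions: it scans raw bytes from the end instead of decoding instructions, and it uses the commonly recommended multi-byte NOP encodings up to 8 bytes. The function names are hypothetical:

```python
# Commonly recommended x86 NOP encodings, indexed by length in bytes.
X86_NOPS = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
}
MAX_NOP = 8


def fill_nops(length):
    """Cover `length` bytes with as few NOP instructions as possible."""
    out = bytearray()
    while length > 0:
        n = min(length, MAX_NOP)
        out += X86_NOPS[n]
        length -= n
    return bytes(out)


def optimize_trailing_nops(code):
    """Replace the trailing run of 0x90 bytes with larger NOPs.

    Simplification: a real implementation has to decode instructions,
    because a 0x90 byte may also be part of another instruction.
    """
    start = len(code)
    while start > 0 and code[start - 1] == 0x90:
        start -= 1
    return bytes(code[:start]) + fill_nops(len(code) - start)


# "jmpq *%rax" (ff e0) padded with three single-byte NOPs.
padded = bytes.fromhex("ffe0909090")
optimized = optimize_trailing_nops(padded)
```

The total length never changes; only the padding is re-encoded, so the short 'oldinstr' case is covered as well.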

A direct consequence is that 'padlen' becomes superfluous, so remove it.

 [ bp:
   - Adjust commit message
   - remove a stale comment about needing to pad
   - add a comment in optimize_nops()
   - exit early if the NOP verif. loop catches a mismatch - function
     should not add NOPs in that case
   - fix the "optimized NOPs" offsets output ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210326151259.442992235@infradead.org
2021-04-02 12:41:17 +02:00
Ingo Molnar
b1f480bc06 Merge branch 'x86/cpu' into WIP.x86/core, to merge the NOP changes & resolve a semantic conflict
Conflict-merge this main commit in essence:

  a89dfde3dc: ("x86: Remove dynamic NOP selection")

With this upstream commit:

  b908297047: ("bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG")

Semantic merge conflict:

  arch/x86/net/bpf_jit_comp.c

  - memcpy(prog, ideal_nops[NOP_ATOMIC5], X86_PATCH_SIZE);
  + memcpy(prog, x86_nops[5], X86_PATCH_SIZE);

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-04-02 12:36:30 +02:00
Ingo Molnar
e855e80d00 Linux 5.12-rc5

Merge tag 'v5.12-rc5' into WIP.x86/core, to pick up recent NOP related changes

In particular we want to have this upstream commit:

  b908297047: ("bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG")

... before merging in x86/cpu changes and the removal of the NOP optimizations, and
applying PeterZ's !retpoline objtool series.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-04-02 12:33:16 +02:00
Ciara Loftus
ca7a83e248 libbpf: Only create rx and tx XDP rings when necessary
Prior to this commit, xsk_socket__create(_shared) always attempted to create
the rx and tx rings for the socket. This causes an issue when the socket
being set up is the one that shares the fd with the UMEM. If a previous call
to this function failed for this socket after the rings were set up, a
subsequent call would always fail: the rings are not torn down after the
first call, so trying to set them up again hits an error because they
already exist. Solve this by adding new bools to struct xsk_umem that record
the ring setup status, and use them to determine whether or not to set up
the rings.

Fixes: 1cad078842 ("libbpf: add support for using AF_XDP sockets")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331061218.1647-4-ciara.loftus@intel.com
2021-04-01 14:45:43 -07:00
Ciara Loftus
43f1bc1eff libbpf: Restore umem state after socket create failure
If the call to xsk_socket__create fails, the user may want to retry the
socket creation using the same umem. Ensure that the umem is in the
same state on exit if the call fails by:
1. ensuring the umem _save pointers are unmodified.
2. not unmapping the set of umem rings that were set up with the umem
during xsk_umem__create, since those maps existed before the call to
xsk_socket__create and should remain intact even in the event of
failure.

Fixes: 2f6324a393 ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331061218.1647-3-ciara.loftus@intel.com
2021-04-01 14:45:43 -07:00
Ciara Loftus
df66201631 libbpf: Ensure umem pointer is non-NULL before dereferencing
Calls to xsk_socket__create dereference the umem to access the
fill_save and comp_save pointers. Make sure the umem is non-NULL
before doing this.

Fixes: 2f6324a393 ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20210331061218.1647-2-ciara.loftus@intel.com
2021-04-01 14:45:43 -07:00
Linus Torvalds
6905b1dc3c x86:
* Fixes for missing TLB flushes with TDP MMU
 
 * Fixes for race conditions in nested SVM
 
 * Fixes for lockdep splat with Xen emulation
 
 * Fix for kvmclock underflow
 
 * Fix srcdir != builddir builds
 
 * Other small cleanups
 
 ARM:
 * Fix GICv3 MMIO compatibility probing
 
 * Prevent guests from using the ARMv8.4 self-hosted tracing extension

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "It's a bit larger than I (and probably you) would like by the time we
  get to -rc6, but perhaps not entirely unexpected since the changes in
  the last merge window were larger than usual.

  x86:
   - Fixes for missing TLB flushes with TDP MMU

   - Fixes for race conditions in nested SVM

   - Fixes for lockdep splat with Xen emulation

   - Fix for kvmclock underflow

   - Fix srcdir != builddir builds

   - Other small cleanups

  ARM:
   - Fix GICv3 MMIO compatibility probing

   - Prevent guests from using the ARMv8.4 self-hosted tracing
     extension"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0)
  KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update()
  KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken
  KVM: x86: reduce pvclock_gtod_sync_lock critical sections
  KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
  KVM: SVM: load control fields from VMCB12 before checking them
  KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
  KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
  KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
  KVM: make: Fix out-of-source module builds
  selftests: kvm: make hardware_disable_test less verbose
  KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE
  KVM: x86: remove unused declaration of kvm_write_tsc()
  KVM: clean up the unused argument
  tools/kvm_stat: Add restart delay
  KVM: arm64: Fix CPU interface MMIO compatibility detection
  KVM: arm64: Disable guest access to trace filter controls
  KVM: arm64: Hide system instruction access to Trace registers
2021-04-01 12:42:55 -07:00
Cong Wang
8d7cb74f2c selftests/bpf: Add a test case for loading BPF_SK_SKB_VERDICT
This adds a test case to ensure BPF_SK_SKB_VERDICT and
BPF_SK_STREAM_VERDICT will never be attached at the same time.

Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331023237.41094-17-xiyou.wangcong@gmail.com
2021-04-01 10:56:14 -07:00
Cong Wang
d6378af615 selftests/bpf: Add a test case for udp sockmap
Add a test case to ensure redirection between two UDP sockets works.

Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331023237.41094-16-xiyou.wangcong@gmail.com
2021-04-01 10:56:14 -07:00
Cong Wang
a7ba4558e6 sock_map: Introduce BPF_SK_SKB_VERDICT
Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is
confusing and more importantly we still want to distinguish them
from user-space. So we can just reuse the stream verdict code but
introduce a new type of eBPF program, skb_verdict. Users are not
allowed to attach stream_verdict and skb_verdict programs to the
same map.

Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-10-xiyou.wangcong@gmail.com
2021-04-01 10:56:14 -07:00
Matthew Wilcox (Oracle)
2c7e57a027 idr test suite: Improve reporting from idr_find_test_1
Instead of just reporting an assertion failure, report enough information
that we can start diagnosing exactly what went wrong.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01 07:50:42 -04:00
Matthew Wilcox (Oracle)
094ffbd1d8 idr test suite: Create anchor before launching throbber
The throbber could race with creation of the anchor entry and cause the
IDR to have zero entries in it, which would cause the test to fail.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01 07:50:19 -04:00
Matthew Wilcox (Oracle)
703586410d idr test suite: Take RCU read lock in idr_find_test_1
When run on a single CPU, this test would frequently access already-freed
memory.  Due to timing, this bug never showed up on multi-CPU tests.

Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01 07:44:48 -04:00
Matthew Wilcox (Oracle)
1bb4bd266c radix tree test suite: Register the main thread with the RCU library
Several test runners register individual worker threads with the
RCU library, but neglect to register the main thread, which can lead
to objects being freed while the main thread is in what appears to be
an RCU critical section.

Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-04-01 07:41:30 -04:00
Vitaly Kuznetsov
55626ca9c6 selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0)
Add a test for the issue where a KVM_SET_CLOCK(0) call could cause the
TSC page value to become very large because of a signedness issue around
hv_clock->system_time.
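
As a general illustration of how a signedness issue turns a small negative number into a very large one, here is a Python sketch. It does not reproduce KVM's actual computation; as_u64() is a hypothetical helper that models reinterpreting a signed 64-bit value as unsigned:

```python
U64_MASK = (1 << 64) - 1


def as_u64(value):
    """Model reading a signed 64-bit value as an unsigned 64-bit one."""
    return value & U64_MASK


slightly_negative = as_u64(-1)  # 18446744073709551615, i.e. 2**64 - 1
non_negative = as_u64(1000)     # unchanged: 1000
```

A check that the resulting value is "small" catches this class of bug without depending on the exact magnitude.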

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210326155551.17446-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01 05:14:19 -04:00
Carlos Llamas
040806343b selftests/net: so_txtime multi-host support
SO_TXTIME hardware offload requires testing across devices, either
between machines or separate network namespaces.

Split up SO_TXTIME test into tx and rx modes, so traffic can be
sent from one process to another. Create a veth-pair on different
namespaces and bind each process to an end point via [-S]ource and
[-D]estination parameters. Optional start [-t]ime parameter can be
passed to synchronize the test across the hosts (with synchronized
clocks).

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31 17:48:21 -07:00
Jakub Kicinski
1da07e5db3 selftests: ethtool: add a netdevsim FEC test
Test FEC settings, iterate over configs.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31 14:15:23 -07:00
Stanislav Fomichev
e27bfefb21 tools/resolve_btfids: Fix warnings
* make eprintf static, used only in main.c
* initialize ret in eprintf
* remove unused *tmp

v3:
* remove another err (Song Liu)

v2:
* remove unused 'int err = -1'

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210329223143.3659983-1-sdf@google.com
2021-03-31 12:09:12 -07:00
Borislav Petkov
f2ac256b9a Merge 'x86/alternatives'
Pick up dependent changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
2021-03-31 18:04:19 +02:00
Peter Zijlstra
52fa82c21f x86: Add insn_decode_kernel()
Add a helper to decode kernel instructions; there's no point in
endlessly repeating those last two arguments.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210326151259.379242587@infradead.org
2021-03-31 16:20:22 +02:00
Martin Liška
3406ac5347 perf annotate: Add --demangle and --demangle-kernel
'perf annotate' supports --symbol but it's impossible to filter a C++
symbol. With --no-demangle one can filter easily by mangled function
name.

Signed-off-by: Martin Liška <mliska@suse.cz>
Link: http://lore.kernel.org/lkml/c3c7e959-9f7f-18e2-e795-f604275cbac3@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-31 10:39:48 -03:00
KP Singh
63f8af0fc3 selftests/bpf: Add an option for a debug shell in vmtest.sh
The newly introduced -s command line option starts an interactive shell.
If a command is specified, the shell is started after the command
finishes executing. It's useful to have a shell especially when
debugging failing tests or developing new tests.

Since the user may terminate the VM forcefully, an extra "sync" is added
after the execution of the command to persist any logs from the command
into the log file.

Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210323014752.3198283-1-kpsingh@kernel.org
2021-03-30 22:05:23 -07:00
Geliang Tang
5e287fe761 selftests: mptcp: remove id 0 address testcases
This patch added the testcases for removing the id 0 subflow and the id 0
address.

In do_transfer, use the removing addresses number '9' for deleting the id
0 address.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 17:42:23 -07:00
Geliang Tang
2d121c9a88 selftests: mptcp: add addr argument for del_addr
For the id 0 address, different MPTCP connections could be using
different IP addresses for id 0.

This patch added an extra argument IP address for del_addr when
using id 0.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 17:42:23 -07:00
Matthieu Baerts
6254ad4088 selftests: mptcp: avoid calling pm_nl_ctl with bad IDs
IDs are supposed to be between 0 and 255.

In pm_nl_ctl, for both the 'add' and 'get' instructions, the ID is cast
to a u_int8_t. So if we give 256, we will delete ID 0. Obviously, the
goal is not to delete this ID by giving 256.
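
The wraparound can be shown with a one-line model of the cast. This is a Python sketch; as_u8() is a hypothetical helper that mimics converting an integer to an unsigned 8-bit type by keeping the low 8 bits:

```python
def as_u8(value):
    """Model a C conversion to an unsigned 8-bit integer (modulo 256)."""
    return value & 0xFF


largest_valid = as_u8(255)  # stays 255
wrapped = as_u8(256)        # wraps around to 0
```

So any ID of 256 or above silently aliases an ID in the 0 to 255 range.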

We could modify pm_nl_ctl and stop if the ID is negative or higher than
255 but probably better not to increase the number of lines for such
things in this tool which is only used in selftests. Instead, we use it
within the limits.

This modification also means that we will no longer add a new ID for the
2nd entry. That's why we removed an expected entry from the dump that was
introduced with
commit dc8eb10e95 ("selftests: mptcp: add testcases for setting the address ID").

So now we delete ID 9 like before and we add entries for IDs 10 to 255
that are deleted just after.

Note that this could be seen as a fix but it was not really an issue so
far: we were simply playing with ID 0/1 once again. With the following
commit ("selftests: mptcp: add addr argument for del_addr"), it will be
different because ID 0 is going to require an address. We don't want
errors when trying to delete ID 0 without the address argument.

Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 17:42:23 -07:00
Vlad Buslov
e48792a9ec tc-testing: add simple action change test
Use act_simple to verify that action created with 'tc actions change'
command exists after command returns. The goal is to verify internal action
API reference counting to ensure that the case when netlink message has
NLM_F_REPLACE flag set but action with specified index doesn't exist is
handled correctly.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 17:11:20 -07:00
Paolo Abeni
a062260a9d selftests: net: add UDP GRO forwarding self-tests
Create a bunch of virtual topologies and verify that
NETIF_F_GRO_FRAGLIST or NETIF_F_GRO_UDP_FWD-enabled
devices aggregate the ingress packets as expected.
Additionally check that the aggregate packets are
segmented correctly when landing on a socket.

Also test SKB_GSO_FRAGLIST and SKB_GSO_UDP_L4 aggregation
on top of UDP tunnel (vxlan)

v1 -> v2:
 - hopefully clarify the commit message
 - moved the overlay network ipv6 range into the 'documentation'
   reserved range (Willem)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 17:06:50 -07:00
Matthew Wilcox (Oracle)
7487de534d radix tree test suite: Fix compilation
Commit 4bba4c4bb0 added tools/include/linux/compiler_types.h which
includes linux/compiler-gcc.h.  Unfortunately, we had our own (empty)
compiler_types.h which overrode the one added by that commit, and
so we lost the definition of __must_be_array().  Removing our empty
compiler_types.h fixes the problem and reduces our divergence from the
rest of the tools.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-03-30 13:44:35 -04:00
Vitaly Kuznetsov
f982fb62a3 selftests: kvm: make hardware_disable_test less verbose
hardware_disable_test produces 512 snippets like
...
 main: [511] waiting semaphore
 run_test: [511] start vcpus
 run_test: [511] all threads launched
 main: [511] waiting 368us
 main: [511] killing child

and this doesn't have much value, let's print this info with pr_debug().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210323104331.1354800-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30 13:07:10 -04:00
Stefan Raspl
75f94ecbd0 tools/kvm_stat: Add restart delay
If this service is enabled and the system rebooted, Systemd's initial
attempt to start this unit file may fail in case the kvm module is not
loaded. Since we did not specify a delay for the retries, Systemd
restarts with a minimum delay a number of times before giving up and
disabling the service. Which means a subsequent kvm module load will
have kvm running without monitoring.
Adding a delay to fix this.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20210325122949.1433271-1-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30 13:07:09 -04:00
Björn Töpel
ae6b6a1780 selftests: xsk: Remove unused defines
Remove two unused defines.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-18-maciej.fijalkowski@intel.com
2021-03-30 09:24:40 -07:00
Björn Töpel
96539f1c5e selftests: xsk: Remove mutex and condition variable
The usage of the condition variable is broken, and overkill. Replace it
with a pthread barrier.
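
A small Python sketch of the barrier pattern, using threading.Barrier as a stand-in for a pthread barrier. Which threads meet at the barrier is an assumption here (the Rx worker and whichever thread launches Tx), and all function names are hypothetical:

```python
import threading


def run_rx_then_tx():
    """Start Tx only after the Rx worker has reached the barrier."""
    events = []
    barrier = threading.Barrier(2)  # two parties: Rx worker and the launcher

    def rx_worker():
        events.append("rx: initialized")  # Rx-side setup completes first
        barrier.wait()                    # announce readiness

    def tx_worker():
        events.append("tx: sending")

    rx = threading.Thread(target=rx_worker)
    rx.start()
    barrier.wait()  # launcher blocks here until Rx is initialized
    tx = threading.Thread(target=tx_worker)
    tx.start()
    rx.join()
    tx.join()
    return events


order = run_rx_then_tx()
```

Compared with a mutex plus condition variable, there is no predicate to guard and no risk of a missed wakeup; both sides simply rendezvous once.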

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-17-maciej.fijalkowski@intel.com
2021-03-30 09:24:40 -07:00
Björn Töpel
7651910257 selftests: xsk: Remove thread attribute
There is really no reason to have a non-default thread stack
size. Remove that.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-16-maciej.fijalkowski@intel.com
2021-03-30 09:24:40 -07:00
Maciej Fijalkowski
27e1ca2525 selftests: xsk: Implement bpf_link test
Introduce a test that is supposed to verify the persistence of BPF
resources based on underlying bpf_link usage.

Test will:
1) create and bind two sockets on queue ids 0 and 1
2) run traffic on queue id 0
3) remove xsk sockets from queue 0 on both veth interfaces
4) run traffic on queue id 1

Running traffic successfully on queue id 1 means that BPF resources were
not removed in step 3).

In order to make it work, change the command that creates the veth pair to
have 4 queue pairs by default.

Introduce arrays of xsks and umems in the ifobject struct but keep
pointers to single entities, so the rest of the logic around Rx/Tx can be
kept as-is.

For umem handling, double the size of mmapped space and split that
between the two sockets.

Rename also bidi_pass to a variable 'second_step' of a boolean type as
it's now used also for the test that is introduced here and it doesn't
have anything in common with bi-directional testing.

Drop opt_queue command line argument as it wasn't working before anyway.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-15-maciej.fijalkowski@intel.com
2021-03-30 09:24:39 -07:00
Maciej Fijalkowski
0464b1ed07 selftests: xsk: Remove sync_mutex_tx and atomic var
Although thread_common_ops() are called in both Tx and Rx threads,
testapp_validate() will not spawn Tx thread until Rx thread signals that
it has finished its initialization via condition variable.

Therefore, locking in thread_common_ops is not needed and furthermore Tx
thread does not have to spin on atomic variable.

Note that this simplification wouldn't be possible if there were still
a common worker thread.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-13-maciej.fijalkowski@intel.com
2021-03-30 09:24:39 -07:00
Maciej Fijalkowski
9445f8c765 selftests: xsk: Refactor teardown/bidi test cases and testapp_validate
Currently, there is a testapp_sockets() that acts like a wrapper around
testapp_validate() and it is called for bidi and teardown test types.
Other test types call testapp_validate() directly.

Split testapp_sockets() into two separate functions so a bunch of bidi
specific logic can be moved there and out of testapp_validate() itself.

Introduce a function pointer in the ifobject struct which will be used for
assigning the Rx/Tx function to the worker thread. Let's
also have global ifobject Rx/Tx pointers so it's easier to swap the
vectors on a second run of a bi-directional test. Thread creation is now
easy to follow.

switching_notify variable is useless, info about vector switch can be
printed based on bidi_pass state.

Last but not least, init/destroy synchronization variables only once,
not per each test.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-12-maciej.fijalkowski@intel.com
2021-03-30 09:24:39 -07:00
Maciej Fijalkowski
99f9bcb657 selftests: xsk: Remove Tx synchronization resources
Tx thread needs to be started after the Rx side is fully initialized so
that packets are not xmitted until xsk Rx socket is ready to be used.

It can be observed that the atomic variable spinning_tx is not checked
from the Rx side in any way, so thread_common_ops can be modified to only
address spinning_rx. This means that spinning_tx can be removed
altogether.

signal_tx_condition is never utilized, so simply remove it.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-11-maciej.fijalkowski@intel.com
2021-03-30 09:24:39 -07:00
Maciej Fijalkowski
9866bcd663 selftests: xsk: Split worker thread
Let's have separate Tx/Rx worker threads instead of one common thread
packed with Tx/Rx specific checks.

Move mmap for umem buffer space and a switch_namespace() call to
thread_common_ops.

This also allows for a bunch of simplifications that are the subject of
the next commits. The final result will be a code base that is much
easier to follow.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-10-maciej.fijalkowski@intel.com
2021-03-30 09:24:39 -07:00
Maciej Fijalkowski
ef92807897 selftests: xsk: Remove thread for netns switch
Currently, there is a dedicated thread for the following remote ns
operations:
- grabbing the ifindex of the interface moved to remote netns
- removing xdp prog from that interface

With bpf_link usage in place, BPF resources will be managed by bpf_link
itself, so there's no further need for a thread that switches to the
remote netns and does the cleanup. Remove the mentioned thread.

Keep most of the logic for switching the ns, though, but make
switch_namespace() return the fd so that it will be possible to close it
at the process termination time. Get rid of logic around making sure
that it's possible to switch ns in validate_interfaces().

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-9-maciej.fijalkowski@intel.com
2021-03-30 09:24:39 -07:00
Maciej Fijalkowski
10397994d3 libbpf: xsk: Use bpf_link
Currently, if there are multiple xdpsock instances running on a single
interface and one of the instances is terminated, the rest of them are
left in an inoperable state because the XDP prog gets unloaded from the
interface.

Consider the scenario below:

// load xdp prog and xskmap and add entry to xskmap at idx 10
$ sudo ./xdpsock -i ens801f0 -t -q 10

// add entry to xskmap at idx 11
$ sudo ./xdpsock -i ens801f0 -t -q 11

Terminate one of the processes and the other one is unable to work
because the XDP prog was unloaded from the interface.

To address that, step away from setting bpf prog in favour of bpf_link.
This means that refcounting of BPF resources will be done automatically
by bpf_link itself.

Provide backward compatibility by checking if the underlying system is
bpf_link capable. Do this by looking up/creating a bpf_link on the
loopback device. If that fails in any way, stick with the netlink-based
XDP prog. Otherwise, use the bpf_link-based logic.

When setting up BPF resources during xsk socket creation, check whether
a bpf_link for a given ifindex already exists via a set of calls to
bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
and by comparing the ifindexes from the bpf_link and the xsk socket.

For the case where resources exist but are not AF_XDP related, bail out
and ask the user to remove the existing prog and then retry.

Lastly, do a bit of refactoring within __xsk_setup_xdp_prog and pull out
the existing code branches based on the prog_id value into separate
functions: one responsible for resource initialization if prog_id is 0,
and one for looking up existing resources for a non-zero prog_id, as
that implies that an XDP program is present on the underlying net
device. This in turn makes the code easier to follow, especially the
teardown part of both branches.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-7-maciej.fijalkowski@intel.com
2021-03-30 09:24:38 -07:00
Maciej Fijalkowski
aa2d61c154 selftests: xsk: Simplify frame traversal in dumping thread
Store offsets to each layer in separate variables rather than computing
them every single time.
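
As an illustration only (the struct, constants and function names below
are hypothetical, and an Ethernet + IPv4 + UDP frame without VLAN tags
is assumed), the offsets can be computed once per frame and then reused
by every line of dump output:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed frame layout: Ethernet II + IPv4 + UDP, no VLAN tag. */
#define DEMO_ETH_HLEN 14
#define DEMO_UDP_HLEN 8
#define DEMO_IP_MIN_HLEN 20

/* Hypothetical container for the per-layer offsets of one frame. */
struct layer_offsets {
	size_t ip;      /* start of the IPv4 header */
	size_t udp;     /* start of the UDP header */
	size_t payload; /* start of the UDP payload */
};

/* Compute all offsets once; returns 0 on success, -1 on a short frame. */
static int compute_offsets(const uint8_t *frame, size_t len,
			   struct layer_offsets *off)
{
	size_t ihl;

	assert(frame && off);
	if (len < DEMO_ETH_HLEN + DEMO_IP_MIN_HLEN)
		return -1;

	/* IHL is the low nibble of the first IPv4 byte, in 32-bit words. */
	ihl = (size_t)(frame[DEMO_ETH_HLEN] & 0x0f) * 4;
	if (ihl < DEMO_IP_MIN_HLEN)
		return -1;

	off->ip = DEMO_ETH_HLEN;
	off->udp = off->ip + ihl;
	off->payload = off->udp + DEMO_UDP_HLEN;

	return off->payload <= len ? 0 : -1;
}
```

A dump routine can then index frame + off.ip, frame + off.udp and
frame + off.payload directly instead of re-deriving each sum.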

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-6-maciej.fijalkowski@intel.com
2021-03-30 09:24:38 -07:00
Maciej Fijalkowski
965d2cb0f6 selftests: xsk: Remove inline keyword from source file
Follow the kernel coding style guidelines and let the compiler make the
decision about inlining.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-5-maciej.fijalkowski@intel.com
2021-03-30 09:24:38 -07:00
Maciej Fijalkowski
7519c387e6 selftests: xsk: Remove unused function
Probably it was ported from xdpsock but is not used anywhere.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-4-maciej.fijalkowski@intel.com
2021-03-30 09:24:38 -07:00
Maciej Fijalkowski
e623bfdef7 selftests: xsk: Remove struct ifaceconfigobj
ifaceconfigobj is not really useful; it is possible to keep the
functionality and simplify the code.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-3-maciej.fijalkowski@intel.com
2021-03-30 09:24:38 -07:00
Maciej Fijalkowski
9f33df73a9 selftests: xsk: Don't call worker_pkt_dump() for stats test
For TEST_TYPE_STATS, worker_pkt_validate(), which places frames onto
pkt_buf, is not called. Therefore, when dump mode is set, don't call
worker_pkt_dump() for the mentioned test type, so that it won't crash on
pkt_buf access.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-2-maciej.fijalkowski@intel.com
2021-03-30 09:24:38 -07:00