Commit graph

38482 commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo
93c65d6143 perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit()
The changes in ("perf evsel: Rename evsel__increase_rlimit to
rlimit__increase_nofile") ended up breaking the python binding that now
references the rlimit__increase_nofile function, add the util/rlimit.o
to the tools/perf/util/python-ext-sources to cure that.

This was detected by the 'perf test python' regression test:

  $ perf test python
   14: 'import perf' in python        : FAILED!

  $ perf test -v python
  Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
   14: 'import perf' in python                                         :
  --- start ---
  test child forked, pid 2912462
  python usage test: "echo "import sys ; sys.path.insert(0, '/tmp/build/perf-tools-next/python'); import perf" | '/usr/bin/python3' "
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ImportError: /tmp/build/perf-tools-next/python/perf.cpython-311-x86_64-linux-gnu.so: undefined symbol: rlimit__increase_nofile
  test child finished with -1
  ---- end ----
  'import perf' in python: FAILED!
  $

Fixes: e093a222d7 ("perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/lkml/ZTrCS5Z3PZAmfPdV@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-27 19:23:52 -07:00
Jiri Pirko
d96e48a3d5 tools: ynl: introduce option to process unknown attributes or types
In case the kernel sends message back containing attribute not defined
in family spec, following exception is raised to the user:

$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml --do trap-get --json '{"bus-name": "netdevsim", "dev-name": "netdevsim1", "trap-name": "source_mac_is_multicast"}'
Traceback (most recent call last):
  File "/home/jiri/work/linux/tools/net/ynl/lib/ynl.py", line 521, in _decode
    attr_spec = attr_space.attrs_by_val[attr.type]
                ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 132

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jiri/work/linux/./tools/net/ynl/cli.py", line 61, in <module>
    main()
  File "/home/jiri/work/linux/./tools/net/ynl/cli.py", line 49, in main
    reply = ynl.do(args.do, attrs, args.flags)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiri/work/linux/tools/net/ynl/lib/ynl.py", line 731, in do
    return self._op(method, vals, flags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiri/work/linux/tools/net/ynl/lib/ynl.py", line 719, in _op
    rsp_msg = self._decode(decoded.raw_attrs, op.attr_set.name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiri/work/linux/tools/net/ynl/lib/ynl.py", line 525, in _decode
    raise Exception(f"Space '{space}' has no attribute with value '{attr.type}'")
Exception: Space 'devlink' has no attribute with value '132'

Introduce a command line option "process-unknown" and pass it down to
YnlFamily class constructor to allow user to process unknown
attributes and types and print them as binaries.

$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml --do trap-get --json '{"bus-name": "netdevsim", "dev-name": "netdevsim1", "trap-name": "source_mac_is_multicast"}' --process-unknown
{'UnknownAttr(129)': {'UnknownAttr(0)': b'\x00\x00\x00\x00\x00\x00\x00\x00',
                      'UnknownAttr(1)': b'\x00\x00\x00\x00\x00\x00\x00\x00',
                      'UnknownAttr(2)': b'\x0e\x00\x00\x00\x00\x00\x00\x00'},
 'UnknownAttr(132)': b'\x00',
 'UnknownAttr(133)': b'',
 'UnknownAttr(134)': {'UnknownAttr(0)': b''},
 'bus-name': 'netdevsim',
 'dev-name': 'netdevsim1',
 'trap-action': 'drop',
 'trap-group-name': 'l2_drops',
 'trap-name': 'source_mac_is_multicast'}

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231027092525.956172-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27 14:54:31 -07:00
Vishal Verma
8f61d48c83 tools/testing/cxl: Slow down the mock firmware transfer
The cxl-cli unit test for firmware update does operations like starting
an asynchronous firmware update, making sure it is in progress, and
attempting to cancel it. In some cases, such as with no or minimal
dynamic debugging turned on, the firmware update completes too quickly,
not allowing the test to have a chance to verify it was in progress.
This caused a failure of the signature:

  expected fw_update_in_progress:true
  test/cxl-update-firmware.sh: failed at line 88

Fix this by adding a delay (~1.5 - 2 ms) to each firmware transfer
request handled by the mocked interface.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20231026-vv-fw_upd_test_fix-v2-1-5282fd193883@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 13:04:52 -07:00
Jim Harris
98a04c7ace cxl/region: Fix x1 root-decoder granularity calculations
Root decoder granularity must match value from CFWMS, which may not
be the region's granularity for non-interleaved root decoders.

So when calculating granularities for host bridge decoders, use the
region's granularity instead of the root decoder's granularity to ensure
the correct granularities are set for the host bridge decoders and any
downstream switch decoders.

Test configuration is 1 host bridge * 2 switches * 2 endpoints per switch.

Region created with 2048 granularity using following command line:

cxl create-region -m -d decoder0.0 -w 4 mem0 mem2 mem1 mem3 \
		  -g 2048 -s 2048M

Use "cxl list -PDE | grep granularity" to get a view of the granularity
set at each level of the topology.

Before this patch:
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":512,
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":512,
"interleave_granularity":256,

After:
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":4096,
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":4096,
"interleave_granularity":2048,

Fixes: 27b3f8d138 ("cxl/region: Program target lists")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jim Harris <jim.harris@samsung.com>
Link: https://lore.kernel.org/r/169824893473.1403938.16110924262989774582.stgit@bgt-140510-bm03.eng.stellus.in
[djbw: fixup the prebuilt cxl_test region]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 13:04:52 -07:00
Mickaël Salaün
f12f8f8450
selftests/landlock: Add tests for FS topology changes with network rules
Add 2 tests to the layout1 fixture:
* topology_changes_with_net_only: Checks that FS topology
  changes are not denied by network-only restrictions.
* topology_changes_with_net_and_fs: Make sure that FS topology
  changes are still denied with FS and network restrictions.

This specifically test commit d722036403 ("landlock: Allow FS topology
changes for domains without such rule type").

Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Link: https://lore.kernel.org/r/20231027154615.815134-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-27 17:53:31 +02:00
Geliang Tang
629b35a225 selftests: mptcp: display simult in extra_msg
Just like displaying "invert" after "Info: ", "simult" should be
displayed too when rm_subflow_nr doesn't match the expect value in
chk_rm_nr():

      syn                                 [ ok ]
      synack                              [ ok ]
      ack                                 [ ok ]
      add                                 [ ok ]
      echo                                [ ok ]
      rm                                  [ ok ]
      rmsf                                [ ok ] 3 in [2:4]
      Info: invert simult

      syn                                 [ ok ]
      synack                              [ ok ]
      ack                                 [ ok ]
      add                                 [ ok ]
      echo                                [ ok ]
      rm                                  [ ok ]
      rmsf                                [ ok ]
      Info: invert

Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-10-db8f25f798eb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27 08:47:30 -07:00
Geliang Tang
e71aab6777 selftests: mptcp: sockopt: drop mptcp_connect var
Global var mptcp_connect defined at the front of mptcp_sockopt.sh is
duplicate with local var mptcp_connect defined in do_transfer(), drop
this useless global one.

Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-9-db8f25f798eb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27 08:47:30 -07:00
Geliang Tang
9168ea02b8 selftests: mptcp: fix wait_rm_addr/sf parameters
The second input parameter of 'wait_rm_addr/sf $1 1' is misused. If it's
1, wait_rm_addr/sf will never break, and will loop ten times, then
'wait_rm_addr/sf' equals to 'sleep 1'. This delay time is too long,
which can sometimes make the tests fail.

A better way to use wait_rm_addr/sf is to use rm_addr/sf_count to obtain
the current value, and then pass into wait_rm_addr/sf.

Fixes: 4369c198e5 ("selftests: mptcp: test userspace pm out of transfer")
Cc: stable@vger.kernel.org
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-2-db8f25f798eb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27 08:47:07 -07:00
Geliang Tang
f4a75e9d11 selftests: mptcp: run userspace pm tests slower
Some userspace pm tests failed are reported by CI:

112 userspace pm add & remove address
      syn                                 [ ok ]
      synack                              [ ok ]
      ack                                 [ ok ]
      add                                 [ ok ]
      echo                                [ ok ]
      mptcp_info subflows=1:1             [ ok ]
      subflows_total 2:2                  [ ok ]
      mptcp_info add_addr_signal=1:1      [ ok ]
      rm                                  [ ok ]
      rmsf                                [ ok ]
      Info: invert
      mptcp_info subflows=0:0             [ ok ]
      subflows_total 1:1                  [fail]
                         got subflows 0:0 expected 1:1
Server ns stats
TcpPassiveOpens                 2                  0.0
TcpInSegs                       118                0.0

This patch fixes them by changing 'speed' to 5 to run the tests much more
slowly.

Fixes: 4369c198e5 ("selftests: mptcp: test userspace pm out of transfer")
Cc: stable@vger.kernel.org
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-1-db8f25f798eb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27 08:47:07 -07:00
Ido Schimmel
0514dd0593 selftests: vxlan_mdb: Use MDB get instead of dump
Test the new MDB get functionality by converting dump and grep to MDB
get.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27 10:51:42 +01:00
Ido Schimmel
e8bba9e83c selftests: bridge_mdb: Use MDB get instead of dump
Test the new MDB get functionality by converting dump and grep to MDB
get.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27 10:51:42 +01:00
Jakub Kicinski
c6f9b7138b bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZTp12QAKCRDbK58LschI
 g8BrAQDifqp5liEEdXV8jdReBwJtqInjrL5tzy5LcyHUMQbTaAEA6Ph3Ct3B+3oA
 mFnIW/y6UJiJrby0Xz4+vV5BXI/5WQg=
 =pLCV
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-10-26

We've added 51 non-merge commits during the last 10 day(s) which contain
a total of 75 files changed, 5037 insertions(+), 200 deletions(-).

The main changes are:

1) Add open-coded task, css_task and css iterator support.
   One of the use cases is customizable OOM victim selection via BPF,
   from Chuyi Zhou.

2) Fix BPF verifier's iterator convergence logic to use exact states
   comparison for convergence checks, from Eduard Zingerman,
   Andrii Nakryiko and Alexei Starovoitov.

3) Add BPF programmable net device where bpf_mprog defines the logic
   of its xmit routine. It can operate in L3 and L2 mode,
   from Daniel Borkmann and Nikolay Aleksandrov.

4) Batch of fixes for BPF per-CPU kptr and re-enable unit_size checking
   for global per-CPU allocator, from Hou Tao.

5) Fix libbpf which eagerly assumed that SHT_GNU_verdef ELF section
   was going to be present whenever a binary has SHT_GNU_versym section,
   from Andrii Nakryiko.

6) Fix BPF ringbuf correctness to fold smp_mb__before_atomic() into
   atomic_set_release(), from Paul E. McKenney.

7) Add a warning if NAPI callback missed xdp_do_flush() under
   CONFIG_DEBUG_NET which helps checking if drivers were missing
   the former, from Sebastian Andrzej Siewior.

8) Fix missed RCU read-lock in bpf_task_under_cgroup() which was throwing
   a warning under sleepable programs, from Yafang Shao.

9) Avoid unnecessary -EBUSY from htab_lock_bucket by disabling IRQ before
   checking map_locked, from Song Liu.

10) Make BPF CI linked_list failure test more robust,
    from Kumar Kartikeya Dwivedi.

11) Enable samples/bpf to be built as PIE in Fedora, from Viktor Malik.

12) Fix xsk starving when multiple xsk sockets were associated with
    a single xsk_buff_pool, from Albert Huang.

13) Clarify the signed modulo implementation for the BPF ISA standardization
    document that it uses truncated division, from Dave Thaler.

14) Improve BPF verifier's JEQ/JNE branch taken logic to also consider
    signed bounds knowledge, from Andrii Nakryiko.

15) Add an option to XDP selftests to use multi-buffer AF_XDP
    xdp_hw_metadata and mark used XDP programs as capable to use frags,
    from Larysa Zaremba.

16) Fix bpftool's BTF dumper wrt printing a pointer value and another
    one to fix struct_ops dump in an array, from Manu Bretelle.

* tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (51 commits)
  netkit: Remove explicit active/peer ptr initialization
  selftests/bpf: Fix selftests broken by mitigations=off
  samples/bpf: Allow building with custom bpftool
  samples/bpf: Fix passing LDFLAGS to libbpf
  samples/bpf: Allow building with custom CFLAGS/LDFLAGS
  bpf: Add more WARN_ON_ONCE checks for mismatched alloc and free
  selftests/bpf: Add selftests for netkit
  selftests/bpf: Add netlink helper library
  bpftool: Extend net dump with netkit progs
  bpftool: Implement link show support for netkit
  libbpf: Add link-based API for netkit
  tools: Sync if_link uapi header
  netkit, bpf: Add bpf programmable net device
  bpf: Improve JEQ/JNE branch taken logic
  bpf: Fold smp_mb__before_atomic() into atomic_set_release()
  bpf: Fix unnecessary -EBUSY from htab_lock_bucket
  xsk: Avoid starving the xsk further down the list
  bpf: print full verifier states on infinite loop detection
  selftests/bpf: test if state loops are detected in a tricky case
  bpf: correct loop detection for iterators convergence
  ...
====================

Link: https://lore.kernel.org/r/20231026150509.2824-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 20:02:41 -07:00
Jakub Kicinski
eb9df66838 tools: ynl-gen: respect attr-cnt-name at the attr set level
Davide reports that we look for the attr-cnt-name in the wrong
object. We try to read it from the family, but the schema only
allows for it to exist at attr-set level.

Reported-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/all/CAKa-r6vCj+gPEUKpv7AsXqM77N6pB0evuh7myHq=585RA3oD5g@mail.gmail.com/
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231025182739.184706-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 19:42:47 -07:00
Jakub Kicinski
bc30bb88ff netlink: specs: support conditional operations
Page pool code is compiled conditionally, but the operations
are part of the shared netlink family. We can handle this
by reporting empty list of pools or -EOPNOTSUPP / -ENOSYS
but the cleanest way seems to be removing the ops completely
at compilation time. That way user can see that the page
pool ops are not present using genetlink introspection.
Same way they'd check if the kernel is "new enough" to
support the ops.

Extend the specs with the ability to specify the config
condition under which op (and its policies, etc.) should
be hidden.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231025162253.133159-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 19:42:15 -07:00
Jakub Kicinski
ea23fbd2a8 netlink: make range pointers in policies const
struct nla_policy is usually constant itself, but unless
we make the ranges inside constant we won't be able to
make range structs const. The ranges are not modified
by the core.

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231025162204.132528-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 16:24:09 -07:00
Jakub Kicinski
ec4c20ca09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/mac80211/rx.c
  91535613b6 ("wifi: mac80211: don't drop all unprotected public action frames")
  6c02fab724 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value")

Adjacent changes:

drivers/net/ethernet/apm/xgene/xgene_enet_main.c
  61471264c0 ("net: ethernet: apm: Convert to platform remove callback returning void")
  d2ca43f306 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF")

net/vmw_vsock/virtio_transport.c
  64c99d2d6a ("vsock/virtio: support to send non-linear skb")
  53b08c4985 ("vsock/virtio: initialize the_virtio_vsock before using VQs")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 13:46:28 -07:00
Konstantin Meskhidze
a549d055a2
selftests/landlock: Add network tests
Add 82 test suites to check edge cases related to bind() and connect()
actions. They are defined with 6 fixtures and their variants:

The "protocol" fixture is extended with 12 variants defined as a matrix
of: sandboxed/not-sandboxed, IPv4/IPv6/unix network domain, and
stream/datagram socket. 4 related tests suites are defined:
* bind: Tests bind action.
* connect: Tests connect action.
* bind_unspec: Tests bind action with the AF_UNSPEC socket family.
* connect_unspec: Tests connect action with the AF_UNSPEC socket family.

The "ipv4" fixture is extended with 4 variants defined as a matrix
of: sandboxed/not-sandboxed, and stream/datagram socket. 1 related test
suite is defined:
* from_unix_to_inet: Tests to make sure unix sockets' actions are not
  restricted by Landlock rules applied to TCP ones.

The "tcp_layers" fixture is extended with 8 variants defined as a matrix
of: IPv4/IPv6 network domain, and different number of landlock rule
layers. 2 related tests suites are defined:
* ruleset_overlap: Tests nested layers with less constraints.
* ruleset_expand: Tests nested layers with more constraints.

In the "mini" fixture 4 tests suites are defined:
* network_access_rights: Tests handling of known access rights.
* unknown_access_rights: Tests handling of unknown access rights.
* inval: Tests unhandled allowed access and zero access value.
* tcp_port_overflow: Tests with port values greater than 65535.

The "ipv4_tcp" fixture supports IPv4 network domain with stream socket.
2 tests suites are defined:
* port_endianness: Tests with big/little endian port formats.
* with_fs: Tests a ruleset with both filesystem and network
  restrictions.

The "port_specific" fixture is extended with 4 variants defined
as a matrix of: sandboxed/not-sandboxed, IPv4/IPv6 network domain,
and stream socket. 2 related tests suites are defined:
* bind_connect_zero: Tests with port 0.
* bind_connect_1023: Tests with port 1023.

Test coverage for security/landlock is 92.4% of 710 lines according to
gcc/gcov-13.

Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Link: https://lore.kernel.org/r/20231026014751.414649-11-konstantin.meskhidze@huawei.com
[mic: Extend commit message, update test coverage, clean up capability
use, fix useless TEST_F_FORK, and improve ipv4_tcp.with_fs]
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26 21:07:16 +02:00
Konstantin Meskhidze
1fa335209f
selftests/landlock: Share enforce_ruleset() helper
Move enforce_ruleset() helper function to common.h so that it can be
used both by filesystem tests and network ones.

Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Link: https://lore.kernel.org/r/20231026014751.414649-10-konstantin.meskhidze@huawei.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26 21:07:15 +02:00
Konstantin Meskhidze
fff69fb03d
landlock: Support network rules with TCP bind and connect
Add network rules support in the ruleset management helpers and the
landlock_create_ruleset() syscall. Extend user space API to support
network actions:
* Add new network access rights: LANDLOCK_ACCESS_NET_BIND_TCP and
  LANDLOCK_ACCESS_NET_CONNECT_TCP.
* Add a new network rule type: LANDLOCK_RULE_NET_PORT tied to struct
  landlock_net_port_attr. The allowed_access field contains the network
  access rights, and the port field contains the port value according to
  the controlled protocol. This field can take up to a 64-bit value
  but the maximum value depends on the related protocol (e.g. 16-bit
  value for TCP). Network port is in host endianness [1].
* Add a new handled_access_net field to struct landlock_ruleset_attr
  that contains network access rights.
* Increment the Landlock ABI version to 4.

Implement socket_bind() and socket_connect() LSM hooks, which enable
to control TCP socket binding and connection to specific ports.

Expand access_masks_t from u16 to u32 to be able to store network access
rights alongside filesystem access rights for rulesets' handled access
rights.

Access rights are not tied to socket file descriptors but checked at
bind() or connect() call time against the caller's Landlock domain. For
the filesystem, a file descriptor is a direct access to a file/data.
However, for network sockets, we cannot identify for which data or peer
a newly created socket will give access to. Indeed, we need to wait for
a connect or bind request to identify the use case for this socket.
Likewise a directory file descriptor may enable to open another file
(i.e. a new data item), but this opening is also restricted by the
caller's domain, not the file descriptor's access rights [2].

[1] https://lore.kernel.org/r/278ab07f-7583-a4e0-3d37-1bacd091531d@digikod.net
[2] https://lore.kernel.org/r/263c1eb3-602f-57fe-8450-3f138581bee7@digikod.net

Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Link: https://lore.kernel.org/r/20231026014751.414649-9-konstantin.meskhidze@huawei.com
[mic: Extend commit message, fix typo in comments, and specify
endianness in the documentation]
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26 21:07:15 +02:00
James Clark
0b783d2e82 perf tests: test_arm_coresight: Simplify source iteration
There are two reasons to do this, firstly there is a shellcheck warning
in cs_etm_dev_name(), which can be completely deleted. And secondly the
current iteration method doesn't support systems with both ETE and ETM
because it picks one or the other. There isn't a known system with this
configuration, but it could happen in the future.

Iterating over all the sources for each CPU can be done by going through
/sys/bus/event_source/devices/cs_etm/cpu* and following the symlink back
to the Coresight device in /sys/bus/coresight/devices. This will work
whether the device is ETE, ETM or any future name, and is much simpler
and doesn't require any hard coded version numbers

Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: tianruidong@linux.alibaba.com
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: atrajeev@linux.vnet.ibm.com
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231023131550.487760-1-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-26 10:58:10 -07:00
Ian Rogers
4ece2a7e88 perf vendor events intel: Add tigerlake two metrics
Add tma_info_system_socket_clks and uncore_freq metrics.

The associated converter script fix is in:
https://github.com/intel/perfmon/pull/112

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Link: https://lore.kernel.org/r/20230926205948.1399594-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-26 10:15:04 -07:00
Ian Rogers
19a214bffd perf vendor events intel: Add broadwellde two metrics
Add tma_info_system_socket_clks and uncore_freq metrics that require a
broadwellx style uncore event for UNC_CLOCK.

The associated converter script fix is in:
https://github.com/intel/perfmon/pull/112

Fixes: 7d124303d6 ("perf vendor events intel: Update broadwell variant events/metrics")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Link: https://lore.kernel.org/r/20230926205948.1399594-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-26 10:14:53 -07:00
Catalin Marinas
2baca17e6a Merge branch 'for-next/feat_lse128' into for-next/core
* for-next/feat_lse128:
  : HWCAP for FEAT_LSE128
  kselftest/arm64: add FEAT_LSE128 to hwcap test
  arm64: add FEAT_LSE128 HWCAP
2023-10-26 17:10:07 +01:00
Catalin Marinas
023113fe66 Merge branch 'for-next/feat_lrcpc3' into for-next/core
* for-next/feat_lrcpc3:
  : HWCAP for FEAT_LRCPC3
  selftests/arm64: add HWCAP2_LRCPC3 test
  arm64: add FEAT_LRCPC3 HWCAP
2023-10-26 17:10:05 +01:00
Catalin Marinas
2a3f8ce3bb Merge branch 'for-next/feat_sve_b16b16' into for-next/core
* for-next/feat_sve_b16b16:
  : Add support for FEAT_SVE_B16B16 (BFloat16)
  kselftest/arm64: Verify HWCAP2_SVE_B16B16
  arm64/sve: Report FEAT_SVE_B16B16 to userspace
2023-10-26 17:10:01 +01:00
Nicolin Chen
55a01657cb iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
The IOMMU_HWPT_ALLOC ioctl now supports passing user_data to allocate a
user-managed domain for nested HWPTs. Add its coverage for that. Also,
update _test_cmd_hwpt_alloc() and add test_cmd/err_hwpt_alloc_nested().

Link: https://lore.kernel.org/r/20231026043938.63898-11-yi.l.liu@intel.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:15:57 -03:00
Yafang Shao
399f6185a1 selftests/bpf: Fix selftests broken by mitigations=off
When we configure the kernel command line with 'mitigations=off' and set
the sysctl knob 'kernel.unprivileged_bpf_disabled' to 0, the commit
bc5bc309db ("bpf: Inherit system settings for CPU security mitigations")
causes issues in the execution of `test_progs -t verifier`. This is
because 'mitigations=off' bypasses Spectre v1 and Spectre v4 protections.

Currently, when a program requests to run in unprivileged mode
(kernel.unprivileged_bpf_disabled = 0), the BPF verifier may prevent
it from running due to the following conditions not being enabled:

  - bypass_spec_v1
  - bypass_spec_v4
  - allow_ptr_leaks
  - allow_uninit_stack

While 'mitigations=off' enables the first two conditions, it does not
enable the latter two. As a result, some test cases in
'test_progs -t verifier' that were expected to fail to run may run
successfully, while others still fail but with different error messages.
This makes it challenging to address them comprehensively.

Moreover, in the future, we may introduce more fine-grained control over
CPU mitigations, such as enabling only bypass_spec_v1 or bypass_spec_v4.

Given the complexity of the situation, rather than fixing each broken test
case individually, it's preferable to skip them when 'mitigations=off' is
in effect and introduce specific test cases for the new 'mitigations=off'
scenario. For instance, we can introduce new BTF declaration tags like
'__failure__nospec', '__failure_nospecv1' and '__failure_nospecv4'.

In this patch, the approach is to simply skip the broken test cases when
'mitigations=off' is enabled. The result of `test_progs -t verifier` as
follows after this commit,

Before this commit
==================

- without 'mitigations=off'
  - kernel.unprivileged_bpf_disabled = 2
    Summary: 74/948 PASSED, 388 SKIPPED, 0 FAILED
  - kernel.unprivileged_bpf_disabled = 0
    Summary: 74/1336 PASSED, 0 SKIPPED, 0 FAILED    <<<<
- with 'mitigations=off'
  - kernel.unprivileged_bpf_disabled = 2
    Summary: 74/948 PASSED, 388 SKIPPED, 0 FAILED
  - kernel.unprivileged_bpf_disabled = 0
    Summary: 63/1276 PASSED, 0 SKIPPED, 11 FAILED   <<<< 11 FAILED

After this commit
=================

- without 'mitigations=off'
  - kernel.unprivileged_bpf_disabled = 2
    Summary: 74/948 PASSED, 388 SKIPPED, 0 FAILED
  - kernel.unprivileged_bpf_disabled = 0
    Summary: 74/1336 PASSED, 0 SKIPPED, 0 FAILED    <<<<
- with this patch, with 'mitigations=off'
  - kernel.unprivileged_bpf_disabled = 2
    Summary: 74/948 PASSED, 388 SKIPPED, 0 FAILED
  - kernel.unprivileged_bpf_disabled = 0
    Summary: 74/948 PASSED, 388 SKIPPED, 0 FAILED   <<<< SKIPPED

Fixes: bc5bc309db ("bpf: Inherit system settings for CPU security mitigations")
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Closes: https://lore.kernel.org/bpf/CAADnVQKUBJqg+hHtbLeeC2jhoJAWqnmRAzXW3hmUCNSV9kx4sQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20231025031144.5508-1-laoar.shao@gmail.com
2023-10-26 15:42:03 +02:00
Rafael J. Wysocki
bf224871c2 Merge branches 'pm-sleep', 'powercap' and 'pm-tools'
Merge updates related to system sleep handling, one power capping update
and one PM utility update for 6.7-rc1:

 - Use __get_safe_page() rather than touching the list in hibernation
   snapshot code (Brian Geffon).

 - Fix symbol export for _SIMPLE_ variants of _PM_OPS() (Raag Jadav).

 - Clean up sync_read handling in snapshot_write_next() (Brian Geffon).

 - Fix kerneldoc comments for swsusp_check() and swsusp_close() to
   better match code (Christoph Hellwig).

 - Downgrade BIOS locked limits pr_warn() in the Intel RAPL power
   capping driver to pr_debug() (Ville Syrjälä).

 - Change the minimum python version for the intel_pstate_tracer utility
   from 2.7 to 3.6 (Doug Smythies).

* pm-sleep:
  PM: hibernate: fix the kerneldoc comment for swsusp_check() and swsusp_close()
  PM: hibernate: Clean up sync_read handling in snapshot_write_next()
  PM: sleep: Fix symbol export for _SIMPLE_ variants of _PM_OPS()
  PM: hibernate: Use __get_safe_page() rather than touching the list

* powercap:
  powercap: intel_rapl: Downgrade BIOS locked limits pr_warn() to pr_debug()

* pm-tools:
  tools/power/x86/intel_pstate_tracer: python minimum version
2023-10-26 15:16:03 +02:00
Ian Rogers
3779416eed perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric
Broadwell-de has a consumer core and server uncore. The uncore_arb PMU
isn't present and the broadwellx style cbox PMU should be used
instead. Fix the tma_info_system_dram_bw_use metric to use the server
metric rather than client.

The associated converter script fix is in:
https://github.com/intel/perfmon/pull/111

Fixes: 7d124303d6 ("perf vendor events intel: Update broadwell variant events/metrics")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Link: https://lore.kernel.org/r/20230926031034.1201145-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 15:19:28 -07:00
Ian Rogers
56e144fe98 perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
Fix leak where mem_info__put wouldn't release the maps/map as used by
perf mem. Add exit functions and use elsewhere that the maps and map
are released.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-12-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:39:58 -07:00
Ian Rogers
dec07fe5d4 perf callchain: Minor layout changes to callchain_list
Avoid 6 byte hole for padding. Place more frequently used fields
first in an attempt to use just 1 cacheline in the common case.

Before:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        const char  *              srcline;              /*   104     8 */
        struct list_head           list;                 /*   112    16 */

        /* size: 128, cachelines: 2, members: 13 */
        /* sum members: 122, holes: 1, sum holes: 6 */
};
```

After:
```
struct callchain_list {
        struct list_head           list;                 /*     0    16 */
        u64                        ip;                   /*    16     8 */
        struct map_symbol          ms;                   /*    24    24 */
        const char  *              srcline;              /*    48     8 */
        u64                        branch_count;         /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        from_count;           /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        u64                        predicted_count;      /*   104     8 */
        u64                        abort_count;          /*   112     8 */
        struct {
                _Bool              unfolded;             /*   120     1 */
                _Bool              has_children;         /*   121     1 */
        };                                               /*   120     2 */

        /* size: 128, cachelines: 2, members: 13 */
        /* padding: 6 */
};
```

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:39:32 -07:00
Ian Rogers
6ba29fbb0b perf callchain: Make brtype_stat in callchain_list optional
struct callchain_list is 352bytes in size, 232 of which are
brtype_stat. brtype_stat is only used for certain callchain_list
items so make it optional, allocating when necessary. So that
printing doesn't need to deal with an optional brtype_stat, pass
an empty/zero version.

Before:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat    brtype_stat;          /*    96   232 */
        /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
        const char  *              srcline;              /*   328     8 */
        struct list_head           list;                 /*   336    16 */

        /* size: 352, cachelines: 6, members: 13 */
        /* sum members: 346, holes: 1, sum holes: 6 */
        /* last cacheline: 32 bytes */
};
```

After:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        const char  *              srcline;              /*   104     8 */
        struct list_head           list;                 /*   112    16 */

        /* size: 128, cachelines: 2, members: 13 */
        /* sum members: 122, holes: 1, sum holes: 6 */
};
```

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:39:08 -07:00
Ian Rogers
d47d876d72 perf callchain: Make display use of branch_type_stat const
Display code doesn't modify the branch_type_stat so switch uses to
const. This is done to aid refactoring struct callchain_list where
current the branch_type_stat is embedded even if not used.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:38:50 -07:00
Ian Rogers
67a3ebf1c3 perf offcpu: Add missed btf_free
Caught by address/leak sanitizer.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:38:33 -07:00
Ian Rogers
7b2e444b76 perf threads: Remove unused dead thread list
Commit 40826c45eb ("perf thread: Remove notion of dead threads")
removed dead threads but the list head wasn't removed. Remove it here.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:38:09 -07:00
Ian Rogers
c1149037f6 perf hist: Add missing puts to hist__account_cycles
Caught using reference count checking on perf top with
"--call-graph=lbr". After this no memory leaks were detected.

Fixes: 57849998e2 ("perf report: Add processing for cycle histograms")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:37:48 -07:00
Ian Rogers
78c32f4cb1 libperf rc_check: Add RC_CHK_EQUAL
Comparing pointers with reference count checking is tricky to avoid a
SEGV. Add a convenience macro to simplify and use.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:37:22 -07:00
Ian Rogers
75265320d2 libperf rc_check: Make implicit enabling work for GCC
Make the implicit REFCOUNT_CHECKING robust to when building with GCC.

Fixes: 9be6ab181b ("libperf rc_check: Enable implicitly with sanitizers")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:36:50 -07:00
Ian Rogers
ab8ce15078 perf machine: Avoid out of bounds LBR memory read
Running perf top with address sanitizer and "--call-graph=lbr" fails
due to reading sample 0 when no samples exist. Add a guard to prevent
this.

Fixes: e2b23483eb ("perf machine: Factor out lbr_callchain_add_lbr_ip()")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:36:20 -07:00
Ian Rogers
7a8f349e9d perf rwsem: Add debug mode that uses a mutex
Mutex error check will capture trying to take the lock recursively and
other problems that rwlock won't. At the expense of concurrency, adda
debug mode that uses a mutex in place of a rwsem.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:35:35 -07:00
Arnaldo Carvalho de Melo
b27778ed5d perf build: Address stray '\' before # that is warned about since grep 3.8
To address this grep 3.8 warning:

  grep: warning: stray \ before #

We needed to remove the '' around the grep expression and keep the \
before # so that it is escaped by the $(shell grep ...) and thus doesn't
get to grep.

We need that \ before the #, otherwise we get this:

  Makefile.perf:364: *** unterminated call to function 'shell': missing ')'.  Stop.

As everything after the # will be considered a comment.

Removing the single quotes needs some more escaping so that _some_ of
the escaped chars gets to grep, like the '\|' that becomes '\\\|´.

Running on debian:10, where there is no libtraceevent-devel available,
we get:

  Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
  make[1]: *** [Makefile.perf:242: sub-make] Error 2

I.e. both the comments and the util/trace-event.c were removed.

When using:

msg := $(error PYTHON_EXT_SRCS=$(PYTHON_EXT_SRCS))

While on the more recent fedora:38, with the new grep and make packages
and libtraceevent-devel installed:

  Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c util/trace-event.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
  make[1]: *** [Makefile.perf:242: sub-make] Error 2
  make: *** [Makefile:113: install-bin] Error 2
  make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
  $

I.e. only the comments were removed.

If we build it on the same fedora:38 system, but using NO_LIBTRACEEVENT=1

  $ make NO_LIBTRACEEVENT=1 CORESIGHT=1 O=/tmp/build/$(basename $PWD) -C tools/perf install-bin
  Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
  make[1]: *** [Makefile.perf:242: sub-make] Error 2
  make: *** [Makefile:113: install-bin] Error 2
  make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
  $

Both comments and the util/trace-event.c file removed.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/ZTj6mfM9UqY2DggC@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 10:05:03 -07:00
Namhyung Kim
a6e4a4a14a perf report: Fix hierarchy mode on pipe input
The hierarchy mode needs to setup output formats for each evsel.
Normally setup_sorting() handles this at the beginning, but it cannot
do that if data comes from a pipe since there's no evsel info before
reading the data.  And then perf report cannot process the samples
in hierarchy mode and think as if there's no sample.

Let's check the condition and setup the output formats after reading
data so that it can find evsels.

Before:

  $ ./perf record -o- true | ./perf report -i- --hierarchy -q
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.000 MB - ]
  Error:
  The - data has no samples!

After:

  $ ./perf record -o- true | ./perf report -i- --hierarchy -q
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.000 MB - ]
      94.76%        true
         94.76%        [kernel.kallsyms]
            94.76%        [k] filemap_fault
       5.24%        perf-ex
          5.24%        [kernel.kallsyms]
             5.06%        [k] __memset
             0.18%        [k] native_write_msr

Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231025003121.2811738-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 10:04:19 -07:00
Namhyung Kim
b5711042a1 perf lock contention: Use per-cpu array map for spinlocks
Currently lock contention timestamp is maintained in a hash map keyed by
pid.  That means it needs to get and release a map element (which is
proctected by spinlock!) on each contention begin and end pair.  This
can impact on performance if there are a lot of contention (usually from
spinlocks).

It used to go with task local storage but it had an issue on memory
allocation in some critical paths.  Although it's addressed in recent
kernels IIUC, the tool should support old kernels too.  So it cannot
simply switch to the task local storage at least for now.

As spinlocks create lots of contention and they disabled preemption
during the spinning, it can use per-cpu array to keep the timestamp to
avoid overhead in hashmap update and delete.

In contention_begin, it's easy to check the lock types since it can see
the flags.  But contention_end cannot see it.  So let's try to per-cpu
array first (unconditionally) if it has an active element (lock != 0).
Then it should be used and per-task tstamp map should not be used until
the per-cpu array element is cleared which means nested spinlock
contention (if any) was finished and it nows see (the outer) lock.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-3-namhyung@kernel.org
2023-10-25 10:02:55 -07:00
Namhyung Kim
6a070573f2 perf lock contention: Check race in tstamp elem creation
When pelem is NULL, it'd create a new entry with zero data.  But it
might be preempted by IRQ/NMI just before calling bpf_map_update_elem()
then there's a chance to call it twice for the same pid.  So it'd be
better to use BPF_NOEXIST flag and check the return value to prevent
the race.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-2-namhyung@kernel.org
2023-10-25 10:02:47 -07:00
Namhyung Kim
d99317f214 perf lock contention: Clear lock addr after use
It checks the current lock to calculated the delta of contention time.
The address is saved in the tstamp map which is allocated at begining of
contention and released at end of contention.

But it's possible for bpf_map_delete_elem() to fail.  In that case, the
element in the tstamp map kept for the current lock and it makes the
next contention for the same lock tracked incorrectly.  Specificially
the next contention begin will see the existing element for the task and
it'd just return.  Then the next contention end will see the element and
calculate the time using the timestamp for the previous begin.

This can result in a large value for two small contentions happened from
time to time.  Let's clear the lock address so that it can be updated
next time even if the bpf_map_delete_elem() failed.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-1-namhyung@kernel.org
2023-10-25 10:02:34 -07:00
Yang Jihong
e093a222d7 perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile
evsel__increase_rlimit() helper does nothing with evsel, and description
of the functionality is inaccurate, rename it and move to util/rlimit.c.

By the way, fix a checkppatch warning about misplaced license tag:

  WARNING: Misplaced SPDX-License-Identifier tag - use line 1 instead
  #160: FILE: tools/perf/util/rlimit.h:3:
  /* SPDX-License-Identifier: LGPL-2.1 */

No functional change.

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20231023033144.1011896-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 10:02:11 -07:00
Namhyung Kim
79a3371bdf perf bench sched pipe: Add -G/--cgroups option
The -G/--cgroups option is to put sender and receiver in different
cgroups in order to measure cgroup context switch overheads.

Users need to make sure the cgroups exist and accessible.  The following
example should the effect of this change.  Please don't forget taskset
before the perf bench to measure cgroup switches properly.  Otherwise
each task would run on a different CPU and generate cgroup switches
regardless of this change.

  # perf stat -e context-switches,cgroup-switches \
  > taskset -c 0 perf bench sched pipe -l 10000 > /dev/null

   Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000':

              20,001      context-switches
                   2      cgroup-switches

         0.053449651 seconds time elapsed

         0.011286000 seconds user
         0.041869000 seconds sys

  # perf stat -e context-switches,cgroup-switches \
  > taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB > /dev/null

   Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB':

              20,001      context-switches
              20,001      cgroup-switches

         0.052768627 seconds time elapsed

         0.006284000 seconds user
         0.046266000 seconds sys

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20231017202342.1353124-1-namhyung@kernel.org
2023-10-25 10:02:10 -07:00
Michael Petlan
cbf5f58461 perf test: Skip CoreSight tests if cs_etm// event is not available
CoreSight might be not available, in such case, skip the tests.

Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: vmolnaro@redhat.com
Link: https://lore.kernel.org/r/20231019091137.22525-1-mpetlan@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 10:02:01 -07:00
Vegard Nossum
6feb1a9641 cpupower: fix reference to nonexistent document
This file was renamed from .txt to .rst and left a dangling reference.
Fix it.

Fixes: 151f4e2bdc ("docs: power: convert docs to ReST and rename to *.rst")
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-25 10:14:35 -06:00
Rafael J. Wysocki
607218deac - Add support for Mediatek LVTS MT8192 driver along with the
suspend/resume routines (Balsam Chihi)
 
 - Fix probe for THERMAL_V2 for the Mediatek LVTS driver (Markus
   Schneider-Pargmann)
 
 - Remove duplicate error message in the max76620 driver when
   thermal_of_zone_register() fails as the sub routine already show one
   (Thierry Reding)
 
 - Add i.MX7D compatible bindings to fix a warning from dtbs_check for
   the imx6ul platform (Alexander Stein)
 
 - Add sa8775p compatible for the QCom tsens driver (Priyansh Jain)
 
 - Fix error check in lvts_debugfs_init() which is checking against
   NULL instead of PTR_ERR() on the LVTS Mediatek driver (Minjie Du)
 
 - Remove unused variable in the thermal/tools (Kuan-Wei Chiu)
 
 - Document the imx8dl thermal sensor (Fabio Estevam)
 
 - Add variable names in callback prototypes to prevent warning from
   checkpatch.pl for the imx8mm driver (Bragatheswaran Manickavel)
 
 - Add missing unevaluatedProperties on child node schemas for tegra124
   (Rob Herring)
 
 - Add mt7988 support for the Mediatek LVTS driver (Frank Wunderlich)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmU35s8ACgkQqDIjiipP
 6E8nggf/VQei7x/fRTFRYJVyP6SdwzpZPcyMYEcwL50wpp/d/R5xapFzM7nofV6n
 SYFXkpuejGH5d5JbJjpkv1S56Q/Bf6yy7cGciKiBmxaU2cWY636KmhlUPe2kfmX5
 ipiNi/ZU0zAjZM7FRpBVPgp0HTvnGehDzIaSi6Y/vKVAw5TPTzjLz8oNoeHGP4/6
 xNRgJuTwAvqL1lqZoo9YV7tVdkcuQamf3dkDkHqvSSL9UilHZEqrtLyh1Iz12+py
 LESjlePz1jzK169oG5QiA4thIMV7CPTwkrkAJXwT2H2zWccg7JzZ7K1S3aJq2vHL
 5UMcsUE2wCczfdnlhxD6wkJ3CDyirA==
 =i0K9
 -----END PGP SIGNATURE-----

Merge tag 'thermal-v6.7-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Merge thermal control (ARM drivers mostly) updates for 6.7-rc1 from
Daniel Lezcano:

"- Add support for Mediatek LVTS MT8192 driver along with the
   suspend/resume routines (Balsam Chihi)

 - Fix probe for THERMAL_V2 for the Mediatek LVTS driver (Markus
   Schneider-Pargmann)

 - Remove duplicate error message in the max76620 driver when
   thermal_of_zone_register() fails as the sub routine already show one
   (Thierry Reding)

 - Add i.MX7D compatible bindings to fix a warning from dtbs_check for
   the imx6ul platform (Alexander Stein)

 - Add sa8775p compatible for the QCom tsens driver (Priyansh Jain)

 - Fix error check in lvts_debugfs_init() which is checking against
   NULL instead of PTR_ERR() on the LVTS Mediatek driver (Minjie Du)

 - Remove unused variable in the thermal/tools (Kuan-Wei Chiu)

 - Document the imx8dl thermal sensor (Fabio Estevam)

 - Add variable names in callback prototypes to prevent warning from
   checkpatch.pl for the imx8mm driver (Bragatheswaran Manickavel)

 - Add missing unevaluatedProperties on child node schemas for tegra124
  (Rob Herring)

 - Add mt7988 support for the Mediatek LVTS driver (Frank Wunderlich)"

* tag 'thermal-v6.7-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/qcom/tsens: Drop ops_v0_1
  thermal/drivers/mediatek/lvts_thermal: Update calibration data documentation
  thermal/drivers/mediatek/lvts_thermal: Add mt8192 support
  thermal/drivers/mediatek/lvts_thermal: Add suspend and resume
  dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for mt8192
  thermal/drivers/mediatek: Fix probe for THERMAL_V2
  thermal/drivers/max77620: Remove duplicate error message
  dt-bindings: timer: add imx7d compatible
  dt-bindings: net: microchip: Allow nvmem-cell usage
  dt-bindings: imx-thermal: Add #thermal-sensor-cells property
  dt-bindings: thermal: tsens: Add sa8775p compatible
  thermal/drivers/mediatek/lvts_thermal: Fix error check in lvts_debugfs_init()
  tools/thermal: Remove unused 'mds' and 'nrhandler' variables
  dt-bindings: thermal: fsl,scu-thermal: Document imx8dl
  thermal/drivers/imx8mm_thermal: Fix function pointer declaration by adding identifier name
  dt-bindings: thermal: nvidia,tegra124-soctherm: Add missing unevaluatedProperties on child node schemas
  thermal/drivers/mediatek/lvts_thermal: Add mt7988 support
  thermal/drivers/mediatek/lvts_thermal: Make coeff configurable
  dt-bindings: thermal: mediatek: Add LVTS thermal sensors for mt7988
  dt-bindings: thermal: mediatek: Add mt7988 lvts compatible
2023-10-25 14:41:10 +02:00
Daniel Borkmann
ace15f91e5 selftests/bpf: Add selftests for netkit
Add a bigger batch of test coverage to assert correct operation of
netkit devices and their BPF program management:

  # ./test_progs -t tc_netkit
  [...]
  [    1.166267] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.166831] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  [    1.270957] tsc: Refined TSC clocksource calibration: 3407.988 MHz
  [    1.272579] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc932722, max_idle_ns: 440795381586 ns
  [    1.275336] clocksource: Switched to clocksource tsc
  #257     tc_netkit_basic:OK
  #258     tc_netkit_device:OK
  #259     tc_netkit_multi_links:OK
  #260     tc_netkit_multi_opts:OK
  #261     tc_netkit_neigh_links:OK
  Summary: 5/0 PASSED, 0 SKIPPED, 0 FAILED
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-8-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24 16:07:43 -07:00
Daniel Borkmann
51f1892b52 selftests/bpf: Add netlink helper library
Add a minimal netlink helper library for the BPF selftests. This has been
taken and cut down and cleaned up from iproute2. This covers basics such
as netdevice creation which we need for BPF selftests / BPF CI given
iproute2 package cannot cover it yet.

Stanislav Fomichev suggested that this could be replaced in future by ynl
tool generated C code once it has RTNL support to create devices. Once we
get to this point the BPF CI would also need to add libmnl. If no further
extensions are needed, a second option could be that we remove this code
again once iproute2 package has support.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-7-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24 16:07:39 -07:00
Daniel Borkmann
bec981a4ad bpftool: Extend net dump with netkit progs
Add support to dump BPF programs on netkit via bpftool. This includes both
the BPF link and attach ops programs. Dumped information contain the attach
location, function entry name, program ID and link ID when applicable.

Example with tc BPF link:

  # ./bpftool net
  xdp:

  tc:
  nk1(22) netkit/peer tc1 prog_id 43 link_id 12

  [...]

Example with json dump:

  # ./bpftool net --json | jq
  [
    {
      "xdp": [],
      "tc": [
        {
          "devname": "nk1",
          "ifindex": 18,
          "kind": "netkit/primary",
          "name": "tc1",
          "prog_id": 29,
          "prog_flags": [],
          "link_id": 8,
          "link_flags": []
        }
      ],
      "flow_dissector": [],
      "netfilter": []
    }
  ]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-6-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24 16:07:32 -07:00
Daniel Borkmann
92a85e18ad bpftool: Implement link show support for netkit
Add support to dump netkit link information to bpftool in similar way as
we have for XDP. The netkit link info only exposes the ifindex and the
attach_type.

Below shows an example link dump output, and a cgroup link is included for
comparison, too:

  # bpftool link
  [...]
  10: cgroup  prog 2466
        cgroup_id 1  attach_type cgroup_inet6_post_bind
  [...]
  8: netkit  prog 35
        ifindex nk1(18)  attach_type netkit_primary
  [...]

Equivalent json output:

  # bpftool link --json
  [...]
  {
    "id": 10,
    "type": "cgroup",
    "prog_id": 2466,
    "cgroup_id": 1,
    "attach_type": "cgroup_inet6_post_bind"
  },
  [...]
  {
    "id": 12,
    "type": "netkit",
    "prog_id": 61,
    "devname": "nk1",
    "ifindex": 21,
    "attach_type": "netkit_primary"
  }
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-5-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24 16:07:24 -07:00
Daniel Borkmann
05c31b4ab2 libbpf: Add link-based API for netkit
This adds bpf_program__attach_netkit() API to libbpf. Overall it is very
similar to tcx. The API looks as following:

  LIBBPF_API struct bpf_link *
  bpf_program__attach_netkit(const struct bpf_program *prog, int ifindex,
                             const struct bpf_netkit_opts *opts);

The struct bpf_netkit_opts is done in similar way as struct bpf_tcx_opts
for supporting bpf_mprog control parameters. The attach location for the
primary and peer device is derived from the program section "netkit/primary"
and "netkit/peer", respectively.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-4-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24 16:06:58 -07:00
Daniel Borkmann
5c1b994de4 tools: Sync if_link uapi header
Sync if_link uapi header to the latest version as we need the refresher
in tooling for netkit device. Given it's been a while since the last sync
and the diff is fairly big, it has been done as its own commit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-3-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24 16:06:43 -07:00
Daniel Borkmann
35dfaad718 netkit, bpf: Add bpf programmable net device
This work adds a new, minimal BPF-programmable device called "netkit"
(former PoC code-name "meta") we recently presented at LSF/MM/BPF. The
core idea is that BPF programs are executed within the drivers xmit routine
and therefore e.g. in case of containers/Pods moving BPF processing closer
to the source.

One of the goals was that in case of Pod egress traffic, this allows to
move BPF programs from hostns tcx ingress into the device itself, providing
earlier drop or forward mechanisms, for example, if the BPF program
determines that the skb must be sent out of the node, then a redirect to
the physical device can take place directly without going through per-CPU
backlog queue. This helps to shift processing for such traffic from softirq
to process context, leading to better scheduling decisions/performance (see
measurements in the slides).

In this initial version, the netkit device ships as a pair, but we plan to
extend this further so it can also operate in single device mode. The pair
comes with a primary and a peer device. Only the primary device, typically
residing in hostns, can manage BPF programs for itself and its peer. The
peer device is designated for containers/Pods and cannot attach/detach
BPF programs. Upon the device creation, the user can set the default policy
to 'pass' or 'drop' for the case when no BPF program is attached.

Additionally, the device can be operated in L3 (default) or L2 mode. The
management of BPF programs is done via bpf_mprog, so that multi-attach is
supported right from the beginning with similar API and dependency controls
as tcx. For details on the latter see commit 053c8e1f23 ("bpf: Add generic
attach/detach/query API for multi-progs"). tc BPF compatibility is provided,
so that existing programs can be easily migrated.

Going forward, we plan to use netkit devices in Cilium as the main device
type for connecting Pods. They will be operated in L3 mode in order to
simplify a Pod's neighbor management and the peer will operate in default
drop mode, so that no traffic is leaving between the time when a Pod is
brought up by the CNI plugin and programs attached by the agent.
Additionally, the programs we attach via tcx on the physical devices are
using bpf_redirect_peer() for inbound traffic into netkit device, hence the
latter is also supporting the ndo_get_peer_dev callback. Similarly, we use
bpf_redirect_neigh() for the way out, pushing from netkit peer to phys device
directly. Also, BIG TCP is supported on netkit device. For the follow-up
work in single device mode, we plan to convert Cilium's cilium_host/_net
devices into a single one.

An extensive test suite for checking device operations and the BPF program
and link management API comes as BPF selftests in this series.

Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://github.com/borkmann/iproute2/tree/pr/netkit
Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.)
Link: https://lore.kernel.org/r/20231024214904.29825-2-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24 16:06:03 -07:00
Raghavendra Rao Ananta
62708be351 KVM: selftests: aarch64: vPMU test for validating user accesses
Add a vPMU test scenario to validate the userspace accesses for
the registers PM{C,I}NTEN{SET,CLR} and PMOVS{SET,CLR} to ensure
that KVM honors the architectural definitions of these registers
for a given PMCR.N.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-13-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24 22:59:31 +00:00
Reiji Watanabe
e1cc872063 KVM: selftests: aarch64: vPMU register test for unimplemented counters
Add a new test case to the vpmu_counter_access test to check
if PMU registers or their bits for unimplemented counters are not
accessible or are RAZ, as expected.

Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-12-rananta@google.com
[Oliver: fix issues relating to exception return address]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24 22:59:31 +00:00
Reiji Watanabe
ada1ae6826 KVM: selftests: aarch64: vPMU register test for implemented counters
Add a new test case to the vpmu_counter_access test to check if PMU
registers or their bits for implemented counters on the vCPU are
readable/writable as expected, and can be programmed to count events.

Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-11-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24 22:59:31 +00:00
Reiji Watanabe
8d0aebe1ca KVM: selftests: aarch64: Introduce vpmu_counter_access test
Introduce vpmu_counter_access test for arm64 platforms.
The test configures PMUv3 for a vCPU, sets PMCR_EL0.N for the vCPU,
and check if the guest can consistently see the same number of the
PMU event counters (PMCR_EL0.N) that userspace sets.
This test case is done with each of the PMCR_EL0.N values from
0 to 31 (With the PMCR_EL0.N values greater than the host value,
the test expects KVM_SET_ONE_REG for the PMCR_EL0 to fail).

Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-10-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24 22:59:31 +00:00
Raghavendra Rao Ananta
9f4b3273df tools: Import arm_pmuv3.h
Import kernel's include/linux/perf/arm_pmuv3.h, with the
definition of PMEVN_SWITCH() additionally including an assert()
for the 'default' case. The following patches will use macros
defined in this header.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-9-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24 22:59:30 +00:00
Swarup Laxman Kotiaklapudi
37a38e439d selftests: net: change ifconfig with ip command
Change ifconfig with ip command, on a system where ifconfig is
not used this script will not work correcly.

Test result with this patchset:

sudo make TARGETS="net" kselftest
....
TAP version 13
1..1
 timeout set to 1500
 selftests: net: route_localnet.sh
 run arp_announce test
 net.ipv4.conf.veth0.route_localnet = 1
 net.ipv4.conf.veth1.route_localnet = 1
 net.ipv4.conf.veth0.arp_announce = 2
 net.ipv4.conf.veth1.arp_announce = 2
 PING 127.25.3.14 (127.25.3.14) from 127.25.3.4 veth0: 56(84)
  bytes of data.
 64 bytes from 127.25.3.14: icmp_seq=1 ttl=64 time=0.038 ms
 64 bytes from 127.25.3.14: icmp_seq=2 ttl=64 time=0.068 ms
 64 bytes from 127.25.3.14: icmp_seq=3 ttl=64 time=0.068 ms
 64 bytes from 127.25.3.14: icmp_seq=4 ttl=64 time=0.068 ms
 64 bytes from 127.25.3.14: icmp_seq=5 ttl=64 time=0.068 ms

 --- 127.25.3.14 ping statistics ---
 5 packets transmitted, 5 received, 0% packet loss, time 4073ms
 rtt min/avg/max/mdev = 0.038/0.062/0.068/0.012 ms
 ok
 run arp_ignore test
 net.ipv4.conf.veth0.route_localnet = 1
 net.ipv4.conf.veth1.route_localnet = 1
 net.ipv4.conf.veth0.arp_ignore = 3
 net.ipv4.conf.veth1.arp_ignore = 3
 PING 127.25.3.14 (127.25.3.14) from 127.25.3.4 veth0: 56(84)
  bytes of data.
 64 bytes from 127.25.3.14: icmp_seq=1 ttl=64 time=0.032 ms
 64 bytes from 127.25.3.14: icmp_seq=2 ttl=64 time=0.065 ms
 64 bytes from 127.25.3.14: icmp_seq=3 ttl=64 time=0.066 ms
 64 bytes from 127.25.3.14: icmp_seq=4 ttl=64 time=0.065 ms
 64 bytes from 127.25.3.14: icmp_seq=5 ttl=64 time=0.065 ms

 --- 127.25.3.14 ping statistics ---
 5 packets transmitted, 5 received, 0% packet loss, time 4092ms
 rtt min/avg/max/mdev = 0.032/0.058/0.066/0.013 ms
 ok
ok 1 selftests: net: route_localnet.sh
...

Signed-off-by: Swarup Laxman Kotiaklapudi <swarupkotikalapudi@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231023123422.2895-1-swarupkotikalapudi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:53:39 -07:00
Davide Caratti
0c63ad3795 tools: ynl-gen: add support for exact-len validation
add support for 'exact-len' validation on netlink attributes.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340
Acked-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-2-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:00:31 -07:00
Linus Torvalds
4f82870119 20 hotfixes. 12 are cc:stable and the remainder address post-6.5 issues
or aren't considered necessary for earlier kernel versions.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZTfz/QAKCRDdBJ7gKXxA
 joMyAP99hLaLYeJbjlf+4tLJzhlpbVoFra1ieun2D+ZgFE78xQD/T4T3PYrZhYqD
 WdrxGT9fiKOykXM5pmQRH9Zr4EvJBA0=
 =Obbk
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-10-24-09-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "20 hotfixes. 12 are cc:stable and the remainder address post-6.5
  issues or aren't considered necessary for earlier kernel versions"

* tag 'mm-hotfixes-stable-2023-10-24-09-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  maple_tree: add GFP_KERNEL to allocations in mas_expected_entries()
  selftests/mm: include mman header to access MREMAP_DONTUNMAP identifier
  mailmap: correct email aliasing for Oleksij Rempel
  mailmap: map Bartosz's old address to the current one
  mm/damon/sysfs: check DAMOS regions update progress from before_terminate()
  MAINTAINERS: Ondrej has moved
  kasan: disable kasan_non_canonical_hook() for HW tags
  kasan: print the original fault addr when access invalid shadow
  hugetlbfs: close race between MADV_DONTNEED and page fault
  hugetlbfs: extend hugetlb_vma_lock to private VMAs
  hugetlbfs: clear resv_map pointer if mmap fails
  mm: zswap: fix pool refcount bug around shrink_worker()
  mm/migrate: fix do_pages_move for compat pointers
  riscv: fix set_huge_pte_at() for NAPOT mappings when a swap entry is set
  riscv: handle VM_FAULT_[HWPOISON|HWPOISON_LARGE] faults instead of panicking
  mmap: fix error paths with dup_anon_vma()
  mmap: fix vma_iterator in error path of vma_merge()
  mm: fix vm_brk_flags() to not bail out while holding lock
  mm/mempolicy: fix set_mempolicy_home_node() previous VMA pointer
  mm/page_alloc: correct start page when guard page debug is enabled
2023-10-24 09:52:16 -10:00
Joao Martins
0795b305da iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag
Change test_mock_dirty_bitmaps() to pass a flag where it specifies the flag
under test. The test does the same thing as the GET_DIRTY_BITMAP regular
test. Except that it tests whether the dirtied bits are fetched all the
same a second time, as opposed to observing them cleared.

Link: https://lore.kernel.org/r/20231024135109.73787-19-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Joao Martins
ae36fe70ce iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO
Enumerate the capabilities from the mock device and test whether it
advertises as expected. Include it as part of the iommufd_dirty_tracking
fixture.

Link: https://lore.kernel.org/r/20231024135109.73787-18-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Joao Martins
a9af47e382 iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP
Add a new test ioctl for simulating the dirty IOVAs in the mock domain, and
implement the mock iommu domain ops that get the dirty tracking supported.

The selftest exercises the usual main workflow of:

1) Setting dirty tracking from the iommu domain
2) Read and clear dirty IOPTEs

Different fixtures will test different IOVA range sizes, that exercise
corner cases of the bitmaps.

Link: https://lore.kernel.org/r/20231024135109.73787-17-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Joao Martins
7adf267d66 iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY_TRACKING
Change mock_domain to supporting dirty tracking and add tests to exercise
the new SET_DIRTY_TRACKING API in the iommufd_dirty_tracking selftest
fixture.

Link: https://lore.kernel.org/r/20231024135109.73787-16-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Joao Martins
266ce58989 iommufd/selftest: Test IOMMU_HWPT_ALLOC_DIRTY_TRACKING
In order to selftest the iommu domain dirty enforcing implement the
mock_domain necessary support and add a new dev_flags to test that the
hwpt_alloc/attach_device fails as expected.

Expand the existing mock_domain fixture with a enforce_dirty test that
exercises the hwpt_alloc and device attachment.

Link: https://lore.kernel.org/r/20231024135109.73787-15-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Joao Martins
e04b23c8d4 iommufd/selftest: Expand mock_domain with dev_flags
Expand mock_domain test to be able to manipulate the device capabilities.
This allows testing with mockdev without dirty tracking support advertised
and thus make sure enforce_dirty test does the expected.

To avoid breaking IOMMUFD_TEST UABI replicate the mock_domain struct and
thus add an input dev_flags at the end.

Link: https://lore.kernel.org/r/20231024135109.73787-14-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Sumeet Pawnikar
956dbd3de4 tools/power/turbostat: Add initial support for LunarLake
Add initial support for LunarLake platform.

It shares the same features with CannonLake.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
2023-10-24 13:38:09 +08:00
Sumeet Pawnikar
7b57e7b683 tools/power/turbostat: Add initial support for ArrowLake
Add initial support for ArrowLake platform.

It shares the same features with CannonLake.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
2023-10-24 13:38:09 +08:00
Zhang Rui
5a6efcb910 tools/power/turbostat: Add initial support for GrandRidge
Add initial support for GrandRidge.

It shares the same features as SierraForest, except that it does not
support PC2/PC6.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-24 13:38:09 +08:00
Zhang Rui
d33605f367 tools/power/turbostat: Add initial support for SierraForest
Add initial support for SierraForest.

It shares the same features with SapphireRapids, except that it has
MSR_MODULE_C6_RES_MS support.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-24 13:38:09 +08:00
Zhang Rui
5feab4a6b8 tools/power/turbostat: Add initial support for GraniteRapids
Add initial support for GraniteRapids.

It shares the same features with SapphireRapids.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-24 13:38:09 +08:00
Zhang Rui
0e3f10e6aa tools/power/turbostat: Add MSR_CORE_C1_RES support for spr_features
Add MSR_CORE_C1_RES support for spr_features because both Sapphirerapids
and Emeraldrapids support this MSR.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-24 13:38:09 +08:00
Srinivas Pandruvada
37f68a2940 tools/power/turbostat: Move process to root cgroup
When available CPUs are reduced via cgroup cpuset controller, turbostat
will exit with errors (For example):
	get_counters: Could not migrate to CPU 0
	turbostat: re-initialized with num_cpus 20
	get_counters: Could not migrate to CPU 0
	turbostat: re-initialized with num_cpus 20

Move the turbostat to root cgroup, which has every CPU.

Writing the value 0 to a cgroup.procs file causes the writing
process to be moved to the corresponding cgroup.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
2023-10-24 13:38:09 +08:00
Zhang Rui
f638858da0 tools/power/turbostat: Handle cgroup v2 cpu limitation
CPUs can be isolated via cgroup settings and turbostat should avoid
migrating to these CPUs, just like it does for the '-c' cpus.

Introduce cpu_effective_set to save the cgroup cpu limitation info from
/sys/fs/cgroup/cpuset.cpus.effective. And use cpu_allowed_set as the
intersection of cpu_present_set, cpu_effective_set and cpu_subset.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-24 13:38:09 +08:00
Zhang Rui
8c3dd2c9e5 tools/power/turbostat: Abstrct function for parsing cpu string
Abstract parse_cpu_str() which can update any specified cpu_set by a
given cpu string. This can be used to handle further CPU limitations
from other sources like cgroup.

The cpu string parsing code is also enhanced to handle the strings that
have an extra '\n' before string terminator.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-24 13:37:46 +08:00
Eduard Zingerman
64870feebe selftests/bpf: test if state loops are detected in a tricky case
A convoluted test case for iterators convergence logic that
demonstrates that states with branch count equal to 0 might still be
a part of not completely explored loop.

E.g. consider the following state diagram:

               initial     Here state 'succ' was processed first,
                 |         it was eventually tracked to produce a
                 V         state identical to 'hdr'.
    .---------> hdr        All branches from 'succ' had been explored
    |            |         and thus 'succ' has its .branches == 0.
    |            V
    |    .------...        Suppose states 'cur' and 'succ' correspond
    |    |       |         to the same instruction + callsites.
    |    V       V         In such case it is necessary to check
    |   ...     ...        whether 'succ' and 'cur' are identical.
    |    |       |         If 'succ' and 'cur' are a part of the same loop
    |    V       V         they have to be compared exactly.
    |   succ <- cur
    |    |
    |    V
    |   ...
    |    |
    '----'

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231024000917.12153-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-23 21:49:32 -07:00
Eduard Zingerman
389ede06c2 selftests/bpf: tests with delayed read/precision makrs in loop body
These test cases try to hide read and precision marks from loop
convergence logic: marks would only be assigned on subsequent loop
iterations or after exploring states pushed to env->head stack first.
Without verifier fix to use exact states comparison logic for
iterators convergence these tests (except 'triple_continue') would be
errorneously marked as safe.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231024000917.12153-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-23 21:49:31 -07:00
Eduard Zingerman
2793a8b015 bpf: exact states comparison for iterator convergence checks
Convergence for open coded iterators is computed in is_state_visited()
by examining states with branches count > 1 and using states_equal().
states_equal() computes sub-state relation using read and precision marks.
Read and precision marks are propagated from children states,
thus are not guaranteed to be complete inside a loop when branches
count > 1. This could be demonstrated using the following unsafe program:

     1. r7 = -16
     2. r6 = bpf_get_prandom_u32()
     3. while (bpf_iter_num_next(&fp[-8])) {
     4.   if (r6 != 42) {
     5.     r7 = -32
     6.     r6 = bpf_get_prandom_u32()
     7.     continue
     8.   }
     9.   r0 = r10
    10.   r0 += r7
    11.   r8 = *(u64 *)(r0 + 0)
    12.   r6 = bpf_get_prandom_u32()
    13. }

Here verifier would first visit path 1-3, create a checkpoint at 3
with r7=-16, continue to 4-7,3 with r7=-32.

Because instructions at 9-12 had not been visitied yet existing
checkpoint at 3 does not have read or precision mark for r7.
Thus states_equal() would return true and verifier would discard
current state, thus unsafe memory access at 11 would not be caught.

This commit fixes this loophole by introducing exact state comparisons
for iterator convergence logic:
- registers are compared using regs_exact() regardless of read or
  precision marks;
- stack slots have to have identical type.

Unfortunately, this is too strict even for simple programs like below:

    i = 0;
    while(iter_next(&it))
      i++;

At each iteration step i++ would produce a new distinct state and
eventually instruction processing limit would be reached.

To avoid such behavior speculatively forget (widen) range for
imprecise scalar registers, if those registers were not precise at the
end of the previous iteration and do not match exactly.

This a conservative heuristic that allows to verify wide range of
programs, however it precludes verification of programs that conjure
an imprecise value on the first loop iteration and use it as precise
on the second.

Test case iter_task_vma_for_each() presents one of such cases:

        unsigned int seen = 0;
        ...
        bpf_for_each(task_vma, vma, task, 0) {
                if (seen >= 1000)
                        break;
                ...
                seen++;
        }

Here clang generates the following code:

<LBB0_4>:
      24:       r8 = r6                          ; stash current value of
                ... body ...                       'seen'
      29:       r1 = r10
      30:       r1 += -0x8
      31:       call bpf_iter_task_vma_next
      32:       r6 += 0x1                        ; seen++;
      33:       if r0 == 0x0 goto +0x2 <LBB0_6>  ; exit on next() == NULL
      34:       r7 += 0x10
      35:       if r8 < 0x3e7 goto -0xc <LBB0_4> ; loop on seen < 1000

<LBB0_6>:
      ... exit ...

Note that counter in r6 is copied to r8 and then incremented,
conditional jump is done using r8. Because of this precision mark for
r6 lags one state behind of precision mark on r8 and widening logic
kicks in.

Adding barrier_var(seen) after conditional is sufficient to force
clang use the same register for both counting and conditional jump.

This issue was discussed in the thread [1] which was started by
Andrew Werner <awerner32@gmail.com> demonstrating a similar bug
in callback functions handling. The callbacks would be addressed
in a followup patch.

[1] https://lore.kernel.org/bpf/97a90da09404c65c8e810cf83c94ac703705dc0e.camel@gmail.com/

Co-developed-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Co-developed-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231024000917.12153-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-23 21:49:31 -07:00
Linus Torvalds
84186fcb83 Urgent pull request for nolibc into v6.6
This pull request contains the following fixes:
 
 o     tools/nolibc: i386: Fix a stack misalign bug on _start
 
 o     MAINTAINERS: nolibc: update tree location
 
 o     tools/nolibc: mark start_c as weak to avoid linker errors
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmUtvC8THHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jMyED/9nsHjSYKUvzdn8kb8Xjr+OkUlx6DCl
 ITRqAScxl/Q+TTAKTSL508b/fVB56+h0ZmqOHeV0+askVI9c3G2wmLYCJ06P1bpI
 siy6pqBtcaDVvU38ielbAVYtAeSahj9Jro44gCwBD9OE2TPi4ehl7PMIsX1vG39a
 hmlbOSw3GG6jFHc5HkTlrOiOy1UB7oIPFI7qfH0XsKJ35vvmDSWPpiHIGwZyx3iv
 hInVPV4kEBREAXONjru7Ginn9dnxZXFqOwQeqW3ZfYudDUHeKzLMrtYsE6pqZxRP
 UEyFiI6bhr2fDPoXHGYSm63OrYAm3uZ0jsgC72ZjrLH2ISY0oCuGGf6HP5SjkRfP
 jPcqMD5h3K+aEHLN3XZV0v3FwelE3qjHVcDQhpu+nJCNDlWK3DNMOjwadynqDOHJ
 5FIbusDk5h4rCgOh607zuRPBv0EmtLw3oXGzBzLl8bqRaj58iZP1Te+tnGZEYVMj
 YydEqXPlKWNaSeBL82gyCpWneYT+vdMjJdbl9b/EKXQxogYLFx4yC4z3h+7K7G56
 InDXbxBu0BIMYMgJTWn7nsBgdjenho4PUrper3v6VMr6TKuXuEzmhpqgovaHLM1g
 ITmdl/+ExPBUBI8u2s1qqLIdEFe5SXkOhFpYnC1E7RsS+GRCho9XCmtfcaPWfiU5
 KcFVXKBlIJ4cNQ==
 =PDcP
 -----END PGP SIGNATURE-----

Merge tag 'urgent/nolibc.2023.10.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull nolibc fixes from Paul McKenney:

 - tools/nolibc: i386: Fix a stack misalign bug on _start

 - MAINTAINERS: nolibc: update tree location

 - tools/nolibc: mark start_c as weak to avoid linker errors

* tag 'urgent/nolibc.2023.10.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  tools/nolibc: mark start_c as weak
  MAINTAINERS: nolibc: update tree location
  tools/nolibc: i386: Fix a stack misalign bug on _start
2023-10-23 14:19:11 -10:00
Jakub Kicinski
4fb56e3e92 Merge branch 'devlink-finish-conversion-to-generated-split_ops'
Jiri Pirko says:

====================
devlink: finish conversion to generated split_ops

This patchset converts the remaining genetlink commands to generated
split_ops and removes the existing small_ops arrays entirely
alongside with shared netlink attribute policy.

Patches #1-#6 are just small preparations and small fixes on multiple
              places. Note that couple of patches contain the "Fixes"
              tag but no need to put them into -net tree.
Patch #7 is a simple rename preparation
Patch #8 is the main one in this set and adds actual definitions of cmds
         in to yaml file.
Patches #9-#10 finalize the change removing bits that are no longer in
               use.
====================

Link: https://lore.kernel.org/r/20231021112711.660606-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-23 16:16:52 -07:00
Jiri Pirko
f2f9dd164d netlink: specs: devlink: add the remaining command to generate complete split_ops
Currently, some of the commands are not described in devlink yaml file
and are manually filled in net/devlink/netlink.c in small_ops. To make
all part of split_ops, add definitions of the rest of the commands
alongside with needed attributes and enums.

Note that this focuses on the kernel side. The requests are fully
described in order to generate split_op alongside with policies.
Follow-up will describe the replies in order to make the userspace
helpers complete.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231021112711.660606-9-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-23 16:16:32 -07:00
Jiri Pirko
c48066b0cc netlink: specs: devlink: remove reload-action from devlink-get cmd reply
devlink-get command does not contain reload-action attr in reply.
Remove it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231021112711.660606-5-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-23 16:12:46 -07:00
Jiri Pirko
2260d39cd0 tools: ynl-gen: render rsp_parse() helpers if cmd has only dump op
Due to the check in RenderInfo class constructor, type_consistent
flag is set to False to avoid rendering the same response parsing
helper for do and dump ops. However, in case there is no do, the helper
needs to be rendered for dump op. So split check to achieve that.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231021112711.660606-4-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-23 16:12:46 -07:00
Jiri Pirko
4e2846fd66 tools: ynl-gen: introduce support for bitfield32 attribute type
Introduce support for attribute type bitfield32.
Note that since the generated code works with struct nla_bitfield32,
the generator adds netlink.h to the list of includes for userspace
headers in case any bitfield32 is present.

Note that this is added only to genetlink-legacy scheme as requested
by Jakub Kicinski.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231021112711.660606-3-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-23 16:12:46 -07:00
Linus Torvalds
7c14564010 virtio: last minute fixes
a collection of small fixes that look like worth having in
 this release.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmUv/JMPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpO/kH/j/uunE6oOE/BhtfO1USciebjRhLJ7lvoAvS
 OD4/bcA45GRGLGIZaJtkcCIOOb9djUWLsS3QqA2UUFX+NN2/teEX6lsnv1tJTjdC
 a2DkDS6AVYwp+rpzxSE5PUn/ImpiDt0/+R0ZbN56R3rHTOl7nFeXvutMbzxNXZvL
 eWLcSDmRg7nmAdF+YbZ5omdgSL11Wi+dBFEJ0unEsecyu8pO7WcAGYvU6x/x04XJ
 uLrjsaAGKr3rtoLLZ1DtnSmoED/b/lwDwzVR5REsg4kf2aiHxj1+kKGNXfrtqMl5
 2OVxZEorcLufHM212LW4KT3Ncw4KE4xJzjt2mzEwO/ztgtomnBM=
 =Rhxy
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "A collection of small fixes that look like worth having in this
  release"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_pci: fix the common cfg map size
  virtio-crypto: handle config changed by work queue
  vhost: Allow null msg.size on VHOST_IOTLB_INVALIDATE
  vdpa/mlx5: Fix firmware error on creation of 1k VQs
  virtio_balloon: Fix endless deflation and inflation on arm64
  vdpa/mlx5: Fix double release of debugfs entry
  virtio-mmio: fix memory leak of vm_dev
  vdpa_sim_blk: Fix the potential leak of mgmt_dev
  tools/virtio: Add dma sync api for virtio test
2023-10-23 07:42:48 -10:00
Jakub Kicinski
c0119e62b2 tools: ynl-gen: change spacing around __attribute__
checkpatch gets confused and treats __attribute__ as a function call.
It complains about white space before "(":

WARNING:SPACING: space prohibited between function name and open parenthesis '('
+	struct netdev_queue_get_rsp obj __attribute__ ((aligned (8)));

No spaces wins in the kernel:

  $ git grep 'attribute__((.*aligned(' | wc -l
  480
  $ git grep 'attribute__ ((.*aligned (' | wc -l
  110
  $ git grep 'attribute__ ((.*aligned(' | wc -l
  94
  $ git grep 'attribute__((.*aligned (' | wc -l
  63

So, whatever, change the codegen.

Note that checkpatch also thinks we should use __aligned(),
but this is user space code.

Link: https://lore.kernel.org/all/202310190900.9Dzgkbev-lkp@intel.com/
Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231020221827.3436697-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-23 10:15:36 -07:00
Eric Dumazet
2a7c8d291f tcp: introduce tcp_clock_ms()
It delivers current TCP time stamp in ms unit, and is used
in place of confusing tcp_time_stamp_raw()

It is the same family than tcp_clock_ns() and tcp_clock_ms().

tcp_time_stamp_raw() will be replaced later for TSval
contexts with a more descriptive name.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-23 09:35:01 +01:00
Linus Torvalds
023cc83605 Probes fixes for v6.6-rc6.2:
- kprobe-events: Fix kprobe events to reject if the attached symbol
   is not unique name because it may not the function which the user
   want to attach to. (User can attach a probe to such symbol using
   the nearest unique symbol + offset.)
 
 - selftest: Add a testcase to ensure the kprobe event rejects non
   unique symbol correctly.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmUzdQobHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bMNAH/inFWv8e+rMm8F5Po6ZI
 CmBxuZbxy2l+KfYDjXqSHu7TLKngVd6Bhdb5H2K7fgdwiZxrS0i6qvdppo+Cxgop
 Yod06peDTM80IKavioCcOJOwLPGXXpZkMlK5fdC48HN6vrf9km4vws5ZAagfc1ng
 YhnYm1HHeXcIYwtLkE2dCr6HkwkaOebWTLdZ8c70d1OPw0L9rzxH+edjhKCq8uIw
 6WUg9ERxJYPUuCkQxOxVJrTdzNMRXsgf28FHc0LyYRm8kDpECT2BP6e/Y+TBbsX5
 2pN5cUY5qfI6t3Pc1HDs2KX8ui2QCmj0mCvT0VixhdjThdHpRf0VjIFFAANf3LNO
 XVA=
 =O1Aa
 -----END PGP SIGNATURE-----

Merge tag 'probes-fixes-v6.6-rc6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - kprobe-events: Fix kprobe events to reject if the attached symbol is
   not unique name because it may not the function which the user want
   to attach to. (User can attach a probe to such symbol using the
   nearest unique symbol + offset.)

 - selftest: Add a testcase to ensure the kprobe event rejects non
   unique symbol correctly.

* tag 'probes-fixes-v6.6-rc6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  selftests/ftrace: Add new test case which checks non unique symbol
  tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols
2023-10-21 11:00:36 -07:00
Pedro Tammela
ee3d122854 selftests: tc-testing: add test for 'rt' upgrade on hfsc
Add a test to check if inner rt curves are upgraded to sc curves.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-21 11:46:41 +01:00
Linus Torvalds
9c5d00cb7b perf tools fixes for v6.6: 2nd batch
- Fix regression in reading scale and unit files from sysfs for PMU
   events, so that we can use that info to pretty print instead of
   printing raw numbers:
 
   # perf stat -e power/energy-ram/,power/energy-gpu/ sleep 2
 
    Performance counter stats for 'system wide':
 
               1.64 Joules power/energy-ram/
               0.20 Joules power/energy-gpu/
 
        2.001228914 seconds time elapsed
   #
   # grep -m1 "model name" /proc/cpuinfo
   model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
   #
 
 - The small llvm.cpp file used to check if the llvm devel files are present was
   incorrectly deleted when removing the BPF event in 'perf trace', put it back
   as it is also used by tools/bpf/bpftool, that uses llvm routines to do
   disassembly of BPF object files.
 
 - Fix use of addr_location__exit() in dlfilter__object_code(), making sure that
   it is only used to pair a previous addr_location__init() call.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZTKh5AAKCRCyPKLppCJ+
 J/g/AP0f6SNyHJz21JzDTzyjXAeSdMzKwic0LXv+kATQy31HJAD+Kf7UKQieUeZB
 fxvp60aKyFN8IVIgpYiAjZMS3k9XPAY=
 =N7Gv
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.6-2-2023-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix regression in reading scale and unit files from sysfs for PMU
   events, so that we can use that info to pretty print instead of
   printing raw numbers:

     # perf stat -e power/energy-ram/,power/energy-gpu/ sleep 2

      Performance counter stats for 'system wide':

                 1.64 Joules power/energy-ram/
                 0.20 Joules power/energy-gpu/

          2.001228914 seconds time elapsed
     #
     # grep -m1 "model name" /proc/cpuinfo
     model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
     #

 - The small llvm.cpp file used to check if the llvm devel files are
   present was incorrectly deleted when removing the BPF event in 'perf
   trace', put it back as it is also used by tools/bpf/bpftool, that
   uses llvm routines to do disassembly of BPF object files.

 - Fix use of addr_location__exit() in dlfilter__object_code(), making
   sure that it is only used to pair a previous addr_location__init()
   call.

* tag 'perf-tools-fixes-for-v6.6-2-2023-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  tools build: Fix llvm feature detection, still used by bpftool
  perf dlfilter: Add a test for object_code()
  perf dlfilter: Fix use of addr_location__exit() in dlfilter__object_code()
  perf pmu: Fix perf stat output with correct scale and unit
2023-10-20 14:49:24 -07:00
Linus Torvalds
444ccf1b11 linux_kselftest_active-fixes-6.6-rc7
This Kselftest update for Linux 6.6-rc7 consists of one single fix
 to assert check in user_events abi_test to properly check bit value
 on Big Endian architectures. The current code treats the bit values
 as Little Endian and the check fails on Big Endian.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmUytR4ACgkQCwJExA0N
 QxzjoQ/+MWgtyggLZo/dP+qJ7AU+tG08YXWWuh99lkSxVs3xHvwl2EGaYf1PXN9c
 ZUXb/KGfls8G4tv20KU2+/uSRSirkf+CFLN6HaBBG+2cun8o0KpHlVKGfmvRjb13
 ZUX8UQJ5u9kTSTqV7gCxVbemV5iOhTazuwVQ1Aq7wFL6e/0oL4eolbgNP0ub76vy
 UsZl9j/8pFhtVfdqRJqorKQ9H5RgvW7CCmobkUruGJoXg7jFuFdeqXtS4U1ziyJT
 g42LtXuC053KrEnEmhj1EadZC4E1eXadffssanfyWolAXjRD3N+2vLYasJOiGDAO
 mT111kQLaRNvZLJBULlrnWITkbVOhfE9Cxu14idxvdLShQiUO8kRCjoN2TbLZ2ET
 n7u/4anHBu99ljO6uZVNe7JY5ZrwaM1o7Myvk9eOVRe4WKkgOXJKUM3lfwvfMK8Q
 sEWvfBY04gL5y697DDiQvJ4g0fRwwnoadpHFvJTJX977H5C8/c750YZw/Emuip9D
 zxyHGEVtncM6awbHP8bGBgg2f6XISWKs5qvzrwLmdXx42oi0VjU4z+cEVOUSx/rY
 K+B+LVzcdYc/6UhjQzmylNBUs5CO7K6D05/tes07QaZ1MCCyw3b+DAR+LJn2U2+C
 Y3+x/kQT16abMIIpTM8qVWqxswvaFSyZ+5hCkVrAsYwkM1fpB4c=
 =PJhV
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest_active-fixes-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fix from Shuah Khan:
 "One single fix to assert check in user_events abi_test to properly
  check bit value on Big Endian architectures. The code treated the bit
  values as Little Endian and the check failed on Big Endian"

* tag 'linux_kselftest_active-fixes-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/user_events: Fix abi_test for BE archs
2023-10-20 14:45:41 -07:00
Hou Tao
d440ba91ca selftests/bpf: Add more test cases for bpf memory allocator
Add the following 3 test cases for bpf memory allocator:
1) Do allocation in bpf program and free through map free
2) Do batch per-cpu allocation and per-cpu free in bpf program
3) Do per-cpu allocation in bpf program and free through map free

For per-cpu allocation, because per-cpu allocation can not refill timely
sometimes, so test 2) and test 3) consider it is OK for
bpf_percpu_obj_new_impl() to return NULL.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231020133202.4043247-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-20 14:15:13 -07:00
Kumar Kartikeya Dwivedi
da1055b673 selftests/bpf: Make linked_list failure test more robust
The linked list failure test 'pop_front_off' and 'pop_back_off'
currently rely on matching exact instruction and register values.  The
purpose of the test is to ensure the offset is correctly incremented for
the returned pointers from list pop helpers, which can then be used with
container_of to obtain the real object. Hence, somehow obtaining the
information that the offset is 48 will work for us. Make the test more
robust by relying on verifier error string of bpf_spin_lock and remove
dependence on fragile instruction index or register number, which can be
affected by different clang versions used to build the selftests.

Fixes: 300f19dcdb ("selftests/bpf: Add BPF linked list API tests")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231020144839.2734006-1-memxor@gmail.com
2023-10-20 09:29:39 -07:00
Francis Laniel
03b80ff802 selftests/ftrace: Add new test case which checks non unique symbol
If name_show() is non unique, this test will try to install a kprobe on this
function which should fail returning EADDRNOTAVAIL.
On kernel where name_show() is not unique, this test is skipped.

Link: https://lore.kernel.org/all/20231020104250.9537-3-flaniel@linux.microsoft.com/

Cc: stable@vger.kernel.org
Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-10-20 22:11:49 +09:00
Anup Patel
d9c00f44e5 KVM: riscv: selftests: Add SBI DBCN extension to get-reg-list test
We have a new SBI debug console (DBCN) extension supported by in-kernel
KVM so let us add this extension to get-reg-list test.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-20 16:50:39 +05:30
Josh Poimboeuf
34de4fe7d1 objtool: Fix return thunk patching in retpolines
With CONFIG_RETHUNK enabled, the compiler replaces every RET with a tail
call to a return thunk ('JMP __x86_return_thunk').  Objtool annotates
all such return sites so they can be patched during boot by
apply_returns().

The implementation of __x86_return_thunk() is just a bare RET.  It's
only meant to be used temporarily until apply_returns() patches all
return sites with either a JMP to another return thunk or an actual RET.

Removing the .text..__x86.return_thunk section would break objtool's
detection of return sites in retpolines.  Since retpolines and return
thunks would land in the same section, the compiler no longer uses
relocations for the intra-section jumps between the retpolines and the
return thunk, causing objtool to overlook them.

As a result, none of the retpolines' return sites would get patched.
Each one stays at 'JMP __x86_return_thunk', effectively a bare RET.

Fix it by teaching objtool to detect when a non-relocated jump target is
a return thunk (or retpoline).

  [ bp: Massage the commit message now that the offending commit
    removing the .text..__x86.return_thunk section has been zapped.
    Still keep the objtool change here as it makes objtool more robust
    wrt handling such intra-TU jumps without relocations, should some
    toolchain and/or config generate them in the future. ]

Reported-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231012024737.eg5phclogp67ik6x@treble
2023-10-20 12:51:41 +02:00
Jakub Kicinski
7d4caf54d2 netlink: specs: add support for auto-sized scalars
Support uint / sint types in specs and YNL.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:43:35 +01:00
Jakub Kicinski
e1b347c5f7 tools: ynl-gen: make the mnl_type() method public
uint/sint support will add more logic to mnl_type(),
deduplicate it and make it more accessible.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:43:35 +01:00
Yang Jihong
c4a852635e perf data: Increase RLIMIT_NOFILE limit when open too many files in perf_data__create_dir()
If using parallel threads to collect data, perf record needs at least 6 fds
per CPU. (one for sys_perf_event_open, four for pipe msg and ack of the
pipe, see record__thread_data_open_pipes(), and one for open perf.data.XXX)
For an environment with more than 100 cores, if perf record uses both
`-a` and `--threads` options, it is easy to exceed the upper limit of the
file descriptor number, when we run out of them try to increase the limits.

Before:
  $ ulimit -n
  1024
  $ lscpu | grep 'On-line CPU(s)'
  On-line CPU(s) list:                0-159
  $ perf record --threads -a sleep 1
  Failed to create data directory: Too many open files

After:
  $ ulimit -n
  1024
  $ lscpu | grep 'On-line CPU(s)'
  On-line CPU(s) list:                0-159
  $ perf record --threads -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.394 MB perf.data (1576 samples) ]

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20231013075945.698874-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-19 23:38:27 -07:00
Kajol Jain
3f8b6e5b11 perf vendor events: Update PMC used in PM_RUN_INST_CMPL event for power10 platform
The CPI_STALL_RATIO metric group can be used to present the high
level CPI stall breakdown metrics in powerpc, which will show:

- DISPATCH_STALL_CPI ( Dispatch stall cycles per insn )
- ISSUE_STALL_CPI ( Issue stall cycles per insn )
- EXECUTION_STALL_CPI ( Execution stall cycles per insn )
- COMPLETION_STALL_CPI ( Completion stall cycles per insn )

Commit cf26e043c2 ("perf vendor events power10: Add JSON
metric events to present CPI stall cycles in powerpc)" which added
the CPI_STALL_RATIO metric group, also modified
the PMC value used in PM_RUN_INST_CMPL event from PMC4 to PMC5,
to avoid multiplexing of events.
But that got revert in recent changes. Fix this issue by changing
back the PMC value used in PM_RUN_INST_CMPL to PMC5.

Result with the fix:

 ./perf stat --metric-no-group -M CPI_STALL_RATIO <workload>

 Performance counter stats for 'workload':

        68,745,426      PM_CMPL_STALL                    #     0.21 COMPLETION_STALL_CPI
         7,692,827      PM_ISSUE_STALL                   #     0.02 ISSUE_STALL_CPI
       322,638,223      PM_RUN_INST_CMPL                 #     0.05 DISPATCH_STALL_CPI
                                                  #     0.48 EXECUTION_STALL_CPI
        16,858,553      PM_DISP_STALL_CYC
       153,880,133      PM_EXEC_STALL

       0.089774592 seconds time elapsed

"--metric-no-group" is used for forcing PM_RUN_INST_CMPL to be scheduled
in all group for more accuracy.

Fixes: 7d473f475b ("perf vendor events: Move JSON/events to appropriate files for power10 platform")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel<disgoel@linux.ibm.com>
Cc: maddy@linux.ibm.com
Link: https://lore.kernel.org/r/20231016143110.244255-1-kjain@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-19 23:35:20 -07:00
Zhang Rui
c25ef0e5d9 tools/power/turbostat: Handle offlined CPUs in cpu_subset
It is possible that the cpu_subset contains offlined CPUs.

If this happens during start, exit immediately because this is likely an
operator error that is best fixed by re-invoking.
If this happens at runtime, give a warning only because turbostat should
do its best effort to continue running.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-20 14:21:27 +08:00
Zhang Rui
0fe3752901 tools/power/turbostat: Obey allowed CPUs for system summary
System summary should summarize the information for allowed CPUs instead
of all the present CPUs.

Introduce topology information for allowed CPUs, and use them to
get system summary.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-20 14:20:07 +08:00
Greg Kroah-Hartman
b4b6cc10c6 IIO: 1st set of new device support, features and cleanup for 6.7
Particularly great to see a resolver driver move out of staging via a
 massive set of changes.  Only took 13 years :)
 
 One small patch added then reverted due to a report of test breakage
 (ashai-kasei,ak8975: Drop deprecated enums.)
 
 An immutable branch was used for some hid-senors changes in case
 there was a need to take them into the HID tree as well.
 
 New device support
 -----------------
 
 adi,hmc425a
   - Add support for HMC540SLP3E broadband 4-bit digital attenuator.
 kionix,kx022a
   - Add support for the kx132-1211 accelerometer. Require significant
     driver rework to enable this including add a chip type specific
     structure to deal with the chip differences.
   - Add support for the kx132acr-lbz accelerometer (subset of the kx022a
     feature set).
 lltc,ltc2309
   - New driver for this 8 channel ADC.
 microchip,mcp3911
   - Add support for rest of mcp391x family of ADCs (there are various
     differences beyond simple channel count variation.
     Series includes some general driver cleanup.
 microchip,mcp3564
   - New driver for MCP3461, MCP3462, MCP3464, MCP3541, MCP3562, MCP3564
     and their R variants of 16/24bit ADCs. A few minor fixed followed.
 rohm,bu1390
   - New driver for this pressure sensor.
 
 Staging graduation
 ------------------
 
 adi,ad1210 (after 13 or so years :)
   - More or less a complete (step-wise) rewrite of this resolver driver
     to bring it up to date with modern IIO standards.  The fault signal
     handling mapping to event channels was particularly complex and
     significant part of the changes.
 
 Features
 --------
 
 iio-core
  - Add chromacity and color temperature channel types.
 adi,ad7192
   - Oversampling ratio control (called fast settling in datasheet).
 adi,adis16475
   - Add core support and then driver support for delta angle and delta
     velocity channels. These are intended for summation to establish
     angle and velocity changes over larger timescales.  Fix was
     needed for alignment after the temperature channel.  Further fix
     reduced set of devices for which the buffer support was applicable
     as seems burst reads don't cover these on all devices.
 hid-sensors-als
   - Chromacity and color temperatures support including in amd sfh.
 stx104
   - Add support for counter subsystem to this multipurpose device.
 ti,twl6030
   - Add missing device tree binding description.
 
 Clean up and minor fixes.
 ------------------------
 
 treewide
   - Drop some unused declarations across IIO.
   - Make more use of device_get_match_data() instead of OF specific
     approaches.
 Similar cleanup to sets of drivers.
   - Stop platform remove callbacks returning anything by using the
     temporary remove_new() callback.
   - Use i2c_get_match_data() to cope nicely with all types of ID table
     entry.
   - Use device_get_match_data() for various platform device to cope
     with more types of firmware.
   - Convert from enum to pointer in ID tables allowing use of
     i2c_get_match_data().
   - Fix sorting on some ID tables.
   - Include specific string helper headers rather than simply string_helpers.h
 docs
   - Better description of the ordering requirements etc for
     available_scan_masks.
 tools
   - Handle alignment of mixed sizes where the last element isn't the biggest
     correctly. Seems that doesn't happen often!
 adi,ad2s1210
   - Lots of work from David Lechner on this driver including a few fixes
     that are going with the rework to avoid slowing that down.
 adi,ad4310
   - Replace deprecated devm_clk_register()
 adi,ad74413r
   - Bring the channel function setting inline with the datasheet.
 adi,ad7192
   - Change to FIELD_PREP(), FIELD_GET().
   - Calculate f_order from the sinc filter and chop filter states.
   - Move more per chip config into data in struct ad7192_chip_info
   - Cleanup unused parameter in channel macros.
 adi,adf4350
   - Make use of devm_* to simplify error handling for many of the setup
     calls in probe() / tear down in remove() and error paths.  Some more
     work to be done on this one.
   - Use dev_err_probe() for errors in probe() callback.
 adi,adf4413
   - Typo in function name prefix.
 adi,adxl345
   - Add channel scale to the chip type specific structure and drop
     using a type field previously used for indirection.
 asahi,ak8985
   - Fix a mismatch introduced when switching from enum->pointers
     in the match tables.
 amlogic,meson
   - Expand error logging during probe.
 invensense,mpu6050
   - Support level-shifter control. Whilst no one is sure exactly what this
     is doing it is needed for some old boards.
   - Document mount-matrix dt-binding.
 mediatek,mt6577
   - Use devm_clk_get_enabled() to replace open coded version and move
     everything over to being device managed. Drop now empty remove()
     callback. Fix follows to put the drvdata back.
   - Use dev_err_probe() for error reporting in probe() callback.
 memsic,mxc4005
   - Add of_match_table.
 microchip,mcp4725
   - Move various chip specific data from being looked up by chip ID to
     data in the chip type specific structure.
 silicon-labs,si7005
   - Add of_match_table and entry in trivial-devices.yaml
 st,lsm6dsx
   - Add missing mount-matrix dt binding documentation.
 st,spear
   - Use devm_clk_get_enabled() and some other devm calls to move everything
     over to being device managed.  Drop now empty remove() callback.
   - Use dev_err_probe() to better handled deferred probing and tidy up
     error reporting in probe() callback.
 st,stm32-adc
   - Add a bit of additional checking in probe() to protect against a NULL
     pointer (no known path to trigger it today).
   - Replace deprecated strncpy()
 ti,ads1015
   - Allow for edge triggers.
   - Document interrupt in dt-bindings.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAmUu3sURHGppYzIzQGtl
 cm5lbC5vcmcACgkQVIU0mcT0FohL9w/+PVaeiYsmuJfWcxjVhiGFz+HBbwSFcHFp
 ASXuYq9rYoea62JlvcLB3GJ43ziB80Am5qA5bWTkPHFqbAPZlgDzoGZGPMrpjfI1
 iV6NwiPigRRTw2JjB0TDS8HepQomA1qA0FwXngLrSy1eQmN0/NJOp0k8m54OpCV+
 FdW3dDy7UomXyVCb+OAOWNYvV20ZOK1/WK9yWCpPWZtOKMfX2cLLqiwtjm6VdXTg
 wtqSRVfBB/p7k3UapXiLuz4fExEancW4z2qYEaBK2beba6LFuFzvfwq/t6CJwVWD
 wd6mif+1eTtR9wxZcsmefsbB6r9zOd4eWRaCBjmW3fm9xNY2UDzWT29YpyDOyA8R
 llu44gDtlJvDpyUYi44rLDyZO886fJtlVLyOHOaVJy77SV+so16P9qC2qd9dJXtw
 8/exWxNmiA/LwVq+SvvVgfY33yCynKv5St1cHMDDFzC1eZMnVGAaZ4HBp1wiGGuN
 T1ZWKMDWViH5F7ug2pKopKoemQyhmsa8JbUQc+NZS+5efdn6LlVhe1BVZ+4/A3jb
 BQ838lkl3O8D0XkS7urM6Ggs8m/D0eBLBpfgID/C4OaEJnn6G1ZVvNue5r99SNws
 JV9T7zJQ8G1NKl3s1+JrwBf7XeSKtlZa0cbWejbe4Ib2u+G9M863YS0Ivrk5yH+C
 XIojuxjOOmQ=
 =wth0
 -----END PGP SIGNATURE-----

Merge tag 'iio-for-6.7a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next

Jonathan writes:

IIO: 1st set of new device support, features and cleanup for 6.7

Particularly great to see a resolver driver move out of staging via a
massive set of changes.  Only took 13 years :)

One small patch added then reverted due to a report of test breakage
(ashai-kasei,ak8975: Drop deprecated enums.)

An immutable branch was used for some hid-senors changes in case
there was a need to take them into the HID tree as well.

New device support
-----------------

adi,hmc425a
  - Add support for HMC540SLP3E broadband 4-bit digital attenuator.
kionix,kx022a
  - Add support for the kx132-1211 accelerometer. Require significant
    driver rework to enable this including add a chip type specific
    structure to deal with the chip differences.
  - Add support for the kx132acr-lbz accelerometer (subset of the kx022a
    feature set).
lltc,ltc2309
  - New driver for this 8 channel ADC.
microchip,mcp3911
  - Add support for rest of mcp391x family of ADCs (there are various
    differences beyond simple channel count variation.
    Series includes some general driver cleanup.
microchip,mcp3564
  - New driver for MCP3461, MCP3462, MCP3464, MCP3541, MCP3562, MCP3564
    and their R variants of 16/24bit ADCs. A few minor fixed followed.
rohm,bu1390
  - New driver for this pressure sensor.

Staging graduation
------------------

adi,ad1210 (after 13 or so years :)
  - More or less a complete (step-wise) rewrite of this resolver driver
    to bring it up to date with modern IIO standards.  The fault signal
    handling mapping to event channels was particularly complex and
    significant part of the changes.

Features
--------

iio-core
 - Add chromacity and color temperature channel types.
adi,ad7192
  - Oversampling ratio control (called fast settling in datasheet).
adi,adis16475
  - Add core support and then driver support for delta angle and delta
    velocity channels. These are intended for summation to establish
    angle and velocity changes over larger timescales.  Fix was
    needed for alignment after the temperature channel.  Further fix
    reduced set of devices for which the buffer support was applicable
    as seems burst reads don't cover these on all devices.
hid-sensors-als
  - Chromacity and color temperatures support including in amd sfh.
stx104
  - Add support for counter subsystem to this multipurpose device.
ti,twl6030
  - Add missing device tree binding description.

Clean up and minor fixes.
------------------------

treewide
  - Drop some unused declarations across IIO.
  - Make more use of device_get_match_data() instead of OF specific
    approaches.
Similar cleanup to sets of drivers.
  - Stop platform remove callbacks returning anything by using the
    temporary remove_new() callback.
  - Use i2c_get_match_data() to cope nicely with all types of ID table
    entry.
  - Use device_get_match_data() for various platform device to cope
    with more types of firmware.
  - Convert from enum to pointer in ID tables allowing use of
    i2c_get_match_data().
  - Fix sorting on some ID tables.
  - Include specific string helper headers rather than simply string_helpers.h
docs
  - Better description of the ordering requirements etc for
    available_scan_masks.
tools
  - Handle alignment of mixed sizes where the last element isn't the biggest
    correctly. Seems that doesn't happen often!
adi,ad2s1210
  - Lots of work from David Lechner on this driver including a few fixes
    that are going with the rework to avoid slowing that down.
adi,ad4310
  - Replace deprecated devm_clk_register()
adi,ad74413r
  - Bring the channel function setting inline with the datasheet.
adi,ad7192
  - Change to FIELD_PREP(), FIELD_GET().
  - Calculate f_order from the sinc filter and chop filter states.
  - Move more per chip config into data in struct ad7192_chip_info
  - Cleanup unused parameter in channel macros.
adi,adf4350
  - Make use of devm_* to simplify error handling for many of the setup
    calls in probe() / tear down in remove() and error paths.  Some more
    work to be done on this one.
  - Use dev_err_probe() for errors in probe() callback.
adi,adf4413
  - Typo in function name prefix.
adi,adxl345
  - Add channel scale to the chip type specific structure and drop
    using a type field previously used for indirection.
asahi,ak8985
  - Fix a mismatch introduced when switching from enum->pointers
    in the match tables.
amlogic,meson
  - Expand error logging during probe.
invensense,mpu6050
  - Support level-shifter control. Whilst no one is sure exactly what this
    is doing it is needed for some old boards.
  - Document mount-matrix dt-binding.
mediatek,mt6577
  - Use devm_clk_get_enabled() to replace open coded version and move
    everything over to being device managed. Drop now empty remove()
    callback. Fix follows to put the drvdata back.
  - Use dev_err_probe() for error reporting in probe() callback.
memsic,mxc4005
  - Add of_match_table.
microchip,mcp4725
  - Move various chip specific data from being looked up by chip ID to
    data in the chip type specific structure.
silicon-labs,si7005
  - Add of_match_table and entry in trivial-devices.yaml
st,lsm6dsx
  - Add missing mount-matrix dt binding documentation.
st,spear
  - Use devm_clk_get_enabled() and some other devm calls to move everything
    over to being device managed.  Drop now empty remove() callback.
  - Use dev_err_probe() to better handled deferred probing and tidy up
    error reporting in probe() callback.
st,stm32-adc
  - Add a bit of additional checking in probe() to protect against a NULL
    pointer (no known path to trigger it today).
  - Replace deprecated strncpy()
ti,ads1015
  - Allow for edge triggers.
  - Document interrupt in dt-bindings.

* tag 'iio-for-6.7a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (201 commits)
  iio: Use device_get_match_data()
  iio: adc: MCP3564: fix warn: unsigned '__x' is never less than zero.
  dt-bindings: trivial-devices: add silabs,si7005
  iio: si7005: Add device tree support
  drivers: imu: adis16475.c: Remove scan index from delta channels
  dt-bindings: iio: imu: st,lsm6dsx: add mount-matrix property
  iio: resolver: ad2s1210: remove of_match_ptr()
  iio: resolver: ad2s1210: remove DRV_NAME macro
  iio: resolver: ad2s1210: move out of staging
  staging: iio: resolver: ad2s1210: simplify code with guard(mutex)
  staging: iio: resolver: ad2s1210: clear faults after soft reset
  staging: iio: resolver: ad2s1210: refactor sample toggle
  staging: iio: resolver: ad2s1210: remove fault attribute
  staging: iio: resolver: ad2s1210: add label attribute support
  staging: iio: resolver: ad2s1210: add register/fault support summary
  staging: iio: resolver: ad2s1210: implement fault events
  iio: event: add optional event label support
  staging: iio: resolver: ad2s1210: rename DOS reset min/max attrs
  staging: iio: resolver: ad2s1210: convert DOS mismatch threshold to event attr
  staging: iio: resolver: ad2s1210: convert DOS overrange threshold to event attr
  ...
2023-10-20 07:54:15 +02:00
Thomas Richter
5069211e2f perf trace: Use the right bpf_probe_read(_str) variant for reading user data
Perf test case 111 Check open filename arg using perf trace + vfs_getname
fails on s390. This is caused by a failing function
bpf_probe_read() in file util/bpf_skel/augmented_raw_syscalls.bpf.c.

The root cause is the lookup by address. Function bpf_probe_read()
is used. This function works only for architectures
with ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

On s390 is not possible to determine from the address to which
address space the address belongs to (user or kernel space).

Replace bpf_probe_read() by bpf_probe_read_kernel()
and bpf_probe_read_str() by bpf_probe_read_user_str() to
explicity specify the address space the address refers to.

Output before:
 # ./perf trace -eopen,openat -- touch /tmp/111
 libbpf: prog 'sys_enter': BPF program load failed: Invalid argument
 libbpf: prog 'sys_enter': -- BEGIN PROG LOAD LOG --
 reg type unsupported for arg#0 function sys_enter#75
 0: R1=ctx(off=0,imm=0) R10=fp0
 ; int sys_enter(struct syscall_enter_args *args)
 0: (bf) r6 = r1           ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0)
 ; return bpf_get_current_pid_tgid();
 1: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
 2: (63) *(u32 *)(r10 -8) = r0 ; R0_w=scalar() R10=fp0 fp-8=????mmmm
 3: (bf) r2 = r10              ; R2_w=fp0 R10=fp0
 ;
 .....
 lines deleted here
 .....
 23: (bf) r3 = r6              ; R3_w=ctx(off=0,imm=0) R6=ctx(off=0,imm=0)
 24: (85) call bpf_probe_read#4
 unknown func bpf_probe_read#4
 processed 23 insns (limit 1000000) max_states_per_insn 0 \
	 total_states 2 peak_states 2 mark_read 2
 -- END PROG LOAD LOG --
 libbpf: prog 'sys_enter': failed to load: -22
 libbpf: failed to load object 'augmented_raw_syscalls_bpf'
 libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -22
 ....

Output after:
 # ./perf test -Fv 111
 111: Check open filename arg using perf trace + vfs_getname          :
 --- start ---
     1.085 ( 0.011 ms): touch/320753 openat(dfd: CWD, filename: \
	"/tmp/temporary_file.SWH85", \
	flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
 ---- end ----
 Check open filename arg using perf trace + vfs_getname: Ok
 #

Test with the sleep command shows:
Output before:
 # ./perf trace -e *sleep sleep 1.234567890
     0.000 (1234.681 ms): sleep/63114 clock_nanosleep(rqtp: \
         { .tv_sec: 0, .tv_nsec: 0 }, rmtp: 0x3ffe0979720) = 0
 #

Output after:
 # ./perf trace -e *sleep sleep 1.234567890
     0.000 (1234.686 ms): sleep/64277 clock_nanosleep(rqtp: \
         { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x3fff3df9ea0) = 0
 #

Fixes: 14e4b9f428 ("perf trace: Raw augmented syscalls fix libbpf 1.0+ compatibility")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: sumanthk@linux.ibm.com
Cc: svens@linux.ibm.com
Link: https://lore.kernel.org/r/20231019082642.3286650-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-19 22:41:46 -07:00
Zhang Rui
ccf8a05280 tools/power/turbostat: Obey allowed CPUs for primary thread/core detection
Thread_id doesn't tell if a CPU is allowed or not.

Detect allowed CPUs only and use the first detected thread/core as the
primary thread/core of a core/package.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-20 10:27:09 +08:00
Zhang Rui
74318add13 tools/power/turbostat: Abstract several functions
When detecting the primary thread/core in a core/package, current code
doesn't handle the allowed CPUs.

Abstract several functions for further fix of this issue.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-20 10:27:09 +08:00
Zhang Rui
7bb3fe27ad tools/power/turbostat: Obey allowed CPUs during startup
Set turbostat CPU affinity to make sure turbostat is running on one of
the allowed CPUs.

Set base_cpu to the first allowed CPU so that some platform information
is dumped using one of the allowed CPUs.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-20 10:27:09 +08:00
Zhang Rui
4ede6d1ce7 tools/power/turbostat: Obey allowed CPUs when accessing CPU counters
for_all_cpus/for_all_cpus_2 are used for accessing the per CPU counters,
and they should follow the cpu_allowed_set instead of cpu_present_set.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-20 10:27:09 +08:00
Zhang Rui
71cfd1da9f tools/power/turbostat: Introduce cpu_allowed_set
Turbostat supports "-c" parameter which limits output to system summary
plus the specified cpu-set. But some code still uses cpu_present_set to
read and dump the counters.

Introduce cpu_allowed_set for code that should obey the specified cpu-set.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2023-10-20 10:24:00 +08:00
Arnaldo Carvalho de Melo
4fa008a2db tools build: Fix llvm feature detection, still used by bpftool
When removing the BPF event for perf a feature test that checks if the
llvm devel files are availabe was removed but that is also used by
bpftool.

bpftool uses it to decide what kind of disassembly it will use: llvm or
binutils based.

Removing the tools/build/feature/test-llvm.cpp file made bpftool to
always fallback to binutils disassembly, even with the llvm devel files
installed, fix it by restoring just that small test-llvm.cpp test file.

Fixes: 56b11a2126 ("perf bpf: Remove support for embedding clang for compiling BPF events (-e foo.c)")
Reported-by: Manu Bretelle <chantr4@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Manu Bretelle <chantr4@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Fangrui Song <maskray@google.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/lkml/ZTGa0Ukt7QyxWcVy@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-10-19 22:33:58 -03:00
Chuyi Zhou
130e0f7af9 selftests/bpf: Add tests for open-coded task and css iter
This patch adds 4 subtests to demonstrate these patterns and validating
correctness.

subtest1:

1) We use task_iter to iterate all process in the system and search for the
current process with a given pid.

2) We create some threads in current process context, and use
BPF_TASK_ITER_PROC_THREADS to iterate all threads of current process. As
expected, we would find all the threads of current process.

3) We create some threads and use BPF_TASK_ITER_ALL_THREADS to iterate all
threads in the system. As expected, we would find all the threads which was
created.

subtest2:

We create a cgroup and add the current task to the cgroup. In the
BPF program, we would use bpf_for_each(css_task, task, css) to iterate all
tasks under the cgroup. As expected, we would find the current process.

subtest3:

1) We create a cgroup tree. In the BPF program, we use
bpf_for_each(css, pos, root, XXX) to iterate all descendant under the root
with pre and post order. As expected, we would find all descendant and the
last iterating cgroup in post-order is root cgroup, the first iterating
cgroup in pre-order is root cgroup.

2) We wse BPF_CGROUP_ITER_ANCESTORS_UP to traverse the cgroup tree starting
from leaf and root separately, and record the height. The diff of the
hights would be the total tree-high - 1.

subtest4:

Add some failure testcase when using css_task, task and css iters, e.g,
unlock when using task-iters to iterate tasks.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Link: https://lore.kernel.org/r/20231018061746.111364-9-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-19 17:02:47 -07:00
Chuyi Zhou
ddab78cbb5 selftests/bpf: rename bpf_iter_task.c to bpf_iter_tasks.c
The newly-added struct bpf_iter_task has a name collision with a selftest
for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
renamed in order to avoid the collision.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-8-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-19 17:02:47 -07:00
Chuyi Zhou
7251d0905e bpf: Introduce css open-coded iterator kfuncs
This Patch adds kfuncs bpf_iter_css_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_css in open-coded iterator
style. These kfuncs actually wrapps css_next_descendant_{pre, post}.
css_iter can be used to:

1) iterating a sepcific cgroup tree with pre/post/up order

2) iterating cgroup_subsystem in BPF Prog, like
for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in kernel.

The API design is consistent with cgroup_iter. bpf_iter_css_new accepts
parameters defining iteration order and starting css. Here we also reuse
BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
BPF_CGROUP_ITER_ANCESTORS_UP enums.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-5-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-19 17:02:46 -07:00
Chuyi Zhou
c68a78ffe2 bpf: Introduce task open coded iterator kfuncs
This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_task in open-coded iterator
style. BPF programs can use these kfuncs or through bpf_for_each macro to
iterate all processes in the system.

The API design keep consistent with SEC("iter/task"). bpf_iter_task_new()
accepts a specific task and iterating type which allows:

1. iterating all process in the system (BPF_TASK_ITER_ALL_PROCS)

2. iterating all threads in the system (BPF_TASK_ITER_ALL_THREADS)

3. iterating all threads of a specific task (BPF_TASK_ITER_PROC_THREADS)

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Link: https://lore.kernel.org/r/20231018061746.111364-4-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-19 17:02:46 -07:00
Chuyi Zhou
9c66dc94b6 bpf: Introduce css_task open-coded iterator kfuncs
This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_css_task in open-coded
iterator style. These kfuncs actually wrapps css_task_iter_{start,next,
end}. BPF programs can use these kfuncs through bpf_for_each macro for
iteration of all tasks under a css.

css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
the bpf side has to be careful in where it allows to use this iter.
Currently we only allow it in bpf_lsm and bpf iter-s.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-3-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-19 17:02:46 -07:00
Jakub Kicinski
f9bc3cbc20 tools: ynl-gen: support limit names
Support the use of symbolic names like s8-min or u32-max in checks
to make writing specs less painful.

Link: https://lore.kernel.org/r/20231018163917.2514503-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 15:54:56 -07:00
Jakub Kicinski
668c1ac828 tools: ynl-gen: support full range of min/max checks for integer values
Extend the support to full range of min/max checks.
None of the existing YNL families required complex integer validation.

The support is less than trivial, because we try to keep struct nla_policy
tiny the min/max members it holds in place are s16. Meaning we can only
express checks in range of s16. For larger ranges we need to define
a structure and link it in the policy.

Link: https://lore.kernel.org/r/20231018163917.2514503-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 15:54:56 -07:00
Jakub Kicinski
ee0a4cfcbd tools: ynl-gen: track attribute use
For range validation we'll need to know if any individual
attribute is used on input (i.e. whether we will generate
a policy for it). Track this information.

Link: https://lore.kernel.org/r/20231018163917.2514503-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 15:54:56 -07:00
Breno Leitao
b9ec913212 selftests/bpf/sockopt: Add io_uring support
Expand the sockopt test to use also check for io_uring {g,s}etsockopt
commands operations.

This patch starts by marking each test if they support io_uring support
or not.

Right now, io_uring cmd getsockopt() has a limitation of only
accepting level == SOL_SOCKET, otherwise it returns -EOPNOTSUPP. Since
there aren't any test exercising getsockopt(level == SOL_SOCKET), this
patch changes two tests to use level == SOL_SOCKET, they are
"getsockopt: support smaller ctx->optlen" and "getsockopt: read
ctx->optlen".
There is no limitation for the setsockopt() part.

Later, each test runs using regular {g,s}etsockopt systemcalls, and, if
liburing is supported, execute the same test (again), but calling
liburing {g,s}setsockopt commands.

This patch also changes the level of two tests to use SOL_SOCKET for the
following two tests. This is going to help to exercise the io_uring
subsystem:
 * getsockopt: read ctx->optlen
 * getsockopt: support smaller ctx->optlen

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-12-leitao@debian.org
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-10-19 16:42:04 -06:00
Breno Leitao
ba6e0e5cb5 selftests/net: Extract uring helpers to be reusable
Instead of defining basic io_uring functions in the test case, move them
to a common directory, so, other tests can use them.

This simplify the test code and reuse the common liburing
infrastructure. This is basically a copy of what we have in
io_uring_zerocopy_tx with some minor improvements to make checkpatch
happy.

A follow-up test will use the same helpers in a BPF sockopt test.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-8-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-10-19 16:42:03 -06:00
Breno Leitao
7746a6adfc tools headers: Grab copy of io_uring.h
This file will be used by mini_uring.h and allow tests to run without
the need of installing liburing to run the tests.

This is needed to run io_uring tests in BPF, such as
(tools/testing/selftests/bpf/prog_tests/sockopt.c).

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-7-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-10-19 16:42:03 -06:00
Jakub Kicinski
041c3466f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

net/mac80211/key.c
  02e0e426a2 ("wifi: mac80211: fix error path key leak")
  2a8b665e6b ("wifi: mac80211: remove key_mtx")
  7d6904bf26 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/ti/Kconfig
  a602ee3176 ("net: ethernet: ti: Fix mixed module-builtin object")
  98bdeae950 ("net: cpmac: remove driver to prepare for platform removal")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 13:29:01 -07:00
Linus Torvalds
ce55c22ec8 Including fixes from bluetooth, netfilter, WiFi.
Feels like an up-tick in regression fixes, mostly for older releases.
 The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as fairly
 clear-cut user reported regressions. The mlx5 DMA bug was causing strife
 for 390x folks. The fixes themselves are not particularly scary, tho.
 No open investigations / outstanding reports at the time of writing.
 
 Current release - regressions:
 
  - eth: mlx5: perform DMA operations in the right locations,
    make devices usable on s390x, again
 
  - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve,
    previous fix of rejecting invalid config broke some scripts
 
  - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock
 
  - revert "ethtool: Fix mod state of verbose no_mask bitset",
    needs more work
 
 Current release - new code bugs:
 
  - tcp: fix listen() warning with v4-mapped-v6 address
 
 Previous releases - regressions:
 
  - tcp: allow tcp_disconnect() again when threads are waiting,
    it was denied to plug a constant source of bugs but turns out
    .NET depends on it
 
  - eth: mlx5: fix double-free if buffer refill fails under OOM
 
  - revert "net: wwan: iosm: enable runtime pm support for 7560",
    it's causing regressions and the WWAN team at Intel disappeared
 
  - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains
    a single skb, fix single-stream perf regression on some devices
 
 Previous releases - always broken:
 
  - Bluetooth:
    - fix issues in legacy BR/EDR PIN code pairing
    - correctly bounds check and pad HCI_MON_NEW_INDEX name
 
  - netfilter:
    - more fixes / follow ups for the large "commit protocol" rework,
      which went in as a fix to 6.5
    - fix null-derefs on netlink attrs which user may not pass in
 
  - tcp: fix excessive TLP and RACK timeouts from HZ rounding
    (bless Debian for keeping HZ=250 alive)
 
  - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
    letting frankenstein UDP super-frames from getting into the stack
 
  - net: fix interface altnames when ifc moves to a new namespace
 
  - eth: qed: fix the size of the RX buffers
 
  - mptcp: avoid sending RST when closing the initial subflow
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmUxawUACgkQMUZtbf5S
 Irt75w/+MC2ilnFQhfoQqpCxL2uHWOKzhenUz2PoNKR60OKgLLmTq8YBYcpyCVAQ
 wBQaNzNu1/yjF5BG7aNS/j0suGeJsOfhfoahQvXlLaat9NuDqxTpaeoT5FZ7eNQw
 RZ8+CJug3BRDV0TkaJH9UDVC/nfJTsnsGWNIhNXYGPsuveqAUun+xrnN8ZbvZIrn
 6D9rMF+u9SdVO+ANCquXBC7+CWEWiJS1ljUrU7BRNiv/9FSlnPQtjOdpuKleeBO8
 4usMS7TezHNgRdiAKC8GjSGUiIkIIMJT4y4wuczBEQAD4Pkki9UpBrui97ozOj7h
 W4N7UOuPlUBIardvKNoYz9rZyiFXBcPPm0GruHiuCqpxyqmzoFgv2XJsb/6KfzNn
 Dyro+lvh8smtbFHvFqiwaNu5y8ucfClaowvR4gjSe2KcB7hIpwNkh6vWC6OMGJK3
 hiKHnDrnXBQMbnP1YfiJ4feLmm3UYCG8eFdv/ULZT0a9TzZ7fKfzAfywUwD+/O8Y
 +S28Hr9srdDCHO7ih/gF3Wq9wtnnLy8QEkpt7cpXjRDj0uWH8JkHU3YEIGF2814Y
 LNVGmX9y6RcgrHNp03K1PmUcgAzhTTuV9QRoQKEucuBT5AK9ALDQ8YomnWDWDgrp
 UOdJPi1RUTsmqslADF15wZ9W5Ki/cDnUJsE4HU/MtnM2w95C49Q=
 =z1F6
 -----END PGP SIGNATURE-----

Merge tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, netfilter, WiFi.

  Feels like an up-tick in regression fixes, mostly for older releases.
  The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as
  fairly clear-cut user reported regressions. The mlx5 DMA bug was
  causing strife for 390x folks. The fixes themselves are not
  particularly scary, tho. No open investigations / outstanding reports
  at the time of writing.

  Current release - regressions:

   - eth: mlx5: perform DMA operations in the right locations, make
     devices usable on s390x, again

   - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner
     curve, previous fix of rejecting invalid config broke some scripts

   - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock

   - revert "ethtool: Fix mod state of verbose no_mask bitset", needs
     more work

  Current release - new code bugs:

   - tcp: fix listen() warning with v4-mapped-v6 address

  Previous releases - regressions:

   - tcp: allow tcp_disconnect() again when threads are waiting, it was
     denied to plug a constant source of bugs but turns out .NET depends
     on it

   - eth: mlx5: fix double-free if buffer refill fails under OOM

   - revert "net: wwan: iosm: enable runtime pm support for 7560", it's
     causing regressions and the WWAN team at Intel disappeared

   - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a
     single skb, fix single-stream perf regression on some devices

  Previous releases - always broken:

   - Bluetooth:
      - fix issues in legacy BR/EDR PIN code pairing
      - correctly bounds check and pad HCI_MON_NEW_INDEX name

   - netfilter:
      - more fixes / follow ups for the large "commit protocol" rework,
        which went in as a fix to 6.5
      - fix null-derefs on netlink attrs which user may not pass in

   - tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless
     Debian for keeping HZ=250 alive)

   - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
     letting frankenstein UDP super-frames from getting into the stack

   - net: fix interface altnames when ifc moves to a new namespace

   - eth: qed: fix the size of the RX buffers

   - mptcp: avoid sending RST when closing the initial subflow"

* tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
  Revert "ethtool: Fix mod state of verbose no_mask bitset"
  selftests: mptcp: join: no RST when rm subflow/addr
  mptcp: avoid sending RST when closing the initial subflow
  mptcp: more conservative check for zero probes
  tcp: check mptcp-level constraints for backlog coalescing
  selftests: mptcp: join: correctly check for no RST
  net: ti: icssg-prueth: Fix r30 CMDs bitmasks
  selftests: net: add very basic test for netdev names and namespaces
  net: move altnames together with the netdevice
  net: avoid UAF on deleted altname
  net: check for altname conflicts when changing netdev's netns
  net: fix ifname in netlink ntf during netns move
  net: ethernet: ti: Fix mixed module-builtin object
  net: phy: bcm7xxx: Add missing 16nm EPHY statistics
  ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
  tcp_bpf: properly release resources on error paths
  net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
  net: mdio-mux: fix C45 access returning -EIO after API change
  tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
  octeon_ep: update BQL sent bytes before ringing doorbell
  ...
2023-10-19 12:08:18 -07:00
Kent Overstreet
faf1dce852 objtool: Add bcachefs noreturns
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-19 14:58:29 -04:00
Linus Torvalds
189b756271 seccomp fix for v6.6-rc7
- Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmUwfdQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJixtD/98vbEvwKMPxQxTH17Y2XMA7uSg
 8/WV6reVxE+DCalieqO14XRP/I++88W5A+HJSxZ+C9Q1WFXdsGINr3XMcihWZIc1
 IkzJ7t+LTTn4Y8MxC0m7LyIqpRIVYTkKW6RwrUbsZj0+vrpXtSWz6Gy7Oz7YXvah
 Y3DHVyFpi8FNK5cEz0cmBIITE0CCR/eQuiMD1057esudruWCFiKKGCZN6hPdNTlE
 pzCYGKNlA3eXOzsW/N2qAIj2nlnEJCMpP92vnvpRhcNIa+MKCzAELSi73QScG/ja
 lrcvCWWpWmR5sSNhA2dWbamR/S6GrEtwo16J4Sg/sYqHrywsr0vVGjEHLzkpf8Ef
 zxW5mGXl78dL+zoq6woyOzt/ztQ1kqVbSXF0VMMhVjUQhTSkvivr6LX+mwvNJYim
 KLAaJmWgPj6wm9agpQ/hukNKaENiA6sDTb+wpcWKdrWl958DoSY8zl3ZdJXWSxnE
 EWXLRtPioMP9BR/9U63Fr2U/m7dXKXHRfn5J2+0WBdq4MsuPtwkIs6SAaec0AKMS
 jJnRcu/b1ub00WitnJZk8v4jFfHVernFVkUGhFrgdXzwF7hXaMvsAPK6wfg2MNpe
 9aRPS4Pf8xgF6Bz7XqNrw3qgNmQwZ5kn/ucuwSjfOc2BcwSKjYGVqwXUQ8hbQdNK
 HtrskXY14BxtSLo1Gg==
 =Ir3f
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fix from Kees Cook:

 - Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)

* tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  perf/benchmark: fix seccomp_unotify benchmark for 32-bit
2023-10-19 10:10:14 -07:00
Matthieu Baerts
2cfaa8b3b7 selftests: mptcp: join: no RST when rm subflow/addr
Recently, we noticed that some RST were wrongly generated when removing
the initial subflow.

This patch makes sure RST are not sent when removing any subflows or any
addresses.

Fixes: c2b2ae3925 ("mptcp: handle correctly disconnect() failures")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-5-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:10:00 -07:00
Matthieu Baerts
b134a58054 selftests: mptcp: join: correctly check for no RST
The commit mentioned below was more tolerant with the number of RST seen
during a test because in some uncontrollable situations, multiple RST
can be generated.

But it was not taking into account the case where no RST are expected:
this validation was then no longer reporting issues for the 0 RST case
because it is not possible to have less than 0 RST in the counter. This
patch fixes the issue by adding a specific condition.

Fixes: 6bf41020b7 ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-1-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:09:59 -07:00
Manu Bretelle
6bd5e167af bpftool: Wrap struct_ops dump in an array
When dumping a struct_ops, 2 dictionaries are emitted.

When using `name`, they were already wrapped in an array, but not when
using `id`. Causing `jq` to fail at parsing the payload as it reached
the comma following the first dict.

This change wraps those dictionaries in an array so valid json is emitted.

Before, jq fails to parse the output:
```
 $ sudo bpftool struct_ops dump id 1523612 | jq . > /dev/null
parse error: Expected value before ',' at line 19, column 2
```

After, no error parsing the output:
```
sudo ./bpftool  struct_ops dump id 1523612 | jq . > /dev/null
```

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20231018230133.1593152-3-chantr4@gmail.com
2023-10-19 16:30:15 +02:00
Manu Bretelle
90704b4be0 bpftool: Fix printing of pointer value
When printing a pointer value, "%p" will either print the hexadecimal
value of the pointer (e.g `0x1234`), or `(nil)` when NULL.

Both of those are invalid json "integer" values and need to be wrapped
in quotes.

Before:
```
$ sudo bpftool struct_ops dump  name ned_dummy_cca | grep next
                    "next": (nil),
$ sudo bpftool struct_ops dump  name ned_dummy_cca | \
    jq '.[1].bpf_struct_ops_tcp_congestion_ops.data.list.next'
parse error: Invalid numeric literal at line 29, column 34
```

After:
```
$ sudo ./bpftool struct_ops dump  name ned_dummy_cca | grep next
                    "next": "(nil)",
$ sudo ./bpftool struct_ops dump  name ned_dummy_cca | \
    jq '.[1].bpf_struct_ops_tcp_congestion_ops.data.list.next'
"(nil)"
```

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20231018230133.1593152-2-chantr4@gmail.com
2023-10-19 16:29:36 +02:00
Jakub Kicinski
3920431d98 selftests: net: add very basic test for netdev names and namespaces
Add selftest for fixes around naming netdevs and namespaces.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19 15:51:16 +02:00
Eric Dumazet
878d951c67 inet: lock the socket in ip_sock_set_tos()
Christoph Paasch reported a panic in TCP stack [1]

Indeed, we should not call sk_dst_reset() without holding
the socket lock, as __sk_dst_get() callers do not all rely
on bare RCU.

[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 12bad6067 P4D 12bad6067 PUD 12bad5067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 2750 Comm: syz-executor.5 Not tainted 6.6.0-rc4-g7a5720a344e7 #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:tcp_get_metrics+0x118/0x8f0 net/ipv4/tcp_metrics.c:321
Code: c7 44 24 70 02 00 8b 03 89 44 24 48 c7 44 24 4c 00 00 00 00 66 c7 44 24 58 02 00 66 ba 02 00 b1 01 89 4c 24 04 4c 89 7c 24 10 <49> 8b 0f 48 8b 89 50 05 00 00 48 89 4c 24 30 33 81 00 02 00 00 69
RSP: 0018:ffffc90000af79b8 EFLAGS: 00010293
RAX: 000000000100007f RBX: ffff88812ae8f500 RCX: ffff88812b5f8f01
RDX: 0000000000000002 RSI: ffffffff8300f080 RDI: 0000000000000002
RBP: 0000000000000002 R08: 0000000000000003 R09: ffffffff8205eca0
R10: 0000000000000002 R11: ffff88812b5f8f00 R12: ffff88812a9e0580
R13: 0000000000000000 R14: ffff88812ae8fbd2 R15: 0000000000000000
FS: 00007f70a006b640(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000012bad7003 CR4: 0000000000170ee0
Call Trace:
<TASK>
tcp_fastopen_cache_get+0x32/0x140 net/ipv4/tcp_metrics.c:567
tcp_fastopen_cookie_check+0x28/0x180 net/ipv4/tcp_fastopen.c:419
tcp_connect+0x9c8/0x12a0 net/ipv4/tcp_output.c:3839
tcp_v4_connect+0x645/0x6e0 net/ipv4/tcp_ipv4.c:323
__inet_stream_connect+0x120/0x590 net/ipv4/af_inet.c:676
tcp_sendmsg_fastopen+0x2d6/0x3a0 net/ipv4/tcp.c:1021
tcp_sendmsg_locked+0x1957/0x1b00 net/ipv4/tcp.c:1073
tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1336
__sock_sendmsg+0x83/0xd0 net/socket.c:730
__sys_sendto+0x20a/0x2a0 net/socket.c:2194
__do_sys_sendto net/socket.c:2206 [inline]

Fixes: e08d0b3d17 ("inet: implement lockless IP_TOS")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231018090014.345158-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19 13:13:13 +02:00
Pedro Tammela
35027c7909 selftests: tc-testing: move auxiliary scripts to a dedicated folder
Some taprio tests need auxiliary scripts to wait for workqueue events to
process. Move them to a dedicated folder in order to package them for
the kselftests tarball.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231017152309.3196320-3-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18 18:07:51 -07:00
Pedro Tammela
f157b73d51 selftests: tc-testing: add missing Kconfig options to 'config'
Make sure CI builds using just tc-testing/config can run all tdc tests.
Some tests were broken because of missing knobs.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231017152309.3196320-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18 18:07:51 -07:00
Jiri Slaby (SUSE)
31c65705a8 perf/benchmark: fix seccomp_unotify benchmark for 32-bit
Commit 7d5cb68af6 (perf/benchmark: add a new benchmark for
seccom_unotify) added a reference to __NR_seccomp into perf. This is
fine as it added also a definition of __NR_seccomp for 64-bit. But it
failed to do so for 32-bit as instead of ifndef, ifdef was used.

Fix this typo (so fix the build of perf on 32-bit).

Fixes: 7d5cb68af6 (perf/benchmark: add a new benchmark for seccom_unotify)
Cc: Andrei Vagin <avagin@google.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20231017083019.31733-1-jirislaby@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-10-18 17:47:18 -07:00
Jing Zhang
54a9ea7352 KVM: arm64: selftests: Test for setting ID register from usersapce
Add tests to verify setting ID registers from userspace is handled
correctly by KVM. Also add a test case to use ioctl
KVM_ARM_GET_REG_WRITABLE_MASKS to get writable masks.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18 23:37:33 +00:00
Jing Zhang
0359c946b1 tools headers arm64: Update sysreg.h with kernel sources
The users of sysreg.h (perf, KVM selftests) are now generating the
necessary sysreg-defs.h; sync sysreg.h with the kernel sources and
fix the KVM selftests that use macros which suffered a rename.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18 23:36:25 +00:00
Oliver Upton
9697d84cc3 KVM: selftests: Generate sysreg-defs.h and add to include path
Start generating sysreg-defs.h for arm64 builds in anticipation of
updating sysreg.h to a version that depends on it.

Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18 23:36:25 +00:00
Oliver Upton
e2bdd172e6 perf build: Generate arm64's sysreg-defs.h and add to include path
Start generating sysreg-defs.h in anticipation of updating sysreg.h to a
version that needs the generated output.

Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18 23:36:25 +00:00
Oliver Upton
02e85f7466 tools: arm64: Add a Makefile for generating sysreg-defs.h
Use a common Makefile for generating sysreg-defs.h, which will soon be
needed by perf and KVM selftests. The naming scheme of the generated
macros is not expected to change, so just refer to the canonical
script/data in the kernel source rather than copying to tools.

Co-developed-by: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18 23:36:25 +00:00
Namhyung Kim
1f36b190ad perf tools: Do not ignore the default vmlinux.h
The recent change made it possible to generate vmlinux.h from BTF and
to ignore the file.  But we also have a minimal vmlinux.h that will be
used by default.  It should not be ignored by GIT.

Fixes: b7a2d774c9 ("perf build: Add ability to build with a generated vmlinux.h")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310110451.rvdUZJEY-lkp@intel.com/
Cc: oe-kbuild-all@lists.linux.dev
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-18 15:35:20 -07:00
Jiri Olsa
b5c532e904 tools/build: Fix -s detection code in tools/scripts/Makefile.include
As Dmitry described in [1] changelog the current way of detecting
-s option is broken for new make.

Changing the tools/build -s option detection the same way as it was
fixed for root Makefile in [1].

[1] 4bf7358816 ("kbuild: Port silent mode detection to future gnu make.")

Cc: Dmitry Goncharov <dgoncharov@users.sf.net>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: KP Singh <kpsingh@chromium.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: bpf@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20231008212251.236023-3-jolsa@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-18 15:29:47 -07:00
Jiri Olsa
d9997f7ffb tools/build: Fix -s detection code in tools/build/Makefile.build
As Dmitry described in [1] changelog the current way of detecting
-s option is broken for new make.

Changing the tools/build -s option detection the same way as it was
fixed for root Makefile in [1].

[1] 4bf7358816 ("kbuild: Port silent mode detection to future gnu make.")

Cc: Dmitry Goncharov <dgoncharov@users.sf.net>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: KP Singh <kpsingh@chromium.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: bpf@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20231008212251.236023-2-jolsa@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-18 15:29:35 -07:00
Swarup Laxman Kotiaklapudi
6e79b375ad proc: test /proc/${pid}/statm
My original comment lied, output can be "0 A A B 0 0 0\n"
(see comment in the code).

I don't quite understand why

	get_mm_counter(mm, MM_FILEPAGES) + get_mm_counter(mm, MM_SHMEMPAGES)

can stay positive but get_mm_counter(mm, MM_ANONPAGES) is always 0 after
everything is unmapped but that's just me.

[adobriyan@gmail.com: more or less rewritten]
Link: https://lkml.kernel.org/r/0721ca69-7bb4-40aa-8d01-0c5f91e5f363@p183
Signed-off-by: Swarup Laxman Kotiaklapudi <swarupkotikalapudi@gmail.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:43:22 -07:00
David Laight
598f0ac150 compiler.h: move __is_constexpr() to compiler.h
Prior to f747e6667e __is_constexpr() was in its only user minmax.h. 
That commit moved it to const.h - but that file just defines ULL(x) and
UL(x) so that constants can be defined for .S and .c files.

So apart from the word 'const' it wasn't really a good location.  Instead
move the definition to compiler.h just before the similar

  is_signed_type() and is_unsigned_type().

This may not be a good long-term home, but the three definitions belong
together.

Link: https://lkml.kernel.org/r/2a6680bbe2e84459816a113730426782@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:43:22 -07:00
Audra Mitchell
d8ea435f07 tools/mm: update the usage output to be more organized
Organize the usage options alphabetically and improve the description of
some options.  Also separate the more complicated cull options from the
single use compare options.

Link: https://lkml.kernel.org/r/20231013190350.579407-6-audra@redhat.com
Signed-off-by: Audra Mitchell <audra@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Georgi Djakov <djakov@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:19 -07:00
Audra Mitchell
c6d5e4901e tools/mm: fix the default case for page_owner_sort
With the additional commands and timestamps added to the tool, the default
case (-t) has been broken.  Now that the allocation timestamps are saved
outside of the txt field, allow us to properly sort the data by number of
times the record has been seen.  Furthermore prevent the misuse of the
commandline arguments so only one compare option can be used.

Link: https://lkml.kernel.org/r/20231013190350.579407-5-audra@redhat.com
Signed-off-by: Audra Mitchell <audra@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Georgi Djakov <djakov@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:19 -07:00
Audra Mitchell
63a150623a tools/mm: filter out timestamps for correct collation
With the introduction of allocation timestamps being included in
page_owner output, each record becomes unique due to the timestamp
nanosecond granularity.  Remove the check in add_list that tries to
collate each record during processing as the memcmp() is just additional
overhead at this point.

Also keep the allocation timestamps, but allow collation to occur without
consideration of the allocation timestamp except in the case were
allocation timestamps are requested by the user (the -a option).

Link: https://lkml.kernel.org/r/20231013190350.579407-4-audra@redhat.com
Signed-off-by: Audra Mitchell <audra@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Georgi Djakov <djakov@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:19 -07:00
Audra Mitchell
0179c62839 tools/mm: remove references to free_ts from page_owner_sort
With the removal of free timestamps from page_owner output, we no longer
need to handle this case or the "unreleased" case.  Remove all references
to both cases.

Link: https://lkml.kernel.org/r/20231013190350.579407-3-audra@redhat.com
Signed-off-by: Audra Mitchell <audra@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Georgi Djakov <djakov@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:19 -07:00
Nhat Pham
c0dddb7aa5 selftests: add a selftest to verify hugetlb usage in memcg
This patch add a new kselftest to demonstrate and verify the new hugetlb
memcg accounting behavior.

Link: https://lkml.kernel.org/r/20231006184629.155543-5-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Cc: Frank van der Linden <fvdl@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun heo <tj@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:18 -07:00
Breno Leitao
116d57303a selftests/mm: add a new test for madv and hugetlb
Create a selftest that exercises the race between page faults and
madvise(MADV_DONTNEED) in the same huge page. Do it by running two
threads that touches the huge page and madvise(MADV_DONTNEED) at the same
time.

In case of a SIGBUS coming at pagefault, the test should fail, since we
hit the bug.

The test doesn't have a signal handler, and if it fails, it fails like
the following

  ----------------------------------
  running ./hugetlb_fault_after_madv
  ----------------------------------
  ./run_vmtests.sh: line 186: 595563 Bus error    (core dumped) "$@"
  [FAIL]

This selftest goes together with the fix of the bug[1] itself.

[1] https://lore.kernel.org/all/20231001005659.2185316-1-riel@surriel.com/#r

Link: https://lkml.kernel.org/r/20231005163922.87568-3-leitao@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Tested-by: Rik van Riel <riel@surriel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:16 -07:00
Breno Leitao
c8b9073142 selftests/mm: export get_free_hugepages()
Patch series "New selftest for mm", v2.

This is a simple test case that reproduces an mm problem[1], where a page
fault races with madvise(), and it is not trivial to reproduce and debug.

This test-case aims to avoid such race problems from happening again,
impacting workloads that leverages external allocators, such as tcmalloc,
jemalloc, etc.

[1] https://lore.kernel.org/all/20231001005659.2185316-1-riel@surriel.com/#r


This patch (of 2):

get_free_hugepages() is helpful for other hugepage tests.  Export it to
the common file (vm_util.c) to be reused.

Link: https://lkml.kernel.org/r/20231005163922.87568-1-leitao@debian.org
Link: https://lkml.kernel.org/r/20231005163922.87568-2-leitao@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:16 -07:00
Liam R. Howlett
7771dcf019 radix tree test suite: fix allocation calculation in kmem_cache_alloc_bulk()
The bulk allocation is iterating through an array and storing enough
memory for the entire bulk allocation instead of a single array entry. 
Only allocate an array element of the size set in the kmem_cache.

Link: https://lkml.kernel.org/r/20230929201359.2857583-1-Liam.Howlett@oracle.com
Fixes: cc86e0c2f3 ("radix tree test suite: add support for slab bulk APIs")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:13 -07:00
Muhammad Usama Anjum
46fd75d4a3 selftests: mm: add pagemap ioctl tests
Add pagemap ioctl tests. Add several different types of tests to judge
the correction of the interface.

Link: https://lkml.kernel.org/r/20230821141518.870589-7-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Miroslaw <emmir@google.com>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Paul Gofman <pgofman@codeweavers.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yun Zhou <yun.zhou@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:13 -07:00
Muhammad Usama Anjum
b58aa0f4fe tools headers UAPI: update linux/fs.h with the kernel sources
New IOCTL and macros has been added in the kernel sources. Update the
tools header file as well.

Link: https://lkml.kernel.org/r/20230821141518.870589-5-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Miroslaw <emmir@google.com>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Paul Gofman <pgofman@codeweavers.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yun Zhou <yun.zhou@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:13 -07:00
Andrew Morton
5ef8f1b2b4 Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-10-18 14:32:58 -07:00
Ilpo Järvinen
5247e6dbed selftests/resctrl: Fix MBM test failure when MBA unavailable
Commit 20d96b25cc ("selftests/resctrl: Fix schemata write error
check") exposed a problem in feature detection logic in MBM selftest.
If schemata does not support MB:x=x entries, the schemata write to
initialize 100% memory bandwidth allocation in mbm_setup() will now
fail with -EINVAL due to the error handling corrected by the commit
20d96b25cc ("selftests/resctrl: Fix schemata write error check").
That commit just uncovers the failed write, it is not wrong itself.

If MB:x=x is not supported by schemata, it is safe to assume 100%
memory bandwidth is always set. Therefore, the previously ignored error
does not make the MBM test itself wrong.

Restore the previous behavior of MBM test by checking MB support before
attempting to write it into schemata which results in behavior
equivalent to ignoring the write error.

Fixes: 20d96b25cc ("selftests/resctrl: Fix schemata write error check")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-18 14:33:44 -06:00
Mark Brown
34dce23f7e selftests/clone3: Report descriptive test names
The clone3() selftests currently report test results in a format that does
not mesh entirely well with automation. They log output for each test such
as:

  # [1382411] Trying clone3() with flags 0 (size 0)
  # I am the parent (1382411). My child's pid is 1382412
  # I am the child, my PID is 1382412
  # [1382411] clone3() with flags says: 0 expected 0
  ok 1 [1382411] Result (0) matches expectation (0)

This is not ideal for automated parsers since the text after the "ok 1" is
treated as the test name when comparing runs by a lot of automation (tests
routinely get renumbered due to things like new tests being added based on
logical groupings). The PID means that the test names will frequently vary
and the rest of the name being a description of results means several tests
have identical text there.

Address this by refactoring things so that we have a static descriptive
name for each test which we use when logging passes, failures and skips
and since we now have a stable name for the test to hand log that before
starting the test to address the common issue reading logs where the test
name is only printed after any diagnostics. The result is:

 # Running test 'simple clone3()'
 # [1562777] Trying clone3() with flags 0 (size 0)
 # I am the parent (1562777). My child's pid is 1562778
 # I am the child, my PID is 1562778
 # [1562777] clone3() with flags says: 0 expected 0
 ok 1 simple clone3()

In order to handle skips a bit more neatly this is done in a moderately
invasive fashion where we move from a sequence of function calls to having
an array of test parameters. This hopefully also makes it a little easier
to see what the tests are doing when looking at both the source and the
logs.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-18 14:33:44 -06:00
zhujun2
ecc4185a4d selftests:modify the incorrect print format
when the argument type is 'unsigned int',printf '%u'
in format string. Problem found during code reading.

Update commit log with information on how the problem
was found:
Shuah Khan <skhan@linuxfoundation.org>

Signed-off-by: zhujun2 <zhujun2@cmss.chinamobile.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-18 14:33:44 -06:00
zhujun2
3f6f8a8c5e selftests/efivarfs: create-read: fix a resource leak
The opened file should be closed in main(), otherwise resource
leak will occur that this problem was discovered by code reading

Signed-off-by: zhujun2 <zhujun2@cmss.chinamobile.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-18 14:33:44 -06:00
Yu Liao
11df28854b selftests/ftrace: Add riscv support for kprobe arg tests
This is the riscv variant of commit 9855c4626c ("selftests/ftrace:
Add ppc support for kprobe args tests").

Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-18 14:33:44 -06:00
Yu Liao
2eadb32992 selftests/ftrace: add loongarch support for kprobe args char tests
Add loongarch support for the recently added kprobe args tests.

Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-18 14:33:43 -06:00
Liam R. Howlett
099d7439ce maple_tree: add GFP_KERNEL to allocations in mas_expected_entries()
Users complained about OOM errors during fork without triggering
compaction.  This can be fixed by modifying the flags used in
mas_expected_entries() so that the compaction will be triggered in low
memory situations.  Since mas_expected_entries() is only used during fork,
the extra argument does not need to be passed through.

Additionally, the two test_maple_tree test cases and one benchmark test
were altered to use the correct locking type so that allocations would not
trigger sleeping and thus fail.  Testing was completed with lockdep atomic
sleep detection.

The additional locking change requires rwsem support additions to the
tools/ directory through the use of pthreads pthread_rwlock_t.  With this
change test_maple_tree works in userspace, as a module, and in-kernel.

Users may notice that the system gave up early on attempting to start new
processes instead of attempting to reclaim memory.

Link: https://lkml.kernel.org/r/20230915093243epcms1p46fa00bbac1ab7b7dca94acb66c44c456@epcms1p4
Link: https://lkml.kernel.org/r/20231012155233.2272446-1-Liam.Howlett@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: <jason.sim@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 12:12:41 -07:00
Tiezhu Yang
fc7f04dc23 selftests/clone3: Fix broken test under !CONFIG_TIME_NS
When execute the following command to test clone3 under !CONFIG_TIME_NS:

  # make headers && cd tools/testing/selftests/clone3 && make && ./clone3

we can see the following error info:

  # [7538] Trying clone3() with flags 0x80 (size 0)
  # Invalid argument - Failed to create new process
  # [7538] clone3() with flags says: -22 expected 0
  not ok 18 [7538] Result (-22) is different than expected (0)
  ...
  # Totals: pass:18 fail:1 xfail:0 xpass:0 skip:0 error:0

This is because if CONFIG_TIME_NS is not set, but the flag
CLONE_NEWTIME (0x80) is used to clone a time namespace, it
will return -EINVAL in copy_time_ns().

If kernel does not support CONFIG_TIME_NS, /proc/self/ns/time
will be not exist, and then we should skip clone3() test with
CLONE_NEWTIME.

With this patch under !CONFIG_TIME_NS:

  # make headers && cd tools/testing/selftests/clone3 && make && ./clone3
  ...
  # Time namespaces are not supported
  ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
  ...
  # Totals: pass:18 fail:0 xfail:0 xpass:0 skip:1 error:0

Link: https://lkml.kernel.org/r/1689066814-13295-1-git-send-email-yangtiezhu@loongson.cn
Fixes: 515bddf0ec ("selftests/clone3: test clone3 with CLONE_NEWTIME")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 12:12:41 -07:00
Samasth Norway Ananda
e2de156b0d selftests/mm: include mman header to access MREMAP_DONTUNMAP identifier
Definition for MREMAP_DONTUNMAP is not present in glibc older than 2.32
thus throwing an undeclared error when running make on mm.  Including
linux/mman.h solves the build error for people having older glibc.

Link: https://lkml.kernel.org/r/20231012155257.891776-1-samasth.norway.ananda@oracle.com
Fixes: 0183d777c2 ("selftests: mm: remove duplicate unneeded defines")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/linux-mm/CA+G9fYvV-71XqpCr_jhdDfEtN701fBdG3q+=bafaZiGwUXy_aA@mail.gmail.com/
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 12:12:41 -07:00
Phil Sutter
2e2d9c7d4d selftests: netfilter: Run nft_audit.sh in its own netns
Don't mess with the host's firewall ruleset. Since audit logging is not
per-netns, add an initial delay of a second so other selftests' netns
cleanups have a chance to finish.

Fixes: e8dbde59ca ("selftests: netfilter: Test nf_tables audit logging")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 13:47:08 +02:00
Phil Sutter
1baf0152f7 netfilter: nf_tables: audit log object reset once per table
When resetting multiple objects at once (via dump request), emit a log
message per table (or filled skb) and resurrect the 'entries' parameter
to contain the number of objects being logged for.

To test the skb exhaustion path, perform some bulk counter and quota
adds in the kselftest.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com> (Audit)
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 13:43:40 +02:00
Phil Sutter
c4eee56e14 net: skb_find_text: Ignore patterns extending past 'to'
Assume that caller's 'to' offset really represents an upper boundary for
the pattern search, so patterns extending past this offset are to be
rejected.

The old behaviour also was kind of inconsistent when it comes to
fragmentation (or otherwise non-linear skbs): If the pattern started in
between 'to' and 'from' offsets but extended to the next fragment, it
was not found if 'to' offset was still within the current fragment.

Test the new behaviour in a kselftest using iptables' string match.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes: f72b948dcb ("[NET]: skb_find_text ignores to argument")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18 11:09:55 +01:00
Larysa Zaremba
bb6a88885f selftests/bpf: Add options and frags to xdp_hw_metadata
This is a follow-up to the commit 9b2b86332a ("bpf: Allow to use kfunc
XDP hints and frags together").

The are some possible implementations problems that may arise when providing
metadata specifically for multi-buffer packets, therefore there must be a
possibility to test such option separately.

Add an option to use multi-buffer AF_XDP xdp_hw_metadata and mark used XDP
program as capable to use frags.

As for now, xdp_hw_metadata accepts no options, so add simple option
parsing logic and a help message.

For quick reference, also add an ingress packet generation command to the
help message. The command comes from [0].

Example of output for multi-buffer packet:

  xsk_ring_cons__peek: 1
  0xead018: rx_desc[15]->addr=10000000000f000 addr=f100 comp_addr=f000
  rx_hash: 0x5789FCBB with RSS type:0x29
  rx_timestamp:  1696856851535324697 (sec:1696856851.5353)
  XDP RX-time:   1696856843158256391 (sec:1696856843.1583)
  	delta sec:-8.3771 (-8377068.306 usec)
  AF_XDP time:   1696856843158413078 (sec:1696856843.1584)
  	delta sec:0.0002 (156.687 usec)
  0xead018: complete idx=23 addr=f000
  xsk_ring_cons__peek: 1
  0xead018: rx_desc[16]->addr=100000000008000 addr=8100 comp_addr=8000
  0xead018: complete idx=24 addr=8000
  xsk_ring_cons__peek: 1
  0xead018: rx_desc[17]->addr=100000000009000 addr=9100 comp_addr=9000 EoP
  0xead018: complete idx=25 addr=9000

Metadata is printed for the first packet only.

  [0] https://lore.kernel.org/all/20230119221536.3349901-18-sdf@google.com/

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20231017162800.24080-1-larysa.zaremba@intel.com
2023-10-18 10:08:28 +02:00
Jakub Kicinski
9fea94d3a8 tools: ynl: fix converting flags to names after recent cleanup
I recently cleaned up specs to not specify enum-as-flags
when target enum is already defined as flags.
YNL Python library did not convert flags, unfortunately,
so this caused breakage for Stan and Willem.

Note that the nlspec.py abstraction already hides the differences
between flags and enums (value vs user_value), so the changes
are pretty trivial.

Fixes: 0629f22ec1 ("ynl: netdev: drop unnecessary enum-as-flags")
Reported-and-tested-by: Willem de Bruijn <willemb@google.com>
Reported-and-tested-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/all/ZS10NtQgd_BJZ3RU@google.com/
Link: https://lore.kernel.org/r/20231016213937.1820386-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17 17:59:46 -07:00
Johannes Nixdorf
6f84090333 selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest
Add a suite covering the fdb_n_learned and fdb_max_learned bridge
features, touching all special cases in accounting at least once.

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-5-32cddff87758@avm.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17 17:39:02 -07:00
Srinivas Pandruvada
a590ed6226 tools/power/x86/intel-speed-select: v1.18 release
This version addresses issues with:
- When CPU 0 hotplug is not possible, try cgroup v2 isolation
without any user input
- Fix turbo mode enable/disable swapped
- Sanitize command line integer and hex arguments
- Add more error messages
- Increase CPU count in one request

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2023-10-17 14:54:44 -07:00
Beau Belgrave
cf5a103c98 selftests/user_events: Fix abi_test for BE archs
The abi_test currently uses a long sized test value for enablement
checks. On LE this works fine, however, on BE this results in inaccurate
assert checks due to a bit being used and assuming it's value is the
same on both LE and BE.

Use int type for 32-bit values and long type for 64-bit values to ensure
appropriate behavior on both LE and BE.

Fixes: 60b1af8de8 ("tracing/user_events: Add ABI self-test")
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-17 15:07:19 -06:00
Daniel Borkmann
24516309e3 selftests/bpf: Add additional mprog query test coverage
Add several new test cases which assert corner cases on the mprog query
mechanism, for example, around passing in a too small or a larger array
than the current count.

  ./test_progs -t tc_opts
  #252     tc_opts_after:OK
  #253     tc_opts_append:OK
  #254     tc_opts_basic:OK
  #255     tc_opts_before:OK
  #256     tc_opts_chain_classic:OK
  #257     tc_opts_chain_mixed:OK
  #258     tc_opts_delete_empty:OK
  #259     tc_opts_demixed:OK
  #260     tc_opts_detach:OK
  #261     tc_opts_detach_after:OK
  #262     tc_opts_detach_before:OK
  #263     tc_opts_dev_cleanup:OK
  #264     tc_opts_invalid:OK
  #265     tc_opts_max:OK
  #266     tc_opts_mixed:OK
  #267     tc_opts_prepend:OK
  #268     tc_opts_query:OK
  #269     tc_opts_query_attach:OK
  #270     tc_opts_replace:OK
  #271     tc_opts_revision:OK
  Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20231017081728.24769-1-daniel@iogearbox.net
2023-10-17 12:57:43 -07:00
Changbin Du
9a13ee457a perf: script: fix missing ',' for fields option
A comma is missed at the end of line.

Signed-off-by: Changbin Du <changbin.du@huawei.com>
Link: https://lore.kernel.org/r/20231017015524.797065-1-changbin.du@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:51 -07:00
Athira Rajeev
a20fca2c5d perf tests: Fix shellcheck warning in stat_all_metricgroups
Running shellcheck on stat_all_metricgroups.sh reports
below warning:

 In ./tests/shell/stat_all_metricgroups.sh line 7:
 function ParanoidAndNotRoot()
 ^-- SC2112: 'function' keyword is non-standard. Delete it.

As per the format, "function" is a non-standard keyword that
can be used to declare functions. Fix this by removing the
"function" keyword from ParanoidAndNotRoot function

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20231013073021.99794-4-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:51 -07:00
Athira Rajeev
eff65ee26e perf tests: Fix shellcheck warning in record_sideband.sh
Running shellcheck on record_sideband.sh throws below
warning:

	In tests/shell/record_sideband.sh line 25:
	  if ! perf record -o ${perfdata} -BN --no-bpf-event -C $1 true 2>&1 >/dev/null
	    ^--^ SC2069: To redirect stdout+stderr, 2>&1 must be last (or use '{ cmd > file; } 2>&1' to clarify).

This shows shellcheck warning SC2069 where the redirection
order needs to be fixed. Use "cmd > /dev/null 2>&1" to fix
the redirection of perf record output

Fixes: 23b97c7ee9 ("perf test: Add test case for record sideband events")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: disgoel@linux.vnet.ibm.com
Link: https://lore.kernel.org/r/20231013073021.99794-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:51 -07:00
Athira Rajeev
47f5693c4c perf tests: Ignore shellcheck warning in lock_contention
Running shellcheck on lock_contention.sh generates below
warning

	In tests/shell/lock_contention.sh line 36:
	   if [ `nproc` -lt 4 ]; then
		  ^-----^ SC2046: Quote this to prevent word splitting.

Here since nproc will generate a single word output
and there is no possibility of word splitting, this
warning can be ignored. Use exception for this with
"disable" option in shellcheck. This warning is observed
after commit:
"commit 29441ab3a3 ("perf test lock_contention.sh: Skip test
if not enough CPUs")"

Fixes: 29441ab3a3 ("perf test lock_contention.sh: Skip test if not enough CPUs")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20231013073021.99794-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:51 -07:00
Athira Rajeev
f6a66ff98a tools/perf/arch/powerpc: Fix the CPU ID const char* value by adding 0x prefix
Simple expression parser test fails in powerpc as below:

    4: Simple expression parser
    test child forked, pid 170385
    Using CPUID 004e2102
    division by zero
    syntax error
    syntax error
    FAILED tests/expr.c:65 parse test failed
    test child finished with -1
    Simple expression parser: FAILED!

This is observed after commit:
'commit 9d5da30e4a ("perf jevents: Add a new expression builtin strcmp_cpuid_str()")'

With this commit, a new expression builtin strcmp_cpuid_str
got added. This function takes an 'ID' type value, which is
a string. So expression parse for strcmp_cpuid_str expects
const char * as cpuid value type. In case of powerpc, CPU IDs
are numbers. Hence it doesn't get interpreted correctly by
bison parser. Example in case of power9, cpuid string returns
as: 004e2102

cpuid of string type is expected in two cases:
1. char *get_cpuid_str(struct perf_pmu *pmu __maybe_unused);

   Testcase "tests/expr.c" uses "perf_pmu__getcpuid" which calls
   get_cpuid_str to get the cpuid string.

2. cpuid field in  :struct pmu_events_map

   struct pmu_events_map {
           const char *arch;
	   const char *cpuid;

   Here cpuid field is used in "perf_pmu__find_events_table"
   function as "strcmp_cpuid_str(map->cpuid, cpuid)". The
   value for cpuid field is picked from mapfile.csv.

Fix the mapfile.csv and get_cpuid_str function to prefix
cpuid with 0x so that it gets correctly interpreted by
the bison parser

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel<disgoel@linux.ibm.com>
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20231009050052.64935-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:51 -07:00
Leo Yan
78efa7b411 perf cs-etm: Respect timestamp option
When users pass the option '--timestamp' or '-T' in the record command,
all events will set the PERF_SAMPLE_TIME bit in the attribution.  In
this case, the AUX event will record the kernel timestamp, but it
doesn't mean Arm CoreSight enables timestamp packets in its hardware
tracing.

If the option '--timestamp' or '-T' is set, this patch always enables
Arm CoreSight timestamp, as a result, the bit 28 in event's config is to
be set.

Before:

  # perf record -e cs_etm// --per-thread --timestamp -- ls
  # perf script --header-only
  ...
  # event : name = cs_etm//, , id = { 69 }, type = 12, size = 136,
  config = 0, { sample_period, sample_freq } = 1,
  sample_type = IP|TID|TIME|CPU|IDENTIFIER, read_format = ID|LOST,
  disabled = 1, enable_on_exec = 1, sample_id_all = 1, exclude_guest = 1
  ...

After:

  # perf record -e cs_etm// --per-thread --timestamp -- ls
  # perf script --header-only
  ...
  # event : name = cs_etm//, , id = { 49 }, type = 12, size = 136,
  config = 0x10000000, { sample_period, sample_freq } = 1,
  sample_type = IP|TID|TIME|CPU|IDENTIFIER, read_format = ID|LOST,
  disabled = 1, enable_on_exec = 1, sample_id_all = 1, exclude_guest = 1
  ...

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: James Clark <james.clark@arm.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231014074159.1667880-3-leo.yan@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:51 -07:00
Leo Yan
f8ccc2d5cc perf cs-etm: Validate timestamp tracing in per-thread mode
So far, it's impossible to validate timestamp trace in Arm CoreSight when
the perf is in the per-thread mode.  E.g. for the command:

  perf record -e cs_etm/timestamp/ --per-thread -- ls

The command enables config 'timestamp' for 'cs_etm' event in the
per-thread mode.  In this case, the function cs_etm_validate_config()
directly bails out and skips validation.

Given profiled process can be scheduled on any CPUs in the per-thread
mode, this patch validates timestamp tracing for all CPUs when detect
the CPU map is empty.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: James Clark <james.clark@arm.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231014074159.1667880-2-leo.yan@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:51 -07:00
Ian Rogers
0197da7aff perf pmu: Lazily compute default config
The default config is computed during creation of the PMU and may do
things like scanning sysfs, when the PMU may just be used as part of
scanning. Change default_config to perf_event_attr_init_default, a
callback that is used when a default config needs initializing. This
avoids holding onto the memory for a perf_event_attr and copying.

On a tigerlake laptop running the pmu-scan benchmark:

Before:
Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 28.780 usec (+- 0.503 usec)
  Average PMU scanning took: 283.480 usec (+- 18.471 usec)
Number of openat syscalls: 30,227

After:
Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 27.880 usec (+- 0.169 usec)
  Average PMU scanning took: 245.260 usec (+- 15.758 usec)
Number of openat syscalls: 28,914

Over 3 runs it is a nearly 12% reduction in execution time and a 4.3%
of openat calls.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Ian Rogers
f20c15d13f perf pmu-events: Remember the perf_events_map for a PMU
strcmp_cpuid_str performs regular expression comparisons and so per
CPUID linear searches over the perf_events_map are expensive. Add a
helper function called map_for_pmu that does the search but also
caches the map specific to a PMU. As the PMU may differ, also cache
the CPUID string so that PMUs with the same CPUID string don't require
the linear search and regular expression comparisons. This speeds
loading PMUs as the search is done once per PMU to find the
appropriate tables.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Ian Rogers
63883cb063 perf pmu: Const-ify perf_pmu__config_terms
Add const to related APIs, this is so they can be used to default
initialize a perf_event_attr from a const pmu.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Ian Rogers
3a42f4c796 perf pmu: Const-ify file APIs
File APIs don't alter the struct pmu so allow const ones to be passed.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Ian Rogers
672bd21390 perf arm-spe: Move PMU initialization from default config code
Avoid setting PMU values in arm_spe_pmu_default_config, move to
perf_pmu__arch_init.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Ian Rogers
461e3e636a perf intel-pt: Move PMU initialization from default config code
Avoid setting PMU values in intel_pt_pmu_default_config, move to
perf_pmu__arch_init.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Ian Rogers
aa61360155 perf pmu: Rename perf_pmu__get_default_config to perf_pmu__arch_init
Assign default_config as part of the init. perf_pmu__get_default_config
was doing more than just getting the default config and so this is
intended to better align with the code.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Adrian Hunter
661ce78105 perf intel-pt: Prefer get_unaligned_le64 to memcpy_le64
Use get_unaligned_le64() instead of memcpy_le64(..., 8) because it produces
simpler code.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-6-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Adrian Hunter
3b4fa67fc6 perf intel-pt: Use get_unaligned_le16() etc
Avoid unaligned access by using get_unaligned_le16(), get_unaligned_le32()
and get_unaligned_le64().

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-5-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:49 -07:00
Adrian Hunter
f058fa5b07 perf intel-pt: Use existing definitions of le16_to_cpu() etc
Use definitions from tools/include/linux/kernel.h

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-4-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:49 -07:00
Adrian Hunter
1d2dbce9bb perf intel-pt: Simplify intel_pt_get_vmcs()
Simplify and remove unnecessary constant expressions.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-3-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:49 -07:00
Adrian Hunter
a91c987254 perf tools: Add get_unaligned_leNN()
Add get_unaligned_le16(), get_unaligned_le32 and get_unaligned_le64, same
as include/asm-generic/unaligned.h. And add include/asm-generic/unaligned.h
to check-headers.sh bringing tools/include/asm-generic/unaligned.h up to
date so that the kernel and tools versions match.

Use diagnostic pragmas to ignore -Wpacked used by perf build.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-2-adrian.hunter@intel.com
Link: https://lore.kernel.org/r/20231010142234.20061-1-adrian.hunter@intel.com
[ squashed check-header.sh addition ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:02 -07:00
Srinivas Pandruvada
3bc0f20a8c tools/power/x86/intel-speed-select: Use cgroup isolate for CPU 0
From kernel version 6.5, CPU 0 hotplug capability is deprecated.
If some SST profile doesn't have CPU 0, then it is no longer possible to
offline CPU 0. This means that user space threads will still run on
CPU 0.

To workaround this issue, use cgroup v2 isolation feature. Whenever there
/sys/devices/system/cpu/cpu0/online file is absent or open fails, isolate
CPU 0 via CPU cgroup v2 isolation. Also add a command line option to
force even if the /sys/devices/system/cpu/cpu0/online is present.

The previous commit "01bcb56f059e ("tools/power/x86/intel-speed-select:
Prevent CPU 0 offline") was just warning about this issue based on the
kernel version 6.5 and above. With this new approach, instead of warning
take action to mitigate the issue.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2023-10-17 11:55:37 -07:00
Srinivas Pandruvada
bc5370cca0 tools/power/x86/intel-speed-select: Increase max CPUs in one request
With the increase in the CPU count, this count needs to be updated.
Increase max CPU count to 512.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2023-10-17 11:55:37 -07:00
Srinivas Pandruvada
da4c1b9e8f tools/power/x86/intel-speed-select: Display error for core-power support
When core-power is getting enabled, if the feaure is not supported,
display error.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2023-10-17 11:55:37 -07:00
Srinivas Pandruvada
2fe8d2d791 tools/power/x86/intel-speed-select: No TRL for non compute domains
Don't call to set or get TRL for domains in which there are no CPUs.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2023-10-17 11:55:37 -07:00
Srinivas Pandruvada
7b00d1018c tools/power/x86/intel-speed-select: turbo-mode enable disable swapped
The command for turbo-mode enable and disable is swapped. Fix that.
Previously turbo-mode enable was actually disabling and disable was
enabling.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2023-10-17 11:55:37 -07:00
Srinivas Pandruvada
3076db34b5 tools/power/x86/intel-speed-select: Update help for TRL
TRL (turbo ratio limit) argument is passed in hex string. Clarify that
in the help.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2023-10-17 11:55:37 -07:00
Srinivas Pandruvada
61f3d868b3 tools/power/x86/intel-speed-select: Sanitize integer arguments
If the command takes some integer arguments, make sure the command
contains only digits. Same for Hex arguments. Otherwise return error.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2023-10-17 11:55:37 -07:00
Yafang Shao
44cb03f19b selftests/bpf: Add selftest for bpf_task_under_cgroup() in sleepable prog
The result is as follows:

  $ tools/testing/selftests/bpf/test_progs --name=task_under_cgroup
  #237     task_under_cgroup:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Without the previous patch, there will be RCU warnings in dmesg when
CONFIG_PROVE_RCU is enabled. While with the previous patch, there will
be no warnings.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20231007135945.4306-2-laoar.shao@gmail.com
2023-10-17 18:31:27 +02:00
Andrii Nakryiko
137df1189d libbpf: Don't assume SHT_GNU_verdef presence for SHT_GNU_versym section
Fix too eager assumption that SHT_GNU_verdef ELF section is going to be
present whenever binary has SHT_GNU_versym section. It seems like either
SHT_GNU_verdef or SHT_GNU_verneed can be used, so failing on missing
SHT_GNU_verdef actually breaks use cases in production.

One specific reported issue, which was used to manually test this fix,
was trying to attach to `readline` function in BASH binary.

Fixes: bb7fa09399 ("libbpf: Support symbol versioning for uprobe")
Reported-by: Liam Wisehart <liamwisehart@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Manu Bretelle <chantr4@gmail.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Link: https://lore.kernel.org/bpf/20231016182840.4033346-1-andrii@kernel.org
2023-10-17 11:43:20 +02:00
Jakub Kicinski
a3c2dd9648 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZS1d4wAKCRDbK58LschI
 g4DSAP441CdKh8fd+wNKUSKHFbpCQ6EvocR6Nf+Sj2DFUx/w/QEA7mfju7Abqjc3
 xwDEx0BuhrjMrjV5MmEpxc7lYl9XcQU=
 =vuWk
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-10-16

We've added 90 non-merge commits during the last 25 day(s) which contain
a total of 120 files changed, 3519 insertions(+), 895 deletions(-).

The main changes are:

1) Add missed stats for kprobes to retrieve the number of missed kprobe
   executions and subsequent executions of BPF programs, from Jiri Olsa.

2) Add cgroup BPF sockaddr hooks for unix sockets. The use case is
   for systemd to reimplement the LogNamespace feature which allows
   running multiple instances of systemd-journald to process the logs
   of different services, from Daan De Meyer.

3) Implement BPF CPUv4 support for s390x BPF JIT, from Ilya Leoshkevich.

4) Improve BPF verifier log output for scalar registers to better
   disambiguate their internal state wrt defaults vs min/max values
   matching, from Andrii Nakryiko.

5) Extend the BPF fib lookup helpers for IPv4/IPv6 to support retrieving
   the source IP address with a new BPF_FIB_LOOKUP_SRC flag,
   from Martynas Pumputis.

6) Add support for open-coded task_vma iterator to help with symbolization
   for BPF-collected user stacks, from Dave Marchevsky.

7) Add libbpf getters for accessing individual BPF ring buffers which
   is useful for polling them individually, for example, from Martin Kelly.

8) Extend AF_XDP selftests to validate the SHARED_UMEM feature,
   from Tushar Vyavahare.

9) Improve BPF selftests cross-building support for riscv arch,
   from Björn Töpel.

10) Add the ability to pin a BPF timer to the same calling CPU,
   from David Vernet.

11) Fix libbpf's bpf_tracing.h macros for riscv to use the generic
   implementation of PT_REGS_SYSCALL_REGS() to access syscall arguments,
   from Alexandre Ghiti.

12) Extend libbpf to support symbol versioning for uprobes, from Hengqi Chen.

13) Fix bpftool's skeleton code generation to guarantee that ELF data
    is 8 byte aligned, from Ian Rogers.

14) Inherit system-wide cpu_mitigations_off() setting for Spectre v1/v4
    security mitigations in BPF verifier, from Yafang Shao.

15) Annotate struct bpf_stack_map with __counted_by attribute to prepare
    BPF side for upcoming __counted_by compiler support, from Kees Cook.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (90 commits)
  bpf: Ensure proper register state printing for cond jumps
  bpf: Disambiguate SCALAR register state output in verifier logs
  selftests/bpf: Make align selftests more robust
  selftests/bpf: Improve missed_kprobe_recursion test robustness
  selftests/bpf: Improve percpu_alloc test robustness
  selftests/bpf: Add tests for open-coded task_vma iter
  bpf: Introduce task_vma open-coded iterator kfuncs
  selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c
  bpf: Don't explicitly emit BTF for struct btf_iter_num
  bpf: Change syscall_nr type to int in struct syscall_tp_t
  net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set
  bpf: Avoid unnecessary audit log for CPU security mitigations
  selftests/bpf: Add tests for cgroup unix socket address hooks
  selftests/bpf: Make sure mount directory exists
  documentation/bpf: Document cgroup unix socket address hooks
  bpftool: Add support for cgroup unix socket address hooks
  libbpf: Add support for cgroup unix socket address hooks
  bpf: Implement cgroup sockaddr hooks for unix sockets
  bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf
  bpf: Propagate modified uaddrlen from cgroup sockaddr programs
  ...
====================

Link: https://lore.kernel.org/r/20231016204803.30153-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-16 21:05:33 -07:00
Linus Torvalds
86d6a628a2 ARM:
- Fix the handling of the phycal timer offset when FEAT_ECV
   and CNTPOFF_EL2 are implemented.
 
 - Restore the functionnality of Permission Indirection that
   was broken by the Fine Grained Trapping rework
 
 - Cleanup some PMU event sharing code
 
 MIPS:
 
 - Fix W=1 build.
 
 s390:
 
 - One small fix for gisa to avoid stalls.
 
 x86:
 
 - Truncate writes to PMU counters to the counter's width to avoid spurious
   overflows when emulating counter events in software.
 
 - Set the LVTPC entry mask bit when handling a PMI (to match Intel-defined
   architectural behavior).
 
 - Treat KVM_REQ_PMI as a wake event instead of queueing host IRQ work to
   kick the guest out of emulated halt.
 
 - Fix for loading XSAVE state from an old kernel into a new one.
 
 - Fixes for AMD AVIC
 
 selftests:
 
 - Play nice with %llx when formatting guest printf and assert statements.
 
 - Clean up stale test metadata.
 
 - Zero-initialize structures in memslot perf test to workaround a suspected
   "may be used uninitialized" false positives from GCC.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmUtvXgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOE3gf/Q0Xvi/oU/+iDMuvfCbMZg/nhbrsa
 WmE/zXLrCF0DknppAsWulkhLGL2ceL6X+f37f2vWpBdG9SVDG/vSAg+QQDwsXiKN
 hTJoaybtMMPZM64emPZKeLMrq3A/U32qIMmWMJkoQRyz6dftUhGqZEuy1jw8oomJ
 n9idRDCMkbo+bick4URt0FEuI3Tf6dPIlG7P5hObFTw+nenzzxTjoxWZ432Mgyod
 yqveEke4hcQ+6K1zdAcDNZqT9ZhxeTxAO4yrBAYfnFoPLhUXKDUumkqAQPNOhKTo
 YN+b29kHBm+HvYkHN785FQla/13wjE1aq5TUj5J7NEDv4uRXDefDq2OAeg==
 =b9AY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix the handling of the phycal timer offset when FEAT_ECV and
     CNTPOFF_EL2 are implemented

   - Restore the functionnality of Permission Indirection that was
     broken by the Fine Grained Trapping rework

   - Cleanup some PMU event sharing code

  MIPS:

   - Fix W=1 build

  s390:

   - One small fix for gisa to avoid stalls

  x86:

   - Truncate writes to PMU counters to the counter's width to avoid
     spurious overflows when emulating counter events in software

   - Set the LVTPC entry mask bit when handling a PMI (to match
     Intel-defined architectural behavior)

   - Treat KVM_REQ_PMI as a wake event instead of queueing host IRQ work
     to kick the guest out of emulated halt

   - Fix for loading XSAVE state from an old kernel into a new one

   - Fixes for AMD AVIC

  selftests:

   - Play nice with %llx when formatting guest printf and assert
     statements

   - Clean up stale test metadata

   - Zero-initialize structures in memslot perf test to workaround a
     suspected 'may be used uninitialized' false positives from GCC"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: arm64: timers: Correctly handle TGE flip with CNTPOFF_EL2
  KVM: arm64: POR{E0}_EL1 do not need trap handlers
  KVM: arm64: Add nPIR{E0}_EL1 to HFG traps
  KVM: MIPS: fix -Wunused-but-set-variable warning
  KVM: arm64: pmu: Drop redundant check for non-NULL kvm_pmu_events
  KVM: SVM: Fix build error when using -Werror=unused-but-set-variable
  x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested()
  x86: KVM: SVM: add support for Invalid IPI Vector interception
  x86: KVM: SVM: always update the x2avic msr interception
  KVM: selftests: Force load all supported XSAVE state in state test
  KVM: selftests: Load XSAVE state into untouched vCPU during state test
  KVM: selftests: Touch relevant XSAVE state in guest for state test
  KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}
  x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer
  KVM: selftests: Zero-initialize entire test_result in memslot perf test
  KVM: selftests: Remove obsolete and incorrect test case metadata
  KVM: selftests: Treat %llx like %lx when formatting guest printf
  KVM: x86/pmu: Synthesize at most one PMI per VM-exit
  KVM: x86: Mask LVTPC when handling a PMI
  KVM: x86/pmu: Truncate counter value to allowed width on write
  ...
2023-10-16 18:34:17 -07:00
Stefan Roesch
0374af1da0 mm/ksm: test case for prctl fork/exec workflow
This adds a new test case to the ksm functional tests to make sure that
the KSM setting is inherited by the child process when doing a fork/exec.

Link: https://lkml.kernel.org/r/20230922211141.320789-3-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Carl Klemm <carl@uvos.xyz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-16 15:44:38 -07:00
Swapnil Sapkal
0996e67423 selftests/amd-pstate: Added option to provide perf binary path
In selftests/amd-pstate, distro `perf` is used to capture `perf stat`
while running microbenchmarks. Distro `perf` is not working with
upstream kernel. Fix this by providing an option to give the perf
binary path.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-16 13:06:58 -06:00
Swapnil Sapkal
27aabb2c43 selftests/amd-pstate: Fix broken paths to run workloads in amd-pstate-ut
In selftests/amd-pstate, tbench and gitsource microbenchmarks are
used to compare the performance with different governors. In current
implementation the relative path to run `amd_pstate_tracer.py` is broken.
Fix this by using absolute paths.

Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-16 13:06:50 -06:00
Chuck Lever
f14122b2c2 tools: ynl: Add source files for nfsd netlink protocol
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:10 -04:00
Nicolin Chen
266dcae34d iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly
TEST_LENGTH passing ".size = sizeof(struct _struct) - 1" expects -EINVAL
from "if (ucmd.user_size < op->min_size)" check in iommufd_fops_ioctl().
This has been working when min_size is exactly the size of the structure.

However, if the size of the structure becomes larger than min_size, i.e.
the passing size above is larger than min_size, that min_size sanity no
longer works.

Since the first test in TEST_LENGTH() was to test that min_size sanity
routine, rework it to support a min_size calculation, rather than using
the full size of the structure.

Link: https://lore.kernel.org/r/20231015074648.24185-1-nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-16 11:05:51 -03:00
Andrii Nakryiko
72f8a1de4a bpf: Disambiguate SCALAR register state output in verifier logs
Currently the way that verifier prints SCALAR_VALUE register state (and
PTR_TO_PACKET, which can have var_off and ranges info as well) is very
ambiguous.

In the name of brevity we are trying to eliminate "unnecessary" output
of umin/umax, smin/smax, u32_min/u32_max, and s32_min/s32_max values, if
possible. Current rules are that if any of those have their default
value (which for mins is the minimal value of its respective types: 0,
S32_MIN, or S64_MIN, while for maxs it's U32_MAX, S32_MAX, S64_MAX, or
U64_MAX) *OR* if there is another min/max value that as matching value.
E.g., if smin=100 and umin=100, we'll emit only umin=10, omitting smin
altogether. This approach has a few problems, being both ambiguous and
sort-of incorrect in some cases.

Ambiguity is due to missing value could be either default value or value
of umin/umax or smin/smax. This is especially confusing when we mix
signed and unsigned ranges. Quite often, umin=0 and smin=0, and so we'll
have only `umin=0` leaving anyone reading verifier log to guess whether
smin is actually 0 or it's actually -9223372036854775808 (S64_MIN). And
often times it's important to know, especially when debugging tricky
issues.

"Sort-of incorrectness" comes from mixing negative and positive values.
E.g., if umin is some large positive number, it can be equal to smin
which is, interpreted as signed value, is actually some negative value.
Currently, that smin will be omitted and only umin will be emitted with
a large positive value, giving an impression that smin is also positive.

Anyway, ambiguity is the biggest issue making it impossible to have an
exact understanding of register state, preventing any sort of automated
testing of verifier state based on verifier log. This patch is
attempting to rectify the situation by removing ambiguity, while
minimizing the verboseness of register state output.

The rules are straightforward:
  - if some of the values are missing, then it definitely has a default
  value. I.e., `umin=0` means that umin is zero, but smin is actually
  S64_MIN;
  - all the various boundaries that happen to have the same value are
  emitted in one equality separated sequence. E.g., if umin and smin are
  both 100, we'll emit `smin=umin=100`, making this explicit;
  - we do not mix negative and positive values together, and even if
  they happen to have the same bit-level value, they will be emitted
  separately with proper sign. I.e., if both umax and smax happen to be
  0xffffffffffffffff, we'll emit them both separately as
  `smax=-1,umax=18446744073709551615`;
  - in the name of a bit more uniformity and consistency,
  {u32,s32}_{min,max} are renamed to {s,u}{min,max}32, which seems to
  improve readability.

The above means that in case of all 4 ranges being, say, [50, 100] range,
we'd previously see hugely ambiguous:

    R1=scalar(umin=50,umax=100)

Now, we'll be more explicit:

    R1=scalar(smin=umin=smin32=umin32=50,smax=umax=smax32=umax32=100)

This is slightly more verbose, but distinct from the case when we don't
know anything about signed boundaries and 32-bit boundaries, which under
new rules will match the old case:

    R1=scalar(umin=50,umax=100)

Also, in the name of simplicity of implementation and consistency, order
for {s,u}32_{min,max} are emitted *before* var_off. Previously they were
emitted afterwards, for unclear reasons.

This patch also includes a few fixes to selftests that expect exact
register state to accommodate slight changes to verifier format. You can
see that the changes are pretty minimal in common cases.

Note, the special case when SCALAR_VALUE register is a known constant
isn't changed, we'll emit constant value once, interpreted as signed
value.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-5-andrii@kernel.org
2023-10-16 13:49:18 +02:00
Andrii Nakryiko
cde7851428 selftests/bpf: Make align selftests more robust
Align subtest is very specific and finicky about expected verifier log
output and format. This is often completely unnecessary as in a bunch of
situations test actually cares about var_off part of register state. But
given how exact it is right now, any tiny verifier log changes can lead
to align tests failures, requiring constant adjustment.

This patch tries to make this a bit more robust by making logic first
search for specified register and then allowing to match only portion of
register state, not everything exactly. This will come handly with
follow up changes to SCALAR register output disambiguation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-4-andrii@kernel.org
2023-10-16 13:49:18 +02:00
Andrii Nakryiko
08a7078fea selftests/bpf: Improve missed_kprobe_recursion test robustness
Given missed_kprobe_recursion is non-serial and uses common testing
kfuncs to count number of recursion misses it's possible that some other
parallel test can trigger extraneous recursion misses. So we can't
expect exactly 1 miss. Relax conditions and expect at least one.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-3-andrii@kernel.org
2023-10-16 13:49:18 +02:00
Andrii Nakryiko
2d78928c9c selftests/bpf: Improve percpu_alloc test robustness
Make these non-serial tests filter BPF programs by intended PID of
a test runner process. This makes it isolated from other parallel tests
that might interfere accidentally.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-2-andrii@kernel.org
2023-10-16 13:49:18 +02:00
Binbin Wu
2906063341 selftests/x86/lam: Zero out buffer for readlink()
Zero out the buffer for readlink() since readlink() does not append a
terminating null byte to the buffer.  Also change the buffer length
passed to readlink() to 'PATH_MAX - 1' to ensure the resulting string
is always null terminated.

Fixes: 833c12ce0f ("selftests/x86/lam: Add inherit test cases for linear-address masking")
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20231016062446.695-1-binbin.wu@linux.intel.com
2023-10-16 11:39:57 +02:00
Liming Wu
e07744b43d tools/virtio: Add dma sync api for virtio test
Fixes: 8bd2f71054 ("virtio_ring: introduce dma sync api for virtqueue")
also add dma sync api for virtio test.

Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com>
Message-Id: <20231008031734.1095-1-liming.wu@jaguarmicro.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-16 05:32:23 -04:00
zhujun2
3c4fe89878 selftests: net: remove unused variables
These variables are never referenced in the code, just remove them

Signed-off-by: zhujun2 <zhujun2@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-16 09:20:08 +01:00
Kuan-Wei Chiu
de84da588f tools/thermal: Remove unused 'mds' and 'nrhandler' variables
In the previous code, the 'mds' and 'nrhandler' variables were not
utilized in the codebase. Additionally, there was a potential NULL
pointer dereference and memory leak due to improper handling of memory
reallocation failure.

This patch removes the unused 'mds' and 'nrhandler' variables along with
the associated code, addressing the unused variable issue, NULL pointer
dereference issue and the memory leak issue.

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230926173736.1142420-1-visitorckw@gmail.com
2023-10-15 23:40:10 +02:00
Xabier Marquiegui
26285e689c ptp: add testptp mask test
Add option to test timestamp event queue mask manipulation in testptp.

Option -F allows the user to specify a single channel that will be
applied on the mask filter via IOCTL.

The test program will maintain the file open until user input is
received.

This allows checking the effect of the IOCTL in debugfs.

eg:

Console 1:
```
Channel 12 exclusively enabled. Check on debugfs.
Press any key to continue
```

Console 2:
```
0x00000000 0x00000001 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
```

Signed-off-by: Xabier Marquiegui <reibax@gmail.com>
Suggested-by: Richard Cochran <richardcochran@gmail.com>
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 20:07:52 +01:00
Xabier Marquiegui
403376ddb4 ptp: add debugfs interface to see applied channel masks
Use debugfs to be able to view channel mask applied to every timestamp
event queue.

Every time the device is opened, a new entry is created in
`$DEBUGFS_MOUNTPOINT/ptpN/$INSTANCE_ADDRESS/mask`.

The mask value can be viewed grouped in 32bit decimal values using cat,
or converted to hexadecimal with the included `ptpchmaskfmt.sh` script.
32 bit values are listed from least significant to most significant.

Signed-off-by: Xabier Marquiegui <reibax@gmail.com>
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 20:07:52 +01:00
Aaron Conole
8eff0e0622 selftests: openvswitch: Fix the ct_tuple for v4
The ct_tuple v4 data structure decode / encode routines were using
the v6 IP address decode and relying on default encode. This could
cause exceptions during encode / decode depending on how a ct4
tuple would appear in a netlink message.

Caught during code review.

Fixes: e52b07aa1a ("selftests: openvswitch: add flow dump support")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 20:02:51 +01:00
Aaron Conole
76035fd12c selftests: openvswitch: Skip drop testing on older kernels
Kernels that don't have support for openvswitch drop reasons also
won't have the drop counter reasons, so we should skip the test
completely.  It previously wasn't possible to build a test case
for this without polluting the datapath, so we introduce a mechanism
to clear all the flows from a datapath allowing us to test for
explicit drop actions, and then clear the flows to build the
original test case.

Fixes: 4242029164 ("selftests: openvswitch: add explicit drop testcase")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 20:02:51 +01:00
Aaron Conole
af846afad5 selftests: openvswitch: Catch cases where the tests are killed
In case of fatal signal, or early abort at least cleanup the current
test case.

Fixes: 25f16c873f ("selftests: add openvswitch selftest suite")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 20:02:51 +01:00
Aaron Conole
92e37f20f2 selftests: openvswitch: Add version check for pyroute2
Paolo Abeni reports that on some systems the pyroute2 version isn't
new enough to run the test suite.  Ensure that we support a minimum
version of 0.6 for all cases (which does include the existing ones).
The 0.6.1 version was released in May of 2021, so should be
propagated to most installations at this point.

The alternative that Paolo proposed was to only skip when the
add-flow is being run.  This would be okay for most cases, except
if a future test case is added that needs to do flow dump without
an associated add (just guessing).  In that case, it could also be
broken and we would need additional skip logic anyway.  Just draw
a line in the sand now.

Fixes: 25f16c873f ("selftests: add openvswitch selftest suite")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Closes: https://lore.kernel.org/lkml/8470c431e0930d2ea204a9363a60937289b7fdbe.camel@redhat.com/
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 20:02:51 +01:00
Paolo Bonzini
2b3f2325e7 KVM selftests fixes for 6.6:
- Play nice with %llx when formatting guest printf and assert statements.
 
  - Clean up stale test metadata.
 
  - Zero-initialize structures in memslot perf test to workaround a suspected
    "may be used uninitialized" false positives from GCC.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmUp1RISHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5I+wP/1hpxu+wC29SxT6PxLcMFCuIy+XV+jSV
 mnFRKhu4Np4E2Th8jFs94qnVN/gdPSNF+kbY//4kqfenf6V6VpNj4rUER2xZY+HV
 0Ben08AzuoDdzn9UikG59GJeyXuZZDqqOdjM2xgefXDqyRYvFhwQleVCULzoGBeB
 iwUD7g2MOJfFjqUhJAV8HDQTzhMgDIc9J+d+oO6J8FRXd64jQ4CBpaM1FDyjmpFp
 RiKMRPkOuVkTbiScCGGwrMdpOXpT1tY5DcIP0N7O6EafyohItQ21hZWth5/1DmIS
 wL1rMTad7uRcdQUdTDLUTegcFRQUjtUo5NS5l92MNFWGlW+J+C5GNVR8znMQCGq1
 ZgovBR72FpFHaup2P2G4mxE9A0M7kiyhLqJFyaM6dvQQTyYSw0gPxKEjUmdljru1
 Gf4SVZPdTAe543xpFJy3ANB4lWsc5uYzWxbIuNNSNl1p6l2em+wvvDroMJujVYU5
 GnKxEtuwEnAUN094a8MMvWR6Q1m1Bfi5brFaPYmUveoZjGb7z8LxebQ4Tg3L08Ga
 rH2Ri/Dt5mF/Pr5hje7eWDOYyDopPIcUWzRpIVA0/ORScqMXOV/oqEVufA+wwlSj
 k0vcRkieRa2cxsKV+F00BSxKXEtdH51k3TEZA7dKFIKDuQk9q46+g5JXr4DF29dl
 eXVDD03hHAFa
 =LJ3q
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-selftests-6.6-fixes' of https://github.com/kvm-x86/linux into HEAD

KVM selftests fixes for 6.6:

 - Play nice with %llx when formatting guest printf and assert statements.

 - Clean up stale test metadata.

 - Zero-initialize structures in memslot perf test to workaround a suspected
   "may be used uninitialized" false positives from GCC.
2023-10-15 08:25:18 -04:00
Arseniy Krasnov
8d211285c6 test/vsock: io_uring rx/tx tests
This adds set of tests which use io_uring for rx/tx. This test suite is
implemented as separated util like 'vsock_test' and has the same set of
input arguments as 'vsock_test'. These tests only cover cases of data
transmission (no connect/bind/accept etc).

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 13:19:43 +01:00
Arseniy Krasnov
e846d679ad test/vsock: MSG_ZEROCOPY support for vsock_perf
To use this option pass '--zerocopy' parameter:

./vsock_perf --zerocopy --sender <cid> ...

With this option MSG_ZEROCOPY flag will be passed to the 'send()' call.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 13:19:42 +01:00
Arseniy Krasnov
bc36442ef3 test/vsock: MSG_ZEROCOPY flag tests
This adds three tests for MSG_ZEROCOPY feature:
1) SOCK_STREAM tx with different buffers.
2) SOCK_SEQPACKET tx with different buffers.
3) SOCK_STREAM test to read empty error queue of the socket.

Patch also works as preparation for the next patches for tools in this
patchset: vsock_perf and vsock_uring_test:
1) Adds several new functions to util.c - they will be also used by
   vsock_uring_test.
2) Adds two new functions for MSG_ZEROCOPY handling to a new source
   file - such source will be shared between vsock_test, vsock_perf and
   vsock_uring_test, thus avoiding code copy-pasting.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15 13:19:42 +01:00
Jiri Pirko
0f4d44f6ee netlink: specs: devlink: fix reply command values
Make sure that the command values used for replies are correct. This is
only affecting generated userspace helpers, no change on kernel code.

Fixes: 7199c86247 ("netlink: specs: devlink: add commands that do per-instance dump")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231012115811.298129-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 17:27:27 -07:00
Dave Marchevsky
e0e1a7a5fc selftests/bpf: Add tests for open-coded task_vma iter
The open-coded task_vma iter added earlier in this series allows for
natural iteration over a task's vmas using existing open-coded iter
infrastructure, specifically bpf_for_each.

This patch adds a test demonstrating this pattern and validating
correctness. The vma->vm_start and vma->vm_end addresses of the first
1000 vmas are recorded and compared to /proc/PID/maps output. As
expected, both see the same vmas and addresses - with the exception of
the [vsyscall] vma - which is explained in a comment in the prog_tests
program.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231013204426.1074286-5-davemarchevsky@fb.com
2023-10-13 15:48:58 -07:00
Dave Marchevsky
45b38941c8 selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c
Further patches in this series will add a struct bpf_iter_task_vma,
which will result in a name collision with the selftest prog renamed in
this patch. Rename the selftest to avoid the collision.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231013204426.1074286-3-davemarchevsky@fb.com
2023-10-13 15:48:58 -07:00
Ido Schimmel
aa13e5241a selftests: fib_tests: Count all trace point invocations
The tests rely on the IPv{4,6} FIB trace points being triggered once for
each forwarded packet. If receive processing is deferred to the
ksoftirqd task these invocations will not be counted and the tests will
fail. Fix by specifying the '-a' flag to avoid perf from filtering on
the mausezahn task.

Before:

 # ./fib_tests.sh -t ipv4_mpath_list

 IPv4 multipath list receive tests
     TEST: Multipath route hit ratio (.68)                               [FAIL]

 # ./fib_tests.sh -t ipv6_mpath_list

 IPv6 multipath list receive tests
     TEST: Multipath route hit ratio (.27)                               [FAIL]

After:

 # ./fib_tests.sh -t ipv4_mpath_list

 IPv4 multipath list receive tests
     TEST: Multipath route hit ratio (1.00)                              [ OK ]

 # ./fib_tests.sh -t ipv6_mpath_list

 IPv6 multipath list receive tests
     TEST: Multipath route hit ratio (.99)                               [ OK ]

Fixes: 8ae9efb859 ("selftests: fib_tests: Add multipath list receive tests")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/netdev/202309191658.c00d8b8-oliver.sang@intel.com/
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Link: https://lore.kernel.org/r/20231010132113.3014691-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 14:28:22 -07:00
Ido Schimmel
dbb13378ba selftests: fib_tests: Disable RP filter in multipath list receive test
The test relies on the fib:fib_table_lookup trace point being triggered
once for each forwarded packet. If RP filter is not disabled, the trace
point will be triggered twice for each packet (for source validation and
forwarding), potentially masking actual bugs. Fix by explicitly
disabling RP filter.

Before:

 # ./fib_tests.sh -t ipv4_mpath_list

 IPv4 multipath list receive tests
     TEST: Multipath route hit ratio (1.99)                              [ OK ]

After:

 # ./fib_tests.sh -t ipv4_mpath_list

 IPv4 multipath list receive tests
     TEST: Multipath route hit ratio (.99)                               [ OK ]

Fixes: 8ae9efb859 ("selftests: fib_tests: Add multipath list receive tests")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/netdev/202309191658.c00d8b8-oliver.sang@intel.com/
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Link: https://lore.kernel.org/r/20231010132113.3014691-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 14:28:22 -07:00
Maciej Wieczor-Retman
508934b5d1 selftests/resctrl: Move run_benchmark() to a more fitting file
resctrlfs.c contains mostly functions that interact in some way with
resctrl FS entries while functions inside resctrl_val.c deal with
measurements and benchmarking.

run_benchmark() is located in resctrlfs.c even though it's purpose
is not interacting with the resctrl FS but to execute cache checking
logic.

Move run_benchmark() to resctrl_val.c just before resctrl_val() that
makes use of run_benchmark(). Make run_benchmark() static since it's
not used between multiple files anymore.

Remove return comment from kernel-doc since the function is type void.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:59:06 -06:00
Maciej Wieczor-Retman
20d96b25cc selftests/resctrl: Fix schemata write error check
Writing bitmasks to the schemata can fail when the bitmask doesn't
adhere to constraints defined by what a particular CPU supports.
Some example of constraints are max length or having contiguous bits.
The driver should properly return errors when any rule concerning
bitmask format is broken.

Resctrl FS returns error codes from fprintf() only when fclose() is
called. Current error checking scheme allows invalid bitmasks to be
written into schemata file and the selftest doesn't notice because the
fclose() error code isn't checked.

Substitute fopen(), flose() and fprintf() with open(), close() and
write() to avoid error code buffering between fprintf() and fclose().

Remove newline character from the schema string after writing it to
the schemata file so it prints correctly before function return.

Pass the string generated with strerror() to the "reason" buffer so
the error message is more verbose. Extend "reason" buffer so it can hold
longer messages.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:59:01 -06:00
Ilpo Järvinen
ef43c30858 selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests
The initial value of 5% chosen for the maximum allowed percentage
difference between resctrl mbm value and IMC mbm value in

commit 06bd03a57f ("selftests/resctrl: Fix MBA/MBM results reporting
       format") was "randomly chosen value" (as admitted by the changelog).

When running tests in our lab across a large number platforms, 5%
difference upper bound for success seems a bit on the low side for the
MBA and MBM tests. Some platforms produce outliers that are slightly
above that, typically 6-7%, which leads MBA/MBM test frequently
failing.

Replace the "randomly chosen value" with a success bound that is based
on those measurements across large number of platforms by relaxing the
MBA/MBM success bound to 8%. The relaxed bound removes the failures due
the frequent outliers.

Fixed commit description style error during merge:
Shuah Khan <skhan@linuxfoundation.org>

Fixes: 06bd03a57f ("selftests/resctrl: Fix MBA/MBM results reporting format")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:54:38 -06:00
Ilpo Järvinen
06035f0194 selftests/resctrl: Fix feature checks
The MBA and CMT tests expect support of other features to be able to
run.

When platform only supports MBA but not MBM, MBA test will fail with:
Failed to open total bw file: No such file or directory

When platform only supports CMT but not CAT, CMT test will fail with:
Failed to open bit mask file '/sys/fs/resctrl/info/L3/cbm_mask': No such file or directory

It leads to the test reporting test fail (even if no test was run at
all).

Extend feature checks to cover these two conditions to show these tests
were skipped rather than failed.

Fixes: ee0415681e ("selftests/resctrl: Use resctrl/info for feature detection")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> # selftests/resctrl: Refactor feature check to use resource and feature name
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:54:34 -06:00
Ilpo Järvinen
d56e5da0e0 selftests/resctrl: Refactor feature check to use resource and feature name
Feature check in validate_resctrl_feature_request() takes in the test
name string and maps that to what to check per test.

Pass resource and feature names to validate_resctrl_feature_request()
directly rather than deriving them from the test name inside the
function which makes the feature check easier to extend for new test
cases.

Use !! in the return statement to make the boolean conversion more
obvious even if it is not strictly necessary from correctness point of
view (to avoid it looking like the function is returning a freed
pointer).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> # selftests/resctrl: Remove duplicate feature check from CMT test
Cc: <stable@vger.kernel.org> # selftests/resctrl: Move _GNU_SOURCE define into Makefile
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:54:27 -06:00
Ilpo Järvinen
3a1e4a91aa selftests/resctrl: Move _GNU_SOURCE define into Makefile
_GNU_SOURCE is defined in resctrl.h. Defining _GNU_SOURCE has a large
impact on what gets defined when including headers either before or
after it. This can result in compile failures if .c file decides to
include a standard header file before resctrl.h.

It is safer to define _GNU_SOURCE in Makefile so it is always defined
regardless of in which order includes are done.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:54:21 -06:00
Ilpo Järvinen
030b48fb2c selftests/resctrl: Remove duplicate feature check from CMT test
The test runner run_cmt_test() in resctrl_tests.c checks for CMT
feature and does not run cmt_resctrl_val() if CMT is not supported.
Then cmt_resctrl_val() also check is CMT is supported.

Remove the duplicated feature check for CMT from cmt_resctrl_val().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:54:16 -06:00
Ilpo Järvinen
3aff514644 selftests/resctrl: Extend signal handler coverage to unmount on receiving signal
Unmounting resctrl FS has been moved into the per test functions in
resctrl_tests.c by commit caddc0fbe4 ("selftests/resctrl: Move
resctrl FS mount/umount to higher level"). In case a signal (SIGINT,
SIGTERM, or SIGHUP) is received, the running selftest is aborted by
ctrlc_handler() which then unmounts resctrl fs before exiting. The
current section between signal_handler_register() and
signal_handler_unregister(), however, does not cover the entire
duration when resctrl FS is mounted.

Move signal_handler_register() and signal_handler_unregister() calls
from per test files into resctrl_tests.c to properly unmount resctrl
fs. In order to not add signal_handler_register()/unregister() n times,
create helpers test_prepare() and test_cleanup().

Do not call ksft_exit_fail_msg() in test_prepare() but only in the per
test function to keep the control flow cleaner without adding calls to
exit() deep into the call chain.

Adjust child process kill() call in ctrlc_handler() to only be invoked
if the child was already forked.

Fixes: caddc0fbe4 ("selftests/resctrl: Move resctrl FS mount/umount to higher level")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:54:09 -06:00
Ilpo Järvinen
beb7f47184 selftests/resctrl: Fix uninitialized .sa_flags
signal_handler_unregister() calls sigaction() with uninitializing
sa_flags in the struct sigaction.

Make sure sa_flags is always initialized in signal_handler_unregister()
by initializing the struct sigaction when declaring it. Also add the
initialization to signal_handler_register() even if there are no know
bugs in there because correctness is then obvious from the code itself.

Fixes: 73c55fa5ab ("selftests/resctrl: Commonize the signal handler register/unregister for all tests")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:54:04 -06:00
Ilpo Järvinen
f23c7925e9 selftests/resctrl: Cleanup benchmark argument parsing
Benchmark argument is handled by custom argument parsing code which is
more complicated than it needs to be.

Process benchmark argument within the normal getopt() handling and drop
unnecessary ben_ind and has_ben variables. When -b is given, terminate
the argument processing as -b consumes all remaining arguments.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: "Wieczor-Retman, Maciej" <maciej.wieczor-retman@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:29:08 -06:00
Ilpo Järvinen
149ff72953 selftests/resctrl: Remove ben_count variable
ben_count is only used to write the terminator for the list. It is
enough to use i from the loop so no need for another variable.

Remove ben_count variable as it is not needed.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: "Wieczor-Retman, Maciej" <maciej.wieczor-retman@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:29:02 -06:00
Ilpo Järvinen
e33cb5702a selftests/resctrl: Make benchmark command const and build it with pointers
Benchmark command is used in multiple tests so it should not be
mutated by the tests but CMT test alters span argument. Due to the
order of tests (CMT test runs last), mutating the span argument in CMT
test does not trigger any real problems currently.

Mark benchmark_cmd strings as const and setup the benchmark command
using pointers. Because the benchmark command becomes const, the input
arguments can be used directly. Besides being simpler, using the input
arguments directly also removes the internal size restriction.

CMT test has to create a copy of the benchmark command before altering
the benchmark command.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: "Wieczor-Retman, Maciej" <maciej.wieczor-retman@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:28:55 -06:00
Ilpo Järvinen
47809eb70c selftests/resctrl: Reorder resctrl FS prep code and benchmark_cmd init
Benchmark command is initialized before resctrl FS check and
preparation code that can call ksft_exit_skip(). There is no strong
reason why the resctrl FS support check and unmounting it (if already
mounted), has to be done after the benchmark command initialization.

Move benchmark command initialization such that it is done not until
right before the tests commence. This simplifies rollback handling when
benchmark command initialization starts to use dynamic allocation (in a
change following this).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: "Wieczor-Retman, Maciej" <maciej.wieczor-retman@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:28:49 -06:00
Ilpo Järvinen
b1a901e078 selftests/resctrl: Simplify span lifetime
struct resctrl_val_param contains span member. resctrl_val(), however,
never uses it because the value of span is embedded into the default
benchmark command and parsed from it by run_benchmark().

Remove span from resctrl_val_param. Provide DEFAULT_SPAN for the code
that needs it. CMT and CAT tests communicate span that is different
from the DEFAULT_SPAN between their internal functions which is
converted into passing it directly as a parameter.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: "Wieczor-Retman, Maciej" <maciej.wieczor-retman@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-13 14:28:44 -06:00