Commit graph

34399 commits

Author SHA1 Message Date
Prarit Bhargava
9c08581728 tools/power turbostat: Provide better debug messages for failed capabilities accesses
turbostat reports some capabilities access errors and not others.  Provide
the same debug message for all errors.

[lenb: remove extra quotes]

Cc: David Arcari <darcari@redhat.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2023-03-17 11:16:30 -04:00
Len Brown
884a1f9561 tools/power turbostat: update dump of SECONDARY_TURBO_RATIO_LIMIT
cosmetic only (but useful if you copy/paste)

Signed-off-by: Len Brown <len.brown@intel.com>
2023-03-17 10:59:17 -04:00
Ido Schimmel
62199e3f16 selftests: net: Add VXLAN MDB test
Add test cases for VXLAN MDB, testing the control and data paths. Two
different sets of namespaces (i.e., ns{1,2}_v4 and ns{1,2}_v6) are used
in order to test VXLAN MDB with both IPv4 and IPv6 underlays,
respectively.

Example truncated output:

 # ./test_vxlan_mdb.sh
 [...]
 Tests passed: 620
 Tests failed:   0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17 08:05:50 +00:00
Nikolay Aleksandrov
222c94ec0a selftests: bonding: add tests for ether type changes
Add new network selftests for the bonding device which exercise the ether
type changing call paths. They also test for the recent syzbot bug[1] which
causes a warning and results in wrong device flags (IFF_SLAVE missing).
The test adds three bond devices and a nlmon device, enslaves one of the
bond devices to the other and then uses the nlmon device for successful
and unsuccesful enslaves both of which change the bond ether type. Thus
we can test for both MASTER and SLAVE flags at the same time.

If the flags are properly restored we get:
TEST: Change ether type of an enslaved bond device with unsuccessful enslave   [ OK ]
TEST: Change ether type of an enslaved bond device with successful enslave   [ OK ]

[1] https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17 07:56:41 +00:00
Jakub Kicinski
cfab77c0b5 ynl: make the tooling check the license
The (only recently documented) expectation is that all specs
are under a certain license, but we don't actually enforce it.
What's worse we then go ahead and assume the license was right,
outputting the expected license into generated files.

Fixes: 37d9df224d ("ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16 21:22:44 -07:00
Jakub Kicinski
4e16b6a748 ynl: broaden the license even more
I relicensed Netlink spec code to GPL-2.0 OR BSD-3-Clause but
we still put a slightly different license on the uAPI header
than the rest of the code. Use the Linux-syscall-note on all
the specs and all generated code. It's moot for kernel code,
but should not hurt. This way the licenses match everywhere.

Cc: Chuck Lever <chuck.lever@oracle.com>
Fixes: 37d9df224d ("ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16 21:20:32 -07:00
Jakub Kicinski
054abb515f tools: ynl: make definitions optional again
definitions are optional, commit in question breaks cli for ethtool.

Fixes: 6517a60b03 ("tools: ynl: move the enum classes to shared code")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16 21:20:32 -07:00
Po-Hsu Lin
24994513ad selftests: net: devlink_port_split.py: skip test if no suitable device available
The `devlink -j port show` command output may not contain the "flavour"
key, an example from Ubuntu 22.10 s390x LPAR(5.19.0-37-generic), with
mlx4 driver and iproute2-5.15.0:
  {"port":{"pci/0001:00:00.0/1":{"type":"eth","netdev":"ens301"},
           "pci/0001:00:00.0/2":{"type":"eth","netdev":"ens301d1"},
           "pci/0002:00:00.0/1":{"type":"eth","netdev":"ens317"},
           "pci/0002:00:00.0/2":{"type":"eth","netdev":"ens317d1"}}}

This will cause a KeyError exception.

Create a validate_devlink_output() to check for this "flavour" from
devlink command output to avoid this KeyError exception. Also let
it handle the check for `devlink -j dev show` output in main().

Apart from this, if the test was not started because the max lanes of
the designated device is 0. The script will still return 0 and thus
causing a false-negative test result.

Use a found_max_lanes flag to determine if these tests were skipped
due to this reason and return KSFT_SKIP to make it more clear.

Link: https://bugs.launchpad.net/bugs/1937133
Fixes: f3348a82e7 ("selftests: net: Add port split test")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Link: https://lore.kernel.org/r/20230315165353.229590-1-po-hsu.lin@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16 17:38:05 -07:00
Linus Torvalds
0ddc84d2dd ARM64:
* Address a rather annoying bug w.r.t. guest timer offsetting.  The
   synchronization of timer offsets between vCPUs was broken, leading to
   inconsistent timer reads within the VM.
 
 x86:
 
 * New tests for the slow path of the EVTCHNOP_send Xen hypercall
 
 * Add missing nVMX consistency checks for CR0 and CR4
 
 * Fix bug that broke AMD GATag on 512 vCPU machines
 
 Selftests:
 
 * Skip hugetlb tests if huge pages are not available
 
 * Sync KVM exit reasons
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmQQhBMUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPJsAf/aqKQtRJH2YDHuS/OvlH546lgrPTY
 zc2S187N4OofqKvm8HWAJOPravGI4Lkc3Jvlq2jPnlwl66musfako5YGXyyJesIP
 9pc32jxwbhpHyp39tSTxlNbjE68E4Tau2iFa5n6fq/2BOEkZNGRhTDWPfbJV4yZO
 JpkaguNm1nuZfKnRNxaaYhJwbqPIBc8l+Y3Q3nw6QLZHaNoupsd2pY3c4SuTYFcW
 UxUaFtNkpXQxbwve0MWFLh/JztOzFhQcdMi3OSTBYZz32T0vncjXFDuARfKLNKyw
 FgwkHgs2/d35AgE0JEwz1u6+/RMHvUheG08zkp8//lINfNgF/Cka7Dz2uA==
 =B1LI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Address a rather annoying bug w.r.t. guest timer offsetting. The
     synchronization of timer offsets between vCPUs was broken, leading
     to inconsistent timer reads within the VM.

  x86:

   - New tests for the slow path of the EVTCHNOP_send Xen hypercall

   - Add missing nVMX consistency checks for CR0 and CR4

   - Fix bug that broke AMD GATag on 512 vCPU machines

  Selftests:

   - Skip hugetlb tests if huge pages are not available

   - Sync KVM exit reasons"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: selftests: Sync KVM exit reasons in selftests
  KVM: selftests: Add macro to generate KVM exit reason strings
  KVM: selftests: Print expected and actual exit reason in KVM exit reason assert
  KVM: selftests: Make vCPU exit reason test assertion common
  KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test
  KVM: selftests: Use enum for test numbers in xen_shinfo_test
  KVM: selftests: Add helpers to make Xen-style VMCALL/VMMCALL hypercalls
  KVM: selftests: Move the guts of kvm_hypercall() to a separate macro
  KVM: SVM: WARN if GATag generation drops VM or vCPU ID information
  KVM: SVM: Modify AVIC GATag to support max number of 512 vCPUs
  KVM: SVM: Fix a benign off-by-one bug in AVIC physical table mask
  selftests: KVM: skip hugetlb tests if huge pages are not available
  KVM: VMX: Use tabs instead of spaces for indentation
  KVM: VMX: Fix indentation coding style issue
  KVM: nVMX: remove unnecessary #ifdef
  KVM: nVMX: add missing consistency checks for CR0 and CR4
  KVM: arm64: timers: Convert per-vcpu virtual offset to a global value
2023-03-16 11:32:12 -07:00
Arseniy Krasnov
7e699d2a4e test/vsock: copy to user failure test
This adds SOCK_STREAM and SOCK_SEQPACKET tests for invalid buffer case.
It tries to read data to NULL buffer (data already presents in socket's
queue), then uses valid buffer. For SOCK_STREAM second read must return
data, because skbuff is not dropped, but for SOCK_SEQPACKET skbuff will
be dropped by kernel, and 'recv()' will return EAGAIN.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16 17:28:23 +00:00
Linus Torvalds
9c1bec9c0b linux-kselftest-fixes-6.3-rc3
This kselftest fixes update for Linux 6.3-rc3 consists of a fix to
 amd-pstate test Makefile and a fix to LLVM build for i386 and x86_64
 in kselftest common lib.mk.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmQSDZ8ACgkQCwJExA0N
 QxwBDA/9FskE3TVhoTmB3XkDzrKDJqgvYxOwo6TbUU+/gRXqBN5d8047eX9vRRav
 TgTGS+TMLSm1+Mp1iDiz64x7hxeWcs5v9xTqVbCUqjZPPcxq3K4vT6ekBRHULYg+
 eJVN3J+TxbMCxUrOLceojU81zypTg0KLNmPns5ghM1GVf/34YYmHjOsTUBrqMMxc
 WWkDF7J9KC3sIYHg4+aFzUR/GQnjQKbyx/u+5cwY9abiC1AwTFOpWnNogoQphq9J
 qTo2kinQGRm0AOXbZE1SxnZMRosLleZ1NfbYMyaNDLCoXyedRjB4n6u6mtZ79gBR
 lqp761GWT8xtH5e7gSXuzUlWZ6s1EUgTadQyHT/6gHcBrorVCRGaHhc0UTf3SOBU
 7czfWgDcfXIt6+Y9ARXhfdoTDo4n5xSGl7tt4RUKyA2CUcF5PnYEea//smRocwO8
 Ze+Lz3StqpeW/FluX98yMzs14HRB2O+iL22SLRHIRAhKKo9K0gVd5P4G5KWX6Eto
 YR7dD9aIgNiUWlEzjBCb4V9zLmD+54Cq202I/IR4WO1/jOLU2xfY4k7oHGKTLB0+
 EOcGnXupCMjLFVgycaFB/g68ZejtAotpVbzI4+1y+1wWx+/pwch1NrARd0cWyj5a
 RI56dBQPlqYrB3RS9aNdTAmIuoe4VmqsaGXXypwGissxNzaY8r0=
 =ITKW
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-fixes-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "A fix to amd-pstate test Makefile and a fix to LLVM build for x86 in
  kselftest common lib.mk"

* tag 'linux-kselftest-fixes-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: fix LLVM build for i386 and x86_64
  selftests: amd-pstate: fix TEST_FILES
2023-03-15 12:20:37 -07:00
Kuniyuki Iwashima
13715acf8a selftest: Add test for bind() conflicts.
The test checks if (IPv4, IPv6) address pair properly conflict or not.

  * IPv4
    * 0.0.0.0
    * 127.0.0.1

  * IPv6
    * ::
    * ::1

If the IPv6 address is [::], the second bind() always fails.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15 00:24:10 -07:00
Chen Yu
0bc23d8b22 ACPI: tools: pfrut: Check if the input of level and type is in the right numeric range
The user provides arbitrary non-numeic value to level and type,
which could bring unexpected behavior. In this case the expected
behavior would be to throw an error.

 pfrut -h
usage: pfrut [OPTIONS]
code injection:
-l, --load
-s, --stage
-a, --activate
-u, --update [stage and activate]
-q, --query
-d, --revid
update telemetry:
-G, --getloginfo
-T, --type(0:execution, 1:history)
-L, --level(0, 1, 2, 4)
-R, --read
-D, --revid log

 pfrut -T A
 pfrut -G
log_level:0
log_type:0
log_revid:2
max_data_size:65536
chunk1_size:0
chunk2_size:1530
rollover_cnt:0
reset_cnt:17

Fix this by restricting the input to be in the expected range.

Reported-by: Hariganesh Govindarajulu <hariganesh.govindarajulu@intel.com>
Suggested-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-14 20:38:13 +01:00
Todd Brandt
6fa7f53735 pm-graph: sleepgraph: Avoid crashing on binary data in device names
A regression has occurred in the hid-sensor code where a device
name string has not been initialized to 0, and ends up without
a NULL char and is printed with %s. This includes random binary
data in the device name, which makes its way into the ftrace output
and ends up crashing sleepgraph because it expects the ftrace output
to be ASCII only.

For example: "HID-SENSOR-INT-020b?.39.auto" ends up in ftrace instead
of "HID-SENSOR-INT-020b.39.auto". It causes this crash in sleepgraph:

  File "/usr/bin/sleepgraph", line 5579, in executeSuspend
    for line in fp:
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position
1568: invalid start byte

The issue is present in 6.3-rc1 and is described in full here:
https://bugzilla.kernel.org/show_bug.cgi?id=217169

A separate fix has been submitted to have this issue repaired, but
it has also exposed a larger bug in sleepgraph, since nothing should
make sleepgraph crash. Sleepgraph needs to be able to handle binary
data showing up in ftrace gracefully.

Modify the ftrace processing code to treat it as potentially binary
and to filter out binary data and leave just the ASCII.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217169
Fixes: 98c062e824 ("HID: hid-sensor-custom: Allow more custom iio sensors")
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-14 18:51:59 +01:00
Vipin Sharma
f3e707413d KVM: selftests: Sync KVM exit reasons in selftests
Add missing KVM_EXIT_* reasons in KVM selftests from
include/uapi/linux/kvm.h

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Message-Id: <20230204014547.583711-5-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14 10:20:10 -04:00
Sean Christopherson
1b3d660e5d KVM: selftests: Add macro to generate KVM exit reason strings
Add and use a macro to generate the KVM exit reason strings array
instead of relying on developers to correctly copy+paste+edit each
string.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204014547.583711-4-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14 10:20:10 -04:00
Vipin Sharma
6f974494b8 KVM: selftests: Print expected and actual exit reason in KVM exit reason assert
Print what KVM exit reason a test was expecting and what it actually
got int TEST_ASSERT_KVM_EXIT_REASON().

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Message-Id: <20230204014547.583711-3-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14 10:20:09 -04:00
Vipin Sharma
c96f57b080 KVM: selftests: Make vCPU exit reason test assertion common
Make TEST_ASSERT_KVM_EXIT_REASON() macro and replace all exit reason
test assert statements with it.

No functional changes intended.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Message-Id: <20230204014547.583711-2-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14 10:20:09 -04:00
David Woodhouse
e6239a4ec5 KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test
When kvm_xen_evtchn_send() takes the slow path because the shinfo GPC
needs to be revalidated, it used to violate the SRCU vs. kvm->lock
locking rules and potentially cause a deadlock.

Now that lockdep is learning to catch such things, make sure that code
path is exercised by the selftest.

Link: https://lore.kernel.org/all/20230113124606.10221-2-dwmw2@infradead.org
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14 10:20:08 -04:00
David Woodhouse
e7062a98d0 KVM: selftests: Use enum for test numbers in xen_shinfo_test
The xen_shinfo_test started off with very few iterations, and the numbers
we used in GUEST_SYNC() were precisely mapped to the RUNSTATE_xxx values
anyway to start with.

It has since grown quite a few more tests, and it's kind of awful to be
handling them all as bare numbers. Especially when I want to add a new
test in the middle. Define an enum for the test stages, and use it both
in the guest code and the host switch statement.

No functional change, if I can count to 24.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14 10:20:08 -04:00
Sean Christopherson
c0c76d9993 KVM: selftests: Add helpers to make Xen-style VMCALL/VMMCALL hypercalls
Add wrappers to do hypercalls using VMCALL/VMMCALL and Xen's register ABI
(as opposed to full Xen-style hypercalls through a hypervisor provided
page).  Using the common helpers dedups a pile of code, and uses the
native hypercall instruction when running on AMD.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14 10:20:08 -04:00
Sean Christopherson
4009e0bb7b KVM: selftests: Move the guts of kvm_hypercall() to a separate macro
Extract the guts of kvm_hypercall() to a macro so that Xen hypercalls,
which have a different register ABI, can reuse the VMCALL vs. VMMCALL
logic.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14 10:20:07 -04:00
Paolo Bonzini
3dc40cf89b selftests: KVM: skip hugetlb tests if huge pages are not available
Right now, if KVM memory stress tests are run with hugetlb sources but hugetlb is
not available (either in the kernel or because /proc/sys/vm/nr_hugepages is 0)
the test will fail with a memory allocation error.

This makes it impossible to add tests that default to hugetlb-backed memory,
because on a machine with a default configuration they will fail.  Therefore,
check HugePages_Total as well and, if zero, direct the user to enable hugepages
in procfs.  Furthermore, return KSFT_SKIP whenever hugetlb is not available.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14 10:20:06 -04:00
Pedro Tammela
c66b2111c9 selftests: tc-testing: add tests for action binding
Add tests that check if filters can bind actions, that is create an
action independently and then bind to a filter.

tdc-tests under category 'infra':
1..18
ok 1 abdc - Reference pedit action object in filter
ok 2 7a70 - Reference mpls action object in filter
ok 3 d241 - Reference bpf action object in filter
ok 4 383a - Reference connmark action object in filter
ok 5 c619 - Reference csum action object in filter
ok 6 a93d - Reference ct action object in filter
ok 7 8bb5 - Reference ctinfo action object in filter
ok 8 2241 - Reference gact action object in filter
ok 9 35e9 - Reference gate action object in filter
ok 10 b22e - Reference ife action object in filter
ok 11 ef74 - Reference mirred action object in filter
ok 12 2c81 - Reference nat action object in filter
ok 13 ac9d - Reference police action object in filter
ok 14 68be - Reference sample action object in filter
ok 15 cf01 - Reference skbedit action object in filter
ok 16 c109 - Reference skbmod action object in filter
ok 17 4abc - Reference tunnel_key action object in filter
ok 18 dadd - Reference vlan action object in filter

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230309175554.304824-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13 16:54:16 -07:00
Alexei Starovoitov
a33a6eaa19 Merge branch 'bpf: Allow reads from uninit stack'
Merge commit bf9bec4cb3 ("Merge branch 'bpf: Allow reads from uninit stack'")
from bpf-next to bpf tree to address verification issues in some programs
due to stack usage.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-13 13:21:22 -07:00
Linus Torvalds
fc89d7fb49 virtio,vhost,vdpa: bugfixes
Some fixes accumulated so far.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmQOwvcPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRp2msH/0EwXAZCieuUnRU2c3wGEYnP+wnQbIBH3aUJ
 mpC3vFR9Bdo2bhxYcRjSJ+tUqWmUPn+0XZ8dYR9zu4lDli69cbWflij0+AJ/iGEe
 Rm6vhzq9KB4CKRulaW7gEdY+nXaIdoCAhdiI8Ka5QHd17wtnQ7X8g7HJRk+6m8hy
 tMfm8osrDXcHSGClipHKkw63n2vflAwA0cLfQjZYec4gwIMqOIDrztvvPJorxTLo
 NNHUF1wzyYCiVoWotDjfNcEiYULbSSE3Gau8KDyF6Y1VrkfyakXDAiH77GlttPuv
 j7DGIuWUGn84/z51C2HY164ux/bfWwKUHntGk6NvliZDWqeo718=
 =6O4V
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "Some virtio / vhost / vdpa fixes accumulated so far"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  tools/virtio: Ignore virtio-trace/trace-agent
  vdpa_sim: set last_used_idx as last_avail_idx in vdpasim_queue_ready
  vhost-vdpa: free iommu domain after last use during cleanup
  vdpa/mlx5: should not activate virtq object when suspended
  vp_vdpa: fix the crash in hot unplug with vp_vdpa
2023-03-13 10:43:09 -07:00
Rong Tao
ae43c20da2 tools/virtio: Ignore virtio-trace/trace-agent
since commit 108fc82596e3("tools: Add guest trace agent as a user tool")
introduce virtio-trace/trace-agent, it should be ignored in the git tree.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Message-Id: <tencent_52B2BC2F47540A5FEB46E710BD0C8485B409@qq.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-13 02:29:12 -04:00
Linus Torvalds
f5eded1f5f kernel.fork.v6.3-rc2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZA2l2QAKCRCRxhvAZXjc
 okR1AP9UjPVvVTU3DRp7Giqyv1rdv/iaCVRtEDQmhzDflksioQEAyJXTt+3YOTNl
 sSocNYBhVBsijelICeq7hZrmVP9CrgM=
 =C0cC
 -----END PGP SIGNATURE-----

Merge tag 'kernel.fork.v6.3-rc2' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux

Pull clone3 fix from Christian Brauner:
 "A simple fix for the clone3() system call.

  The CLONE_NEWTIME allows the creation of time namespaces. The flag
  reuses a bit from the CSIGNAL bits that are used in the legacy clone()
  system call to set the signal that gets sent to the parent after the
  child exits.

  The clone3() system call doesn't rely on CSIGNAL anymore as it uses a
  dedicated .exit_signal field in struct clone_args. So we blocked all
  CSIGNAL bits in clone3_args_valid(). When CLONE_NEWTIME was introduced
  and reused a CSIGNAL bit we forgot to adapt clone3_args_valid()
  causing CLONE_NEWTIME with clone3() to be rejected. Fix this"

* tag 'kernel.fork.v6.3-rc2' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
  selftests/clone3: test clone3 with CLONE_NEWTIME
  fork: allow CLONE_NEWTIME in clone3 flags
2023-03-12 09:04:28 -07:00
Matthieu Baerts
840742b7ed selftests: mptcp: userspace pm: fix printed values
In case of errors, the printed message had the expected and the seen
value inverted.

This patch simply correct the order: first the expected value, then the
one that has been seen.

Fixes: 10d4273411 ("selftests: mptcp: userspace: print error details if any")
Cc: stable@vger.kernel.org
Acked-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:42:56 -08:00
Lorenzo Bianconi
f85949f982 xdp: add xdp_set_features_flag utility routine
Introduce xdp_set_features_flag utility routine in order to update
dynamically xdp_features according to the dynamic hw configuration via
ethtool (e.g. changing number of hw rx/tx queues).
Add xdp_clear_features_flag() in order to clear all xdp_feature flag.

Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:33:47 -08:00
Lorenzo Bianconi
bf51d27704 tools: ynl: fix get_mask utility routine
Fix get_mask utility routine in order to take into account possible gaps
in the elements list.

Fixes: be5bea1cc0 ("net: add basic C code generators for Netlink")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:33:47 -08:00
Lorenzo Bianconi
8f76a4f80f tools: ynl: fix render-max for flags definition
Properly manage render-max property for flags definition type
introducing mask value and setting it to (last_element << 1) - 1
instead of adding max value set to last_element + 1

Fixes: be5bea1cc0 ("net: add basic C code generators for Netlink")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:33:47 -08:00
Alexei Starovoitov
e8c8361cfd selftests/bpf: Fix progs/test_deny_namespace.c issues.
The following build error can be seen:
progs/test_deny_namespace.c:22:19: error: call to undeclared function 'BIT_LL'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
        __u64 cap_mask = BIT_LL(CAP_SYS_ADMIN);

The struct kernel_cap_struct no longer exists in the kernel as well.
Adjust bpf prog to fix both issues.

Fixes: f122a08b19 ("capability: just use a 'u64' instead of a 'u32[2]' array")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 12:54:12 -08:00
Alexei Starovoitov
32513d40d9 selftests/bpf: Fix progs/find_vma_fail1.c build error.
The commit 11e456cae9 ("selftests/bpf: Fix compilation errors: Assign a value to a constant")
fixed the issue cleanly in bpf-next.
This is an alternative fix in bpf tree to avoid merge conflict between bpf and bpf-next.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 12:41:18 -08:00
Guillaume Tucker
624c60f326 selftests: fix LLVM build for i386 and x86_64
Add missing cases for the i386 and x86_64 architectures when
determining the LLVM target for building kselftest.

Fixes: 795285ef24 ("selftests: Fix clang cross compilation")
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-03-10 13:41:10 -07:00
Jesus Sanchez-Palencia
d6f7ff9dd3 libbpf: Revert poisoning of strlcpy
This reverts commit 6d0c4b11e743("libbpf: Poison strlcpy()").

It added the pragma poison directive to libbpf_internal.h to protect
against accidental usage of strlcpy but ended up breaking the build for
toolchains based on libcs which provide the strlcpy() declaration from
string.h (e.g. uClibc-ng). The include order which causes the issue is:

    string.h,
    from Iibbpf_common.h:12,
    from libbpf.h:20,
    from libbpf_internal.h:26,
    from strset.c:9:

Fixes: 6d0c4b11e7 ("libbpf: Poison strlcpy()")
Signed-off-by: Jesus Sanchez-Palencia <jesussanp@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230309004836.2808610-1-jesussanp@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 10:19:25 -08:00
Linus Torvalds
49be4fb281 perf tools fixes for v6.3:
- Add Adrian Hunter to MAINTAINERS as a perf tools reviewer.
 
 - Sync various tools/ copies of kernel headers with the kernel sources, this
   time trying to avoid first merging with upstream to then update but instead
   copy from upstream so that a merge is avoided and the end result after merging
   this pull request is the one expected, tools/perf/check-headers.sh (mostly)
   happy, less warnings while building tools/perf/.
 
 - Fix counting when initial delay configured by setting
   perf_attr.enable_on_exec when starting workloads from the perf command line.
 
 - Don't avoid emitting a PERF_RECORD_MMAP2 in 'perf inject --buildid-all' when
   that record comes with a build-id, otherwise we end up not being able to
   resolve symbols.
 
 - Don't use comma as the CSV output separator the "stat+csv_output" test, as
   comma can appear on some tests as a modifier for an event, use @ instead,
   ditto for the JSON linter test.
 
 - The offcpu test was looking for some bits being set on
   task_struct->prev_state without masking other bits not important for this
   specific 'perf test', fix it.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZApKjQAKCRCyPKLppCJ+
 JzdfAQDRnwDCxhb4cvx7lVR32L1XMIFW6qLWRBJWoxC2SJi6lgD/SoQgKswkxrJv
 XnBP7jEaIsh3M3ak82MxLKbjSAEvnwk=
 =jup7
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.3-1-2023-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Add Adrian Hunter to MAINTAINERS as a perf tools reviewer

 - Sync various tools/ copies of kernel headers with the kernel sources,
   this time trying to avoid first merging with upstream to then update
   but instead copy from upstream so that a merge is avoided and the end
   result after merging this pull request is the one expected,
   tools/perf/check-headers.sh (mostly) happy, less warnings while
   building tools/perf/

 - Fix counting when initial delay configured by setting
   perf_attr.enable_on_exec when starting workloads from the perf
   command line

 - Don't avoid emitting a PERF_RECORD_MMAP2 in 'perf inject
   --buildid-all' when that record comes with a build-id, otherwise we
   end up not being able to resolve symbols

 - Don't use comma as the CSV output separator the "stat+csv_output"
   test, as comma can appear on some tests as a modifier for an event,
   use @ instead, ditto for the JSON linter test

 - The offcpu test was looking for some bits being set on
   task_struct->prev_state without masking other bits not important for
   this specific 'perf test', fix it

* tag 'perf-tools-fixes-for-v6.3-1-2023-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tools: Add Adrian Hunter to MAINTAINERS as a reviewer
  tools headers UAPI: Sync linux/perf_event.h with the kernel sources
  tools headers x86 cpufeatures: Sync with the kernel sources
  tools include UAPI: Sync linux/vhost.h with the kernel sources
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers kvm: Sync uapi/{asm/linux} kvm.h headers with the kernel sources
  tools include UAPI: Synchronize linux/fcntl.h with the kernel sources
  tools headers: Synchronize {linux,vdso}/bits.h with the kernel sources
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
  perf stat: Fix counting when initial delay configured
  tools headers svm: Sync svm headers with the kernel sources
  perf test: Avoid counting commas in json linter
  perf tests stat+csv_output: Switch CSV separator to @
  perf inject: Fix --buildid-all not to eat up MMAP2
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  perf test: Fix offcpu test prev_state check
2023-03-10 08:18:46 -08:00
Jakub Kicinski
d0928c1c5b Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:

====================
Netfilter updates for net-next

1. nf_tables 'brouting' support, from Sriram Yagnaraman.

2. Update bridge netfilter and ovs conntrack helpers to handle
   IPv6 Jumbo packets properly, i.e. fetch the packet length
   from hop-by-hop extension header, from Xin Long.

   This comes with a test BIG TCP test case, added to
   tools/testing/selftests/net/.

3. Fix spelling and indentation in conntrack, from Jeremy Sowden.

* 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nat: fix indentation of function arguments
  netfilter: conntrack: fix typo
  selftests: add a selftest for big tcp
  netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
  netfilter: move br_nf_check_hbh_len to utils
  netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
  netfilter: bridge: check len before accessing more nh data
  netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
  netfilter: bridge: introduce broute meta statement
====================

Link: https://lore.kernel.org/r/20230308193033.13965-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 23:28:53 -08:00
Jakub Kicinski
d0ddf5065f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Documentation/bpf/bpf_devel_QA.rst
  b7abcd9c65 ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
  d56b0c461d ("bpf, docs: Fix link to netdev-FAQ target")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 22:22:11 -08:00
Linus Torvalds
44889ba56c Networking fixes for 6.3-rc2, including fixes from netfilter, bpf
Current release - regressions:
 
   - core: avoid skb end_offset change in __skb_unclone_keeptruesize()
 
   - sched:
     - act_connmark: handle errno on tcf_idr_check_alloc
     - flower: fix fl_change() error recovery path
 
   - ieee802154: prevent user from crashing the host
 
 Current release - new code bugs:
 
   - eth: bnxt_en: fix the double free during device removal
 
   - tools: ynl:
     - fix enum-as-flags in the generic CLI
     - fully inherit attrs in subsets
     - re-license uniformly under GPL-2.0 or BSD-3-clause
 
 Previous releases - regressions:
 
   - core: use indirect calls helpers for sk_exit_memory_pressure()
 
   - tls:
     - fix return value for async crypto
     - avoid hanging tasks on the tx_lock
 
   - eth: ice: copy last block omitted in ice_get_module_eeprom()
 
 Previous releases - always broken:
 
   - core: avoid double iput when sock_alloc_file fails
 
   - af_unix: fix struct pid leaks in OOB support
 
   - tls:
     - fix possible race condition
     - fix device-offloaded sendpage straddling records
 
   - bpf:
     - sockmap: fix an infinite loop error
     - test_run: fix &xdp_frame misplacement for LIVE_FRAMES
     - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
 
   - netfilter: tproxy: fix deadlock due to missing BH disable
 
   - phylib: get rid of unnecessary locking
 
   - eth: bgmac: fix *initial* chip reset to support BCM5358
 
   - eth: nfp: fix csum for ipsec offload
 
   - eth: mtk_eth_soc: fix RX data corruption issue
 
 Misc:
 
   - usb: qmi_wwan: add telit 0x1080 composition
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmQJzQISHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOky5YP/04Dbsbeqpk0Q94axmjoaS0J/4rW49js
 RaA7v8ci7sL1omW8k5tILPXniAouN4YHNOCW1KbLBMR5O7lyn9qM1RteHOIpOmte
 TLzAw+6Wl7CyGiiqirv2GU96Wd/jZoZpPXFZz/gXP59GnkChSHzQcpexmz0nrmxI
 eCRSs+qm+re3wmDKTYm5C+g+420PNXu9JItPnTNf+nTkTBxpmOEMyry03I0taXKS
 wceQHB2q5E0sSWXDfkxG/pmUuYTj3AdRSQ+vo+FLevSs/LWeThs2I6pT5sn8XS76
 1S8Lh6FytfBhyalFmRtrpqIJYyGae5MwEXQ29ddfmF4OFFLedx3IH0+JFQxTE9So
 i4gaXmM5SUI7c5vhib097xUISoLxKqqXQVQQSQ1MPZRfXtVubbA2gCv+vh6fXVoj
 zQYatZOLM7KT9q4Pw8A+9bJPof/FV+ObC67pbGQbJJgBoy+oOixDuP+x5DYT384L
 /5XS+23OZiFe7bvQoE/0SQMeRk3lF2XkS5l9gSbdSnGQPiaOqKhDgkoCmdkn1jvg
 qtkBS6+tRRoOBNsjC4r4eFXBVOQ1+myyjZetBnEOaSp22FaTJFQh9qX3AMFIHbUy
 m0jDi9OJZSWHICd6KNWPm3JK43cMjiyZbGftYqOHhuY5HN30vQN6sl7DXIJ0rIcE
 myHMfizwqmGT
 =hSXM
 -----END PGP SIGNATURE-----

Merge tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter and bpf.

  Current release - regressions:

   - core: avoid skb end_offset change in __skb_unclone_keeptruesize()

   - sched:
      - act_connmark: handle errno on tcf_idr_check_alloc
      - flower: fix fl_change() error recovery path

   - ieee802154: prevent user from crashing the host

  Current release - new code bugs:

   - eth: bnxt_en: fix the double free during device removal

   - tools: ynl:
      - fix enum-as-flags in the generic CLI
      - fully inherit attrs in subsets
      - re-license uniformly under GPL-2.0 or BSD-3-clause

  Previous releases - regressions:

   - core: use indirect calls helpers for sk_exit_memory_pressure()

   - tls:
      - fix return value for async crypto
      - avoid hanging tasks on the tx_lock

   - eth: ice: copy last block omitted in ice_get_module_eeprom()

  Previous releases - always broken:

   - core: avoid double iput when sock_alloc_file fails

   - af_unix: fix struct pid leaks in OOB support

   - tls:
      - fix possible race condition
      - fix device-offloaded sendpage straddling records

   - bpf:
      - sockmap: fix an infinite loop error
      - test_run: fix &xdp_frame misplacement for LIVE_FRAMES
      - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR

   - netfilter: tproxy: fix deadlock due to missing BH disable

   - phylib: get rid of unnecessary locking

   - eth: bgmac: fix *initial* chip reset to support BCM5358

   - eth: nfp: fix csum for ipsec offload

   - eth: mtk_eth_soc: fix RX data corruption issue

  Misc:

   - usb: qmi_wwan: add telit 0x1080 composition"

* tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
  tools: ynl: fix enum-as-flags in the generic CLI
  tools: ynl: move the enum classes to shared code
  net: avoid double iput when sock_alloc_file fails
  af_unix: fix struct pid leaks in OOB support
  eth: fealnx: bring back this old driver
  net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC
  net: microchip: sparx5: fix deletion of existing DSCP mappings
  octeontx2-af: Unlock contexts in the queue context cache in case of fault detection
  net/smc: fix fallback failed while sendmsg with fastopen
  ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
  mailmap: update entries for Stephen Hemminger
  mailmap: add entry for Maxim Mikityanskiy
  nfc: change order inside nfc_se_io error path
  ethernet: ice: avoid gcc-9 integer overflow warning
  ice: don't ignore return codes in VSI related code
  ice: Fix DSCP PFC TLV creation
  net: usb: qmi_wwan: add Telit 0x1080 composition
  net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990
  netfilter: conntrack: adopt safer max chain length
  net: tls: fix device-offloaded sendpage straddling records
  ...
2023-03-09 10:56:58 -08:00
Linus Torvalds
2653e3fe33 for-linus-2023030901
-----BEGIN PGP SIGNATURE-----
 
 iQJSBAABCAA8FiEEoEVH9lhNrxiMPSyI7MXwXhnZSjYFAmQJ8eweHGJlbmphbWlu
 LnRpc3NvaXJlc0ByZWRoYXQuY29tAAoJEOzF8F4Z2Uo2ff4P/j4zp6J7wfstWL5g
 Ma3u3RqRpM0HKw0tO5PeigLYGout40oW8xAH7n8ERu2o45yQAd5ZbgXVey25FTSd
 QEVd7zwN/ADMMTTujGQAfzpE6O7eaALVDgtgjOcNS8uRLeyqcDSCgBRaB8sNwLy7
 ZhHU5OWKvRCiiwTQyG7gvY9+cTre6wNjdKR2Ei+xra5IS78gp3OZ1NJOjT3iHDxe
 Zxo4kpkEaBJmVNYbC41sZfkzuZ84SfKXUC14V3BBiXmvnYU6x3WmuXxFvCgBEjgz
 agmKugHrdABa+oooONdKztHSOfa5saeYy11FO+q8txIEZqSiodr1anmk2U77Yu9O
 f4E8sQQszHMWyaqac5+dwUCaupgmKtPZOoMRjbGfGjwLObYkhxnJu6AamWYZl7DG
 E6AaO+ZV0SQcl89GpJ4+SiXSbSxUopYljzkUnvrnrOqPe4AkdWVTuCexBgGkOKqa
 DDQb+OYcI5N2aFMTOx8dkmZ6MPU7Mtot7UPTC1rv5Cgi8xFCH215dLauskDyatmt
 XQw5+9hzb1q3ZFFw6E/IhBkRcNLeAga1lsSxeqIGKkCHeEAOuCaO9ev454LD4oKk
 7nTqyKKAx+Roaw3dwCyU+U2v12B+PoYBq6BGnJtm/EnF0VuJRi8kV8J9uMXoo30j
 v4Fo/IsHUlgIiuJo556tmjcTetUm
 =XVs0
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2023030901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Benjamin Tissoires:

 - fix potential out of bound write of zeroes in HID core with a
   specially crafted uhid device (Lee Jones)

 - fix potential use-after-free in work function in intel-ish-hid (Reka
   Norman)

 - selftests config fixes (Benjamin Tissoires)

 - few device small fixes and support

* tag 'for-linus-2023030901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: intel-ish-hid: ipc: Fix potential use-after-free in work function
  HID: logitech-hidpp: Add support for Logitech MX Master 3S mouse
  HID: cp2112: Fix driver not registering GPIO IRQ chip as threaded
  selftest: hid: fix hid_bpf not set in config
  HID: uhid: Over-ride the default maximum data buffer value with our own
  HID: core: Provide new max_buffer_size attribute to over-ride the default
2023-03-09 10:17:23 -08:00
Jakub Kicinski
c311aaa74c tools: ynl: fix enum-as-flags in the generic CLI
Lorenzo points out that the generic CLI is broken for the netdev
family. When I added the support for documentation of enums
(and sparse enums) the client script was not updated.
It expects the values in enum to be a list of names,
now it can also be a dict (YAML object).

Reported-by: Lorenzo Bianconi <lorenzo@kernel.org>
Fixes: e4b48ed460 ("tools: ynl: add a completely generic client")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-08 23:28:21 -08:00
Jakub Kicinski
6517a60b03 tools: ynl: move the enum classes to shared code
Move bulk of the EnumSet and EnumEntry code to shared
code for reuse by cli.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-08 23:28:21 -08:00
Roberto Sassu
12fabae03c selftests/bpf: Fix IMA test
Commit 62622dab0a ("ima: return IMA digest value only when IMA_COLLECTED
flag is set") caused bpf_ima_inode_hash() to refuse to give non-fresh
digests. IMA test #3 assumed the old behavior, that bpf_ima_inode_hash()
still returned also non-fresh digests.

Correct the test by accepting both cases. If the samples returned are 1,
assume that the commit above is applied and that the returned digest is
fresh. If the samples returned are 2, assume that the commit above is not
applied, and check both the non-fresh and fresh digest.

Fixes: 62622dab0a ("ima: return IMA digest value only when IMA_COLLECTED flag is set")
Reported-by: David Vernet <void@manifault.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/bpf/20230308103713.1681200-1-roberto.sassu@huaweicloud.com
2023-03-08 11:15:39 -08:00
Xin Long
6bb382bcf7 selftests: add a selftest for big tcp
This test runs on the client-router-server topo, and monitors the traffic
on the RX devices of router and server while sending BIG TCP packets with
netperf from client to server. Meanwhile, it changes 'tso' on the TX devs
and 'gro' on the RX devs. Then it checks if any BIG TCP packets appears
on the RX devs with 'ip/ip6tables -m length ! --length 0:65535' for each
case.

Note that we also add tc action ct in link1 ingress to cover the ipv6
jumbo packets process in nf_ct_skb_network_trim() of nf_conntrack_ovs.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-08 14:25:42 +01:00
Tobias Klauser
515bddf0ec
selftests/clone3: test clone3 with CLONE_NEWTIME
Verify that clone3 can be called successfully with CLONE_NEWTIME in
flags.

Cc: Andrey Vagin <avagin@openvz.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-08 12:31:35 +01:00
Puranjay Mohan
720d93b60a libbpf: USDT arm arg parsing support
Parsing of USDT arguments is architecture-specific; on arm it is
relatively easy since registers used are r[0-10], fp, ip, sp, lr,
pc. Format is slightly different compared to aarch64; forms are

- "size @ [ reg, #offset ]" for dereferences, for example
  "-8 @ [ sp, #76 ]" ; " -4 @ [ sp ]"
- "size @ reg" for register values; for example
  "-4@r0"
- "size @ #value" for raw values; for example
  "-8@#1"

Add support for parsing USDT arguments for ARM architecture.

To test the above changes QEMU's virt[1] board with cortex-a15
CPU was used. libbpf-bootstrap's usdt example[2] was modified to attach
to a test program with DTRACE_PROBE1/2/3/4... probes to test different
combinations.

[1] https://www.qemu.org/docs/master/system/arm/virt.html
[2] https://github.com/libbpf/libbpf-bootstrap/blob/master/examples/c/usdt.bpf.c

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-3-puranjay12@gmail.com
2023-03-07 15:35:53 -08:00
Puranjay Mohan
98e678e9bc libbpf: Refactor parse_usdt_arg() to re-use code
The parse_usdt_arg() function is defined differently for each
architecture but the last part of the function is repeated
verbatim for each architecture.

Refactor parse_usdt_arg() to fill the arg_sz and then do the repeated
post-processing in parse_usdt_spec().

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-2-puranjay12@gmail.com
2023-03-07 15:35:05 -08:00
Daniel Müller
3ecde2182a libbpf: Fix theoretical u32 underflow in find_cd() function
Coverity reported a potential underflow of the offset variable used in
the find_cd() function. Switch to using a signed 64 bit integer for the
representation of offset to make sure we can never underflow.

Fixes: 1eebcb6063 ("libbpf: Implement basic zip archive parsing support")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307215504.837321-1-deso@posteo.net
2023-03-07 15:30:47 -08:00
Jakub Kicinski
37d9df224d ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
I was intending to make all the Netlink Spec code BSD-3-Clause
to ease the adoption but it appears that:
 - I fumbled the uAPI and used "GPL WITH uAPI note" there
 - it gives people pause as they expect GPL in the kernel
As suggested by Chuck re-license under dual. This gives us benefit
of full BSD freedom while fulfilling the broad "kernel is under GPL"
expectations.

Link: https://lore.kernel.org/all/20230304120108.05dd44c5@kernel.org/
Link: https://lore.kernel.org/r/20230306200457.3903854-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-07 13:44:30 -08:00
Jakub Kicinski
36e5e391a2 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZAZsBwAKCRDbK58LschI
 g3W1AQCQnO6pqqX5Q2aYDAZPlZRtV2TRLjuqrQE0dHW/XLAbBgD/bgsAmiKhPSCG
 2mTt6izpTQVlZB0e8KcDIvbYd9CE3Qc=
 =EjJQ
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-03-06

We've added 85 non-merge commits during the last 13 day(s) which contain
a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).

The main changes are:

1) Add skb and XDP typed dynptrs which allow BPF programs for more
   ergonomic and less brittle iteration through data and variable-sized
   accesses, from Joanne Koong.

2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF
   open-coded iterators allowing for less restrictive looping capabilities,
   from Andrii Nakryiko.

3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
   programs to NULL-check before passing such pointers into kfunc,
   from Alexei Starovoitov.

4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
   local storage maps, from Kumar Kartikeya Dwivedi.

5) Add BPF verifier support for ST instructions in convert_ctx_access()
   which will help new -mcpu=v4 clang flag to start emitting them,
   from Eduard Zingerman.

6) Make uprobe attachment Android APK aware by supporting attachment
   to functions inside ELF objects contained in APKs via function names,
   from Daniel Müller.

7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper
   to start the timer with absolute expiration value instead of relative
   one, from Tero Kristo.

8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id,
   from Tejun Heo.

9) Extend libbpf to support users manually attaching kprobes/uprobes
   in the legacy/perf/link mode, from Menglong Dong.

10) Implement workarounds in the mips BPF JIT for DADDI/R4000,
   from Jiaxun Yang.

11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT,
    from Hengqi Chen.

12) Extend BPF instruction set doc with describing the encoding of BPF
    instructions in terms of how bytes are stored under big/little endian,
    from Jose E. Marchesi.

13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.

14) Fix bpf_xdp_query() backwards compatibility on old kernels,
    from Yonghong Song.

15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS,
    from Florent Revest.

16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache,
    from Hou Tao.

17) Fix BPF verifier's check_subprogs to not unnecessarily mark
    a subprogram with has_tail_call, from Ilya Leoshkevich.

18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
  selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
  selftests/bpf: Split test_attach_probe into multi subtests
  libbpf: Add support to set kprobe/uprobe attach mode
  tools/resolve_btfids: Add /libsubcmd to .gitignore
  bpf: add support for fixed-size memory pointer returns for kfuncs
  bpf: generalize dynptr_get_spi to be usable for iters
  bpf: mark PTR_TO_MEM as non-null register type
  bpf: move kfunc_call_arg_meta higher in the file
  bpf: ensure that r0 is marked scratched after any function call
  bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
  bpf: clean up visit_insn()'s instruction processing
  selftests/bpf: adjust log_fixup's buffer size for proper truncation
  bpf: honor env->test_state_freq flag in is_state_visited()
  selftests/bpf: enhance align selftest's expected log matching
  bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
  bpf: improve stack slot state printing
  selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
  selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
  bpf: allow ctx writes using BPF_ST_MEM instruction
  bpf: Use separate RCU callbacks for freeing selem
  ...
====================

Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-06 20:36:39 -08:00
Jakub Kicinski
757b56a6c7 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZAZZ1wAKCRDbK58LschI
 g4fcAQDYVsICeBDmhdBdZs7Kb91/s6SrU6B0jy4zs0gOIBBOhgD7B3jt3dMTD2tp
 rPLHlv6uUoYS7mbZsrZi/XjVw8UmewM=
 =VUnr
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-03-06

We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 9 files changed, 64 insertions(+), 18 deletions(-).

The main changes are:

1) Fix BTF resolver for DATASEC sections when a VAR points at a modifier,
   that is, keep resolving such instances instead of bailing out,
   from Lorenz Bauer.

2) Fix BPF test framework with regards to xdp_frame info misplacement
   in the "live packet" code, from Alexander Lobakin.

3) Fix an infinite loop in BPF sockmap code for TCP/UDP/AF_UNIX,
   from Liu Jian.

4) Fix a build error for riscv BPF JIT under PERF_EVENTS=n,
   from Randy Dunlap.

5) Several BPF doc fixes with either broken links or external instead
   of internal doc links, from Bagas Sanjaya.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: check that modifier resolves after pointer
  btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
  bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
  bpf, doc: Link to submitting-patches.rst for general patch submission info
  bpf, doc: Do not link to docs.kernel.org for kselftest link
  bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser()
  riscv, bpf: Fix patch_text implicit declaration
  bpf, docs: Fix link to BTF doc
====================

Link: https://lore.kernel.org/r/20230306215944.11981-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-06 20:28:00 -08:00
Guillaume Tucker
2da789cda4 selftests: amd-pstate: fix TEST_FILES
Bring back the Python scripts that were initially added with
TEST_GEN_FILES but now with TEST_FILES to avoid having them deleted
when doing a clean.  Also fix the way the architecture is being
determined as they should also be installed when ARCH=x86_64 is
provided explicitly.  Then also append extra files to TEST_FILES and
TEST_PROGS with += so they don't get discarded.

Fixes: ba2d788aa8 ("selftests: amd-pstate: Trigger tbench benchmark and test cpus")
Fixes: a49fb7218e ("selftests: amd-pstate: Don't delete source files via Makefile")
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-03-06 17:03:20 -07:00
Arnaldo Carvalho de Melo
06a1574b94 tools headers UAPI: Sync linux/perf_event.h with the kernel sources
To pick up the changes in:

  09519ec3b1 ("perf: Add perf_event_attr::config3")

The patches for the tooling side will come later.

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
  diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/lkml/ZAZLYmDjWjSItWOq@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-06 17:21:23 -03:00
Lorenz Bauer
dfdd608c3b selftests/bpf: check that modifier resolves after pointer
Add a regression test that ensures that a VAR pointing at a
modifier which follows a PTR (or STRUCT or ARRAY) is resolved
correctly by the datasec validator.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230306112138.155352-3-lmb@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-06 11:44:14 -08:00
Alexander Lobakin
294635a816 bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
&xdp_buff and &xdp_frame are bound in a way that

xdp_buff->data_hard_start == xdp_frame

It's always the case and e.g. xdp_convert_buff_to_frame() relies on
this.
IOW, the following:

	for (u32 i = 0; i < 0xdead; i++) {
		xdpf = xdp_convert_buff_to_frame(&xdp);
		xdp_convert_frame_to_buff(xdpf, &xdp);
	}

shouldn't ever modify @xdpf's contents or the pointer itself.
However, "live packet" code wrongly treats &xdp_frame as part of its
context placed *before* the data_hard_start. With such flow,
data_hard_start is sizeof(*xdpf) off to the right and no longer points
to the XDP frame.

Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
places and praying that there are no more miscalcs left somewhere in the
code, unionize ::frm with ::data in a flex array, so that both starts
pointing to the actual data_hard_start and the XDP frame actually starts
being a part of it, i.e. a part of the headroom, not the context.
A nice side effect is that the maximum frame size for this mode gets
increased by 40 bytes, as xdp_buff::frame_sz includes everything from
data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
info.
Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
hardcoded for 64 bit && 4k pages, it can be made more flexible later on.

Minor: align `&head->data` with how `head->frm` is assigned for
consistency.
Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
clarity.

(was found while testing XDP traffic generator on ice, which calls
 xdp_convert_frame_to_buff() for each XDP frame)

Fixes: b530e9e106 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-06 11:15:54 -08:00
Arnaldo Carvalho de Melo
7d0930647c tools headers x86 cpufeatures: Sync with the kernel sources
To pick the changes from:

  8415a74852 ("x86/cpu, kvm: Add support for CPUID_80000021_EAX")

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
  diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/required-features.h' differs from latest version at 'arch/x86/include/asm/required-features.h'
  diff -u tools/arch/x86/include/asm/required-features.h arch/x86/include/asm/required-features.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZAYlS2XTJ5hRtss7@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-06 14:42:12 -03:00
Menglong Dong
c7aec81b31 selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
Add the testing for kprobe/uprobe attaching in default, legacy, perf and
link mode. And the testing passed:

./test_progs -t attach_probe
$5/1     attach_probe/manual-default:OK
$5/2     attach_probe/manual-legacy:OK
$5/3     attach_probe/manual-perf:OK
$5/4     attach_probe/manual-link:OK
$5/5     attach_probe/auto:OK
$5/6     attach_probe/kprobe-sleepable:OK
$5/7     attach_probe/uprobe-lib:OK
$5/8     attach_probe/uprobe-sleepable:OK
$5/9     attach_probe/uprobe-ref_ctr:OK
$5       attach_probe:OK
Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: https://lore.kernel.org/bpf/20230306064833.7932-4-imagedong@tencent.com
2023-03-06 09:38:08 -08:00
Menglong Dong
7391ec6391 selftests/bpf: Split test_attach_probe into multi subtests
In order to adapt to the older kernel, now we split the "attach_probe"
testing into multi subtests:

  manual // manual attach tests for kprobe/uprobe
  auto // auto-attach tests for kprobe and uprobe
  kprobe-sleepable // kprobe sleepable test
  uprobe-lib // uprobe tests for library function by name
  uprobe-sleepable // uprobe sleepable test
  uprobe-ref_ctr // uprobe ref_ctr test

As sleepable kprobe needs to set BPF_F_SLEEPABLE flag before loading,
we need to move it to a stand alone skel file, in case of it is not
supported by kernel and make the whole loading fail.

Therefore, we can only enable part of the subtests for older kernel.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: https://lore.kernel.org/bpf/20230306064833.7932-3-imagedong@tencent.com
2023-03-06 09:38:08 -08:00
Menglong Dong
f8b299bc6a libbpf: Add support to set kprobe/uprobe attach mode
By default, libbpf will attach the kprobe/uprobe BPF program in the
latest mode that supported by kernel. In this patch, we add the support
to let users manually attach kprobe/uprobe in legacy or perf mode.

There are 3 mode that supported by the kernel to attach kprobe/uprobe:

  LEGACY: create perf event in legacy way and don't use bpf_link
  PERF: create perf event with perf_event_open() and don't use bpf_link

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: create perf event with perf_event_open() and use bpf_link
Link: https://lore.kernel.org/bpf/20230113093427.1666466-1-imagedong@tencent.com/
Link: https://lore.kernel.org/bpf/20230306064833.7932-2-imagedong@tencent.com

Users now can manually choose the mode with
bpf_program__attach_uprobe_opts()/bpf_program__attach_kprobe_opts().
2023-03-06 09:38:04 -08:00
Rong Tao
fd4cb29f2a tools/resolve_btfids: Add /libsubcmd to .gitignore
Add libsubcmd to .gitignore, otherwise after compiling the kernel it
would result in the following:

    # bpf-next...bpf-next/master
    ?? tools/bpf/resolve_btfids/libsubcmd/

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_F13D670D5D7AA9C4BD868D3220921AAC090A@qq.com
2023-03-06 16:23:01 +01:00
Arnaldo Carvalho de Melo
14e998ed42 tools include UAPI: Sync linux/vhost.h with the kernel sources
To get the changes in:

  3b688d7a08 ("vhost-vdpa: uAPI to resume the device")

To pick up these changes and support them:

  $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before
  $ cp ../linux/include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h
  $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after
  $ diff -u before after
  --- before	2023-03-06 09:26:14.889251817 -0300
  +++ after	2023-03-06 09:26:20.594406270 -0300
  @@ -30,6 +30,7 @@
   	[0x77] = "VDPA_SET_CONFIG_CALL",
   	[0x7C] = "VDPA_SET_GROUP_ASID",
   	[0x7D] = "VDPA_SUSPEND",
  +	[0x7E] = "VDPA_RESUME",
   };
   static const char *vhost_virtio_ioctl_read_cmds[] = {
   	[0x00] = "GET_FEATURES",
  $

For instance, see how those 'cmd' ioctl arguments get translated, now
VDPA_RESUME will be as well:

  # perf trace -a -e ioctl --max-events=10
       0.000 ( 0.011 ms): pipewire/2261 ioctl(fd: 60, cmd: SNDRV_PCM_HWSYNC, arg: 0x1)                        = 0
      21.353 ( 0.014 ms): pipewire/2261 ioctl(fd: 60, cmd: SNDRV_PCM_HWSYNC, arg: 0x1)                        = 0
      25.766 ( 0.014 ms): gnome-shell/2196 ioctl(fd: 14, cmd: DRM_I915_IRQ_WAIT, arg: 0x7ffe4a22c740)            = 0
      25.845 ( 0.034 ms): gnome-shel:cs0/2212 ioctl(fd: 14, cmd: DRM_I915_IRQ_EMIT, arg: 0x7fd43915dc70)            = 0
      25.916 ( 0.011 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ADDFB2, arg: 0x7ffe4a22c8a0)               = 0
      25.941 ( 0.025 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ATOMIC, arg: 0x7ffe4a22c840)               = 0
      32.915 ( 0.009 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_RMFB, arg: 0x7ffe4a22cf9c)                 = 0
      42.522 ( 0.013 ms): gnome-shell/2196 ioctl(fd: 14, cmd: DRM_I915_IRQ_WAIT, arg: 0x7ffe4a22c740)            = 0
      42.579 ( 0.031 ms): gnome-shel:cs0/2212 ioctl(fd: 14, cmd: DRM_I915_IRQ_EMIT, arg: 0x7fd43915dc70)            = 0
      42.644 ( 0.010 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ADDFB2, arg: 0x7ffe4a22c8a0)               = 0
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
Link: https://lore.kernel.org/lkml/ZAXdCTecxSNwAoeK@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-06 09:31:26 -03:00
Andrii Nakryiko
fffc893b6b selftests/bpf: adjust log_fixup's buffer size for proper truncation
Adjust log_fixup's expected buffer length to fix the test. It's pretty
finicky in its length expectation, but it doesn't break often. So just
adjust the length to work on current kernel and with follow up iterator
changes as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:32 -08:00
Andrii Nakryiko
6f876e75d3 selftests/bpf: enhance align selftest's expected log matching
Allow to search for expected register state in all the verifier log
output that's related to specified instruction number.

See added comment for an example of possible situation that is happening
due to a simple enhancement done in the next patch, which fixes handling
of env->test_state_freq flag in state checkpointing logic.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:31 -08:00
Eduard Zingerman
71cf4d027a selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
Function verifier.c:convert_ctx_access() applies some rewrites to BPF
instructions that read or write BPF program context. This commit adds
machinery to allow test cases that inspect BPF program after these
rewrites are applied.

An example of a test case:

  {
        // Shorthand for field offset and size specification
	N(CGROUP_SOCKOPT, struct bpf_sockopt, retval),

        // Pattern generated for field read
	.read  = "$dst = *(u64 *)($ctx + bpf_sockopt_kern::current_task);"
		 "$dst = *(u64 *)($dst + task_struct::bpf_ctx);"
		 "$dst = *(u32 *)($dst + bpf_cg_run_ctx::retval);",

        // Pattern generated for field write
	.write = "*(u64 *)($ctx + bpf_sockopt_kern::tmp_reg) = r9;"
		 "r9 = *(u64 *)($ctx + bpf_sockopt_kern::current_task);"
		 "r9 = *(u64 *)(r9 + task_struct::bpf_ctx);"
		 "*(u32 *)(r9 + bpf_cg_run_ctx::retval) = $src;"
		 "r9 = *(u64 *)($ctx + bpf_sockopt_kern::tmp_reg);" ,
  },

For each test case, up to three programs are created:
- One that uses BPF_LDX_MEM to read the context field.
- One that uses BPF_STX_MEM to write to the context field.
- One that uses BPF_ST_MEM to write to the context field.

The disassembly of each program is compared with the pattern specified
in the test case.

Kernel code for disassembly is reused (as is in the bpftool).
To keep Makefile changes to the minimum, symbolic links to
`kernel/bpf/disasm.c` and `kernel/bpf/disasm.h ` are added.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03 21:41:46 -08:00
Eduard Zingerman
806f81cd1e selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
Check that verifier tracks pointer types for BPF_ST_MEM instructions
and reports error if pointer types do not match for different
execution branches.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03 21:41:46 -08:00
Eduard Zingerman
0d80a619c1 bpf: allow ctx writes using BPF_ST_MEM instruction
Lift verifier restriction to use BPF_ST_MEM instructions to write to
context data structures. This requires the following changes:
 - verifier.c:do_check() for BPF_ST updated to:
   - no longer forbid writes to registers of type PTR_TO_CTX;
   - track dst_reg type in the env->insn_aux_data[...].ptr_type field
     (same way it is done for BPF_STX and BPF_LDX instructions).
 - verifier.c:convert_ctx_access() and various callbacks invoked by
   it are updated to handled BPF_ST instruction alongside BPF_STX.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03 21:41:46 -08:00
Arnaldo Carvalho de Melo
3ee7cb4fdf tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in:

  e7862eda30 ("x86/cpu: Support AMD Automatic IBRS")
  0125acda7d ("x86/bugs: Reset speculation control settings on init")
  38aaf921e9 ("perf/x86: Add Meteor Lake support")
  5b6fac3fa4 ("x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation")
  dc2a3e8579 ("x86/resctrl: Add interface to read mbm_total_bytes_config")

Addressing these tools/perf build warnings:

    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
    Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'

That makes the beautification scripts to pick some new entries:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before    2023-03-03 18:26:51.766923522 -0300
  +++ after     2023-03-03 18:27:09.987415481 -0300
  @@ -267,9 +267,11 @@
        [0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT",
        [0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG",
        [0xc0000200 - x86_64_specific_MSRs_offset] = "IA32_MBA_BW_BASE",
  +     [0xc0000280 - x86_64_specific_MSRs_offset] = "IA32_SMBA_BW_BASE",
        [0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
        [0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
        [0xc0000302 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS_CLR",
  +     [0xc0000400 - x86_64_specific_MSRs_offset] = "IA32_EVT_CFG_BASE",
   };

   #define x86_AMD_V_KVM_MSRs_offset 0xc0010000
  $

Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:

  # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  ^C#

If we use -v (verbose mode) we can see what it does behind the scenes:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  Using CPUID AuthenticAMD-25-21-0
  0x6a0
  0x6a8
  New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  0x6a0
  0x6a8
  New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  mmap size 528384B
  ^C#

Example with a frequent msr:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
  Using CPUID AuthenticAMD-25-21-0
  0x48
  New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  0x48
  New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  mmap size 528384B
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
     0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule ([kernel.kallsyms])
                                       futex_wait_queue_me ([kernel.kallsyms])
                                       futex_wait ([kernel.kallsyms])
                                       do_futex ([kernel.kallsyms])
                                       __x64_sys_futex ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                       __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
     0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule_idle ([kernel.kallsyms])
                                       do_idle ([kernel.kallsyms])
                                       cpu_startup_entry ([kernel.kallsyms])
                                       secondary_startup_64_no_verify ([kernel.kallsyms])
    #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Breno Leitao <leitao@debian.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nikunj A Dadhania <nikunj@amd.com>
Link: https://lore.kernel.org/lkml/ZAJoaZ41+rU5H0vL@kernel.org
[ I had published the perf-tools branch before with the sync with ]
[ 8c29f01654 ("x86/sev: Add SEV-SNP guest feature negotiation support") ]
[ I removed it from this new sync ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-03 23:24:11 -03:00
Arnaldo Carvalho de Melo
33c53f9b5a tools headers kvm: Sync uapi/{asm/linux} kvm.h headers with the kernel sources
To pick up the changes in:

  89b0e7de34 ("KVM: arm64: nv: Introduce nested virtualization VCPU feature")
  14329b825f ("KVM: x86/pmu: Introduce masked events to the pmu event filter")
  6213b701a9 ("KVM: x86: Replace 0-length arrays with flexible arrays")
  3fd49805d1 ("KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg")
  14329b825f ("KVM: x86/pmu: Introduce masked events to the pmu event filter")

That don't change functionality in tools/perf, as no new ioctl is added
for the 'perf trace' scripts to harvest.

This addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
  diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
  Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
  diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h

Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Kook <keescook@chromium.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/lkml/ZAJlg7%2FfWDVGX0F3@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-03 23:24:10 -03:00
Arnaldo Carvalho de Melo
5f800380af tools include UAPI: Synchronize linux/fcntl.h with the kernel sources
To pick up the changes in:

  6fd7353829 ("mm/memfd: add F_SEAL_EXEC")

That doesn't add or change any perf tools functionality, only addresses
these build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/fcntl.h' differs from latest version at 'include/uapi/linux/fcntl.h'
  diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-03 22:34:20 -03:00
Arnaldo Carvalho de Melo
811f35ff59 tools headers: Synchronize {linux,vdso}/bits.h with the kernel sources
To pick up the changes in this cset:

  cbdb1f163a ("vdso/bits.h: Add BIT_ULL() for the sake of consistency")

That just causes perf to rebuild, the macro included doesn't clash with
anything in tools/{perf,objtool,bpf}.

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/include/linux/bits.h' differs from latest version at 'include/linux/bits.h'
  diff -u tools/include/linux/bits.h include/linux/bits.h
  Warning: Kernel ABI header at 'tools/include/vdso/bits.h' differs from latest version at 'include/vdso/bits.h'
  diff -u tools/include/vdso/bits.h include/vdso/bits.h

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-03 22:34:15 -03:00
Arnaldo Carvalho de Melo
df4b933e0e tools headers UAPI: Sync linux/prctl.h with the kernel sources
To pick new prctl options introduced in:

  b507808ebc ("mm: implement memory-deny-write-execute as a prctl")

That results in:

  $ diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h
  --- tools/include/uapi/linux/prctl.h	2022-06-20 17:54:43.884515663 -0300
  +++ include/uapi/linux/prctl.h	2023-03-03 11:18:51.090923569 -0300
  @@ -281,6 +281,12 @@
   # define PR_SME_VL_LEN_MASK		0xffff
   # define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */

  +/* Memory deny write / execute */
  +#define PR_SET_MDWE			65
  +# define PR_MDWE_REFUSE_EXEC_GAIN	1
  +
  +#define PR_GET_MDWE			66
  +
   #define PR_SET_VMA		0x53564d41
   # define PR_SET_VMA_ANON_NAME		0

  $ tools/perf/trace/beauty/prctl_option.sh > before
  $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h
  $ tools/perf/trace/beauty/prctl_option.sh > after
  $ diff -u before after
  --- before	2023-03-03 11:47:43.320013146 -0300
  +++ after	2023-03-03 11:47:50.937216229 -0300
  @@ -59,6 +59,8 @@
   	[62] = "SCHED_CORE",
   	[63] = "SME_SET_VL",
   	[64] = "SME_GET_VL",
  +	[65] = "SET_MDWE",
  +	[66] = "GET_MDWE",
   };
   static const char *prctl_set_mm_options[] = {
   	[1] = "START_CODE",
  $

Now users can do:

  # perf trace -e syscalls:sys_enter_prctl --filter "option==SET_MDWE||option==GET_MDWE"
^C#
  # trace -v -e syscalls:sys_enter_prctl --filter "option==SET_MDWE||option==GET_MDWE"
  New filter for syscalls:sys_enter_prctl: (option==65||option==66) && (common_pid != 5519 && common_pid != 3404)
^C#

And when these prctl options appears in a session, they will be
translated to the corresponding string.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZAI%2FAoPXb%2Fsxz1%2Fm@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-03 22:34:08 -03:00
Arnaldo Carvalho de Melo
31d2e6b5d4 tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
We also continue with SYM_TYPED_FUNC_START() in util/include/linux/linkage.h
and with an exception in tools/perf/check_headers.sh's diff check to ignore
the include cfi_types.h line when checking if the kernel original files drifted
from the copies we carry.

This is to get the changes from:

  69d4c0d321 ("entry, kasan, x86: Disallow overriding mem*() functions")

That addresses these perf tools build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
  diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
  Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S'
  diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/ZAH%2FjsioJXGIOrkf@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-03 22:34:00 -03:00
Alexei Starovoitov
6fcd486b3a bpf: Refactor RCU enforcement in the verifier.
bpf_rcu_read_lock/unlock() are only available in clang compiled kernels. Lack
of such key mechanism makes it impossible for sleepable bpf programs to use RCU
pointers.

Allow bpf_rcu_read_lock/unlock() in GCC compiled kernels (though GCC doesn't
support btf_type_tag yet) and allowlist certain field dereferences in important
data structures like tast_struct, cgroup, socket that are used by sleepable
programs either as RCU pointer or full trusted pointer (which is valid outside
of RCU CS). Use BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_TRUSTED macros for such
tagging. They will be removed once GCC supports btf_type_tag.

With that refactor check_ptr_to_btf_access(). Make it strict in enforcing
PTR_TRUSTED and PTR_UNTRUSTED while deprecating old PTR_TO_BTF_ID without
modifier flags. There is a chance that this strict enforcement might break
existing programs (especially on GCC compiled kernels), but this cleanup has to
start sooner than later. Note PTR_TO_CTX access still yields old deprecated
PTR_TO_BTF_ID. Once it's converted to strict PTR_TRUSTED or PTR_UNTRUSTED the
kfuncs and helpers will be able to default to KF_TRUSTED_ARGS. KF_RCU will
remain as a weaker version of KF_TRUSTED_ARGS where obj refcnt could be 0.

Adjust rcu_read_lock selftest to run on gcc and clang compiled kernels.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-7-alexei.starovoitov@gmail.com
2023-03-03 17:42:20 +01:00
Alexei Starovoitov
0047d8343f selftests/bpf: Tweak cgroup kfunc test.
Adjust cgroup kfunc test to dereference RCU protected cgroup pointer
as PTR_TRUSTED and pass into KF_TRUSTED_ARGS kfunc.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-6-alexei.starovoitov@gmail.com
2023-03-03 17:42:20 +01:00
Alexei Starovoitov
838bd4ac9a selftests/bpf: Add a test case for kptr_rcu.
Tweak existing map_kptr test to check kptr_rcu.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-5-alexei.starovoitov@gmail.com
2023-03-03 17:42:20 +01:00
Alexei Starovoitov
20c09d92fa bpf: Introduce kptr_rcu.
The life time of certain kernel structures like 'struct cgroup' is protected by RCU.
Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
Derefrence of other kptr-s returns PTR_UNTRUSTED.

For example:
struct map_value {
   struct cgroup __kptr *cgrp;
};

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
{
  struct cgroup *cg, *cg2;

  cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
  bpf_kptr_xchg(&v->cgrp, cg);

  cg2 = v->cgrp; // This is new feature introduced by this patch.
  // cg2 is PTR_MAYBE_NULL | MEM_RCU.
  // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero

  if (cg2)
    bpf_cgroup_ancestor(cg2, level); // safe to do.
}

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
2023-03-03 17:42:20 +01:00
Alexei Starovoitov
03b77e17ae bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted.
__kptr meant to store PTR_UNTRUSTED kernel pointers inside bpf maps.
The concept felt useful, but didn't get much traction,
since bpf_rdonly_cast() was added soon after and bpf programs received
a simpler way to access PTR_UNTRUSTED kernel pointers
without going through restrictive __kptr usage.

Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate
its intended usage.
The main goal of __kptr_untrusted was to read/write such pointers
directly while bpf_kptr_xchg was a mechanism to access refcnted
kernel pointers. The next patch will allow RCU protected __kptr access
with direct read. At that point __kptr_untrusted will be deprecated.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-2-alexei.starovoitov@gmail.com
2023-03-03 17:42:20 +01:00
Jakub Kicinski
ad4fafcde5 tools: ynl: use 1 as the default for first entry in attrs/ops
Pretty much all families use value: 1 or reserve as unspec
the first entry in attribute set and the first operation.
Make this the default. Update documentation (the doc for
values of operations just refers back to doc for attrs
so updating only attrs).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-03 08:22:39 +00:00
Jakub Kicinski
7cf93538e0 tools: ynl: fully inherit attrs in subsets
To avoid having to repeat the entire definition of an attribute
(including the value) use the Attr object from the original set.
In fact this is already the documented expectation.

Fixes: be5bea1cc0 ("net: add basic C code generators for Netlink")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-03 08:22:39 +00:00
Tero Kristo
944459e88b selftests/bpf: Add absolute timer test
Add test for the absolute BPF timer under the existing timer tests. This
will run the timer two times with 1us expiration time, and then re-arm
the timer at ~35s in the future. At the end, it is verified that the
absolute timer expired exactly two times.

Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20230302114614.2985072-3-tero.kristo@linux.intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-02 22:41:32 -08:00
Tero Kristo
f71f853049 bpf: Add support for absolute value BPF timers
Add a new flag BPF_F_TIMER_ABS that can be passed to bpf_timer_start()
to start an absolute value timer instead of the default relative value.
This makes the timer expire at an exact point in time, instead of a time
with latencies induced by both the BPF and timer subsystems.

Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20230302114614.2985072-2-tero.kristo@linux.intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-02 22:41:32 -08:00
Dave Marchevsky
ec97a76f11 selftests/bpf: Add -Wuninitialized flag to bpf prog flags
Per C99 standard [0], Section 6.7.8, Paragraph 10:

  If an object that has automatic storage duration is not initialized
  explicitly, its value is indeterminate.

And in the same document, in appendix "J.2 Undefined behavior":

  The behavior is undefined in the following circumstances:
  [...]
  The value of an object with automatic storage duration is used while
  it is indeterminate (6.2.4, 6.7.8, 6.8).

This means that use of an uninitialized stack variable is undefined
behavior, and therefore that clang can choose to do a variety of scary
things, such as not generating bytecode for "bunch of useful code" in
the below example:

  void some_func()
  {
    int i;
    if (!i)
      return;
    // bunch of useful code
  }

To add insult to injury, if some_func above is a helper function for
some BPF program, clang can choose to not generate an "exit" insn,
causing verifier to fail with "last insn is not an exit or jmp". Going
from that verification failure to the root cause of uninitialized use
is certain to be frustrating.

This patch adds -Wuninitialized to the cflags for selftest BPF progs and
fixes up existing instances of uninitialized use.

  [0]: https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Cc: David Vernet <void@manifault.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230303005500.1614874-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-02 22:38:50 -08:00
Changbin Du
25f69c69bc perf stat: Fix counting when initial delay configured
When creating counters with initial delay configured, the enable_on_exec
field is not set. So we need to enable the counters later. The problem
is, when a workload is specified the target__none() is true. So we also
need to check stat_config.initial_delay.

In this change, we add a new field 'initial_delay' for struct target
which could be shared by other subcommands. And define
target__enable_on_exec() which returns whether enable_on_exec should be
set on normal cases.

Before this fix the event is not counted:

  $ ./perf stat -e instructions -D 100 sleep 2
  Events disabled
  Events enabled

   Performance counter stats for 'sleep 2':

       <not counted>      instructions

         1.901661124 seconds time elapsed

         0.001602000 seconds user
         0.000000000 seconds sys

After fix it works:

  $ ./perf stat -e instructions -D 100 sleep 2
  Events disabled
  Events enabled

   Performance counter stats for 'sleep 2':

             404,214      instructions

         1.901743475 seconds time elapsed

         0.001617000 seconds user
         0.000000000 seconds sys

Fixes: c587e77e10 ("perf stat: Do not delay the workload with --delay")
Signed-off-by: Changbin Du <changbin.du@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hui Wang <hw.huiwang@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230302031146.2801588-2-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-02 17:39:03 -03:00
Arnaldo Carvalho de Melo
a98c0710b4 tools headers svm: Sync svm headers with the kernel sources
To pick the changes in:

  8c29f01654 ("x86/sev: Add SEV-SNP guest feature negotiation support")

That triggers:

  CC      /tmp/build/perf-tools/arch/x86/util/kvm-stat.o
  CC      /tmp/build/perf-tools/util/header.o
  LD      /tmp/build/perf-tools/arch/x86/util/perf-in.o
  LD      /tmp/build/perf-tools/arch/x86/perf-in.o
  LD      /tmp/build/perf-tools/arch/perf-in.o
  LD      /tmp/build/perf-tools/util/perf-in.o
  LD      /tmp/build/perf-tools/perf-in.o
  LINK    /tmp/build/perf-tools/perf

But this time causes no changes in tooling results, as the introduced
SVM_VMGEXIT_TERM_REQUEST exit reason wasn't added to SVM_EXIT_REASONS,
that is used in kvm-stat.c.

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h'
  diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nikunj A Dadhania <nikunj@amd.com>
Link: http://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-02 17:38:32 -03:00
Linus Torvalds
857f1268a5 Changes in this cycle were:
- Shrink 'struct instruction', to improve objtool performance & memory
    footprint.
 
  - Other maximum memory usage reductions - this makes the build both faster,
    and fixes kernel build OOM failures on allyesconfig and similar configs
    when they try to build the final (large) vmlinux.o.
 
  - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying
    single-byte instruction (PUSH/POP or LEAVE). This requires the
    extension of the ORC metadata structure with a 'signal' field.
 
  - Misc fixes & cleanups.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmQAVp8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gV6A//YbWb4nNxYbRFBd1O3FnFfy4efrDQ4btI
 hwkL6f7jka9RnIpIEatJvaLdNvyN5tuPCC/+B5eVnvFdd1JcBUmj5D+zYFt6H6qt
 BG4M6TNHFkP1kOJVfFGn8UPRfoMz2oMiEqilpsc1Yuf7b3ldMJtGUoHaeZC9pyqe
 RUisKNw4WHZp2G/gTBUWxW17xpWY3Awgch/w4HCu8wMnR+uEC44i0UCBfnAadl36
 ar66PfhMJcQIv0XkK9wu43g7+HFnjpxHOx35JW3lRot0xRnwl/JcsmaX5iPkh0gt
 HV8eLH80J0homeMZDY7vWIKJxGeLkIdfjO5gxwTdnFc9rQw3GwHp1B7WTS6J3Vwe
 gM00kyaGly3CvkKMiz5QQBfViWCjE25nYS8X0i9Oz6Gk58IkRPGByaDTKRjNrDJB
 BwH9DE9xb3dPVZRv/PejkTdggQWo+FDTrL8ulHIjUFK11M7VubwkskecNHkfpAOE
 TRy5iLjMocF8u7hdyec6Mma2K6qEndC2Rw9ZMPQ7TeieMsBcl63cSRgSJLFfdRhr
 /5c6Hr2SNQKU8xu+3j49GyBwFvp4CwCa+GPs9/o+l0uCvuKNIn9B788cm4TjxLJ9
 C3PRzE6B/CaLhYvlC5k5cNM+I4YpoMU/mvSvY6HcC0Duj2nSAWS2VV60MVMDpqVX
 8nK4xnla2tM=
 =bpPY
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Shrink 'struct instruction', to improve objtool performance & memory
   footprint

 - Other maximum memory usage reductions - this makes the build both
   faster, and fixes kernel build OOM failures on allyesconfig and
   similar configs when they try to build the final (large) vmlinux.o

 - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying
   single-byte instruction (PUSH/POP or LEAVE). This requires the
   extension of the ORC metadata structure with a 'signal' field

 - Misc fixes & cleanups

* tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  objtool: Fix ORC 'signal' propagation
  objtool: Remove instruction::list
  x86: Fix FILL_RETURN_BUFFER
  objtool: Fix overlapping alternatives
  objtool: Union instruction::{call_dest,jump_table}
  objtool: Remove instruction::reloc
  objtool: Shrink instruction::{type,visited}
  objtool: Make instruction::alts a single-linked list
  objtool: Make instruction::stack_ops a single-linked list
  objtool: Change arch_decode_instruction() signature
  x86/entry: Fix unwinding from kprobe on PUSH/POP instruction
  x86/unwind/orc: Add 'signal' field to ORC metadata
  objtool: Optimize layout of struct special_alt
  objtool: Optimize layout of struct symbol
  objtool: Allocate multiple structures with calloc()
  objtool: Make struct check_options static
  objtool: Make struct entries[] static and const
  objtool: Fix HOSTCC flag usage
  objtool: Properly support make V=1
  objtool: Install libsubcmd in build
  ...
2023-03-02 09:45:34 -08:00
Daniel Müller
c44fd84507 libbpf: Add support for attaching uprobes to shared objects in APKs
This change adds support for attaching uprobes to shared objects located
in APKs, which is relevant for Android systems where various libraries
may reside in APKs. To make that happen, we extend the syntax for the
"binary path" argument to attach to with that supported by various
Android tools:
  <archive>!/<binary-in-archive>

For example:
  /system/app/test-app/test-app.apk!/lib/arm64-v8a/libc++_shared.so

APKs need to be specified via full path, i.e., we do not attempt to
resolve mere file names by searching system directories.

We cannot currently test this functionality end-to-end in an automated
fashion, because it relies on an Android system being present, but there
is no support for that in CI. I have tested the functionality manually,
by creating a libbpf program containing a uretprobe, attaching it to a
function inside a shared object inside an APK, and verifying the sanity
of the returned values.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-4-deso@posteo.net
2023-03-01 16:05:35 -08:00
Daniel Müller
434fdcead7 libbpf: Introduce elf_find_func_offset_from_file() function
This change splits the elf_find_func_offset() function in two:
elf_find_func_offset(), which now accepts an already opened Elf object
instead of a path to a file that is to be opened, as well as
elf_find_func_offset_from_file(), which opens a binary based on a
path and then invokes elf_find_func_offset() on the Elf object. Having
this split in responsibilities will allow us to call
elf_find_func_offset() from other code paths on Elf objects that did not
necessarily come from a file on disk.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-3-deso@posteo.net
2023-03-01 16:05:35 -08:00
Daniel Müller
1eebcb6063 libbpf: Implement basic zip archive parsing support
This change implements support for reading zip archives, including
opening an archive, finding an entry based on its path and name in it,
and closing it.
The code was copied from https://github.com/iovisor/bcc/pull/4440, which
implements similar functionality for bcc. The author confirmed that he
is fine with this usage and the corresponding relicensing. I adjusted it
to adhere to libbpf coding standards.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Michał Gregorczyk <michalgr@meta.com>
Link: https://lore.kernel.org/bpf/20230301212308.1839139-2-deso@posteo.net
2023-03-01 16:05:34 -08:00
Andrii Nakryiko
35cbf7f915 selftests/bpf: Support custom per-test flags and multiple expected messages
Extend __flag attribute by allowing to specify one of the following:
 * BPF_F_STRICT_ALIGNMENT
 * BPF_F_ANY_ALIGNMENT
 * BPF_F_TEST_RND_HI32
 * BPF_F_TEST_STATE_FREQ
 * BPF_F_SLEEPABLE
 * BPF_F_XDP_HAS_FRAGS
 * Some numeric value

Extend __msg attribute by allowing to specify multiple exepcted messages.
All messages are expected to be present in the verifier log in the
order of application.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301175417.3146070-2-eddyz87@gmail.com

[ Eduard: added commit message, formatting, comments ]
2023-03-01 11:13:42 -08:00
Viktor Malik
4672129127 libbpf: Cleanup linker_append_elf_relos
Clang Static Analyser (scan-build) reports some unused symbols and dead
assignments in the linker_append_elf_relos function. Clean these up.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/c5c8fe9f411b69afada8399d23bb048ef2a70535.1677658777.git.vmalik@redhat.com
2023-03-01 11:13:11 -08:00
Viktor Malik
7832d06bd9 libbpf: Remove several dead assignments
Clang Static Analyzer (scan-build) reports several dead assignments in
libbpf where the assigned value is unconditionally overridden by another
value before it is read. Remove these assignments.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/5503d18966583e55158471ebbb2f67374b11bf5e.1677658777.git.vmalik@redhat.com
2023-03-01 11:13:11 -08:00
Viktor Malik
40e1bcab1e libbpf: Remove unnecessary ternary operator
Coverity reports that the first check of 'err' in bpf_object__init_maps
is always false as 'err' is initialized to 0 at that point. Remove the
unnecessary ternary operator.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/78a3702f2ea9f32a84faaae9b674c56269d330a7.1677658777.git.vmalik@redhat.com
2023-03-01 11:13:10 -08:00
Tiezhu Yang
be35f4af71 selftests/bpf: Set __BITS_PER_LONG if target is bpf for LoongArch
If target is bpf, there is no __loongarch__ definition, __BITS_PER_LONG
defaults to 32, __NR_nanosleep is not defined:

  #if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32
  #define __NR_nanosleep 101
  __SC_3264(__NR_nanosleep, sys_nanosleep_time32, sys_nanosleep)
  #endif

Work around this problem, by explicitly setting __BITS_PER_LONG to
__loongarch_grlen which is defined by compiler as 64 for LA64.

This is similar with commit 36e70b9b06 ("selftests, bpf: Fix broken
riscv build").

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1677585781-21628-1-git-send-email-yangtiezhu@loongson.cn
2023-03-01 11:05:50 -08:00
Kumar Kartikeya Dwivedi
85521e1ea4 selftests/bpf: Add more tests for kptrs in maps
Firstly, ensure programs successfully load when using all of the
supported maps. Then, extend existing tests to test more cases at
runtime. We are currently testing both the synchronous freeing of items
and asynchronous destruction when map is freed, but the code needs to be
adjusted a bit to be able to also accomodate support for percpu maps.

We now do a delete on the item (and update for array maps which has a
similar effect for kptrs) to perform a synchronous free of the kptr, and
test destruction both for the synchronous and asynchronous deletion.
Next time the program runs, it should observe the refcount as 1 since
all existing references should have been released by then. By running
the program after both possible paths freeing kptrs, we establish that
they correctly release resources. Next, we augment the existing test to
also test the same code path shared by all local storage maps using a
task local storage map.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230225154010.391965-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01 10:24:33 -08:00
Joanne Koong
cfa7b01189 selftests/bpf: tests for using dynptrs to parse skb and xdp buffers
Test skb and xdp dynptr functionality in the following ways:

1) progs/test_cls_redirect_dynptr.c
   * Rewrite "progs/test_cls_redirect.c" test to use dynptrs to parse
     skb data

   * This is a great example of how dynptrs can be used to simplify a
     lot of the parsing logic for non-statically known values.

     When measuring the user + system time between the original version
     vs. using dynptrs, and averaging the time for 10 runs (using
     "time ./test_progs -t cls_redirect"):
         original version: 0.092 sec
         with dynptrs: 0.078 sec

2) progs/test_xdp_dynptr.c
   * Rewrite "progs/test_xdp.c" test to use dynptrs to parse xdp data

     When measuring the user + system time between the original version
     vs. using dynptrs, and averaging the time for 10 runs (using
     "time ./test_progs -t xdp_attach"):
         original version: 0.118 sec
         with dynptrs: 0.094 sec

3) progs/test_l4lb_noinline_dynptr.c
   * Rewrite "progs/test_l4lb_noinline.c" test to use dynptrs to parse
     skb data

     When measuring the user + system time between the original version
     vs. using dynptrs, and averaging the time for 10 runs (using
     "time ./test_progs -t l4lb_all"):
         original version: 0.062 sec
         with dynptrs: 0.081 sec

     For number of processed verifier instructions:
         original version: 6268 insns
         with dynptrs: 2588 insns

4) progs/test_parse_tcp_hdr_opt_dynptr.c
   * Add sample code for parsing tcp hdr opt lookup using dynptrs.
     This logic is lifted from a real-world use case of packet parsing
     in katran [0], a layer 4 load balancer. The original version
     "progs/test_parse_tcp_hdr_opt.c" (not using dynptrs) is included
     here as well, for comparison.

     When measuring the user + system time between the original version
     vs. using dynptrs, and averaging the time for 10 runs (using
     "time ./test_progs -t parse_tcp_hdr_opt"):
         original version: 0.031 sec
         with dynptrs: 0.045 sec

5) progs/dynptr_success.c
   * Add test case "test_skb_readonly" for testing attempts at writes
     on a prog type with read-only skb ctx.
   * Add "test_dynptr_skb_data" for testing that bpf_dynptr_data isn't
     supported for skb progs.

6) progs/dynptr_fail.c
   * Add test cases "skb_invalid_data_slice{1,2,3,4}" and
     "xdp_invalid_data_slice{1,2}" for testing that helpers that modify the
     underlying packet buffer automatically invalidate the associated
     data slice.
   * Add test cases "skb_invalid_ctx" and "xdp_invalid_ctx" for testing
     that prog types that do not support bpf_dynptr_from_skb/xdp don't
     have access to the API.
   * Add test case "dynptr_slice_var_len{1,2}" for testing that
     variable-sized len can't be passed in to bpf_dynptr_slice
   * Add test case "skb_invalid_slice_write" for testing that writes to a
     read-only data slice are rejected by the verifier.
   * Add test case "data_slice_out_of_bounds_skb" for testing that
     writes to an area outside the slice are rejected.
   * Add test case "invalid_slice_rdwr_rdonly" for testing that prog
     types that don't allow writes to packet data don't accept any calls
     to bpf_dynptr_slice_rdwr.

[0] https://github.com/facebookincubator/katran/blob/main/katran/lib/bpf/pckt_parsing.h

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230301154953.641654-11-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01 10:05:19 -08:00
Linus Torvalds
f122a08b19 capability: just use a 'u64' instead of a 'u32[2]' array
Back in 2008 we extended the capability bits from 32 to 64, and we did
it by extending the single 32-bit capability word from one word to an
array of two words.  It was then obfuscated by hiding the "2" behind two
macro expansions, with the reasoning being that maybe it gets extended
further some day.

That reasoning may have been valid at the time, but the last thing we
want to do is to extend the capability set any more.  And the array of
values not only causes source code oddities (with loops to deal with
it), but also results in worse code generation.  It's a lose-lose
situation.

So just change the 'u32[2]' into a 'u64' and be done with it.

We still have to deal with the fact that the user space interface is
designed around an array of these 32-bit values, but that was the case
before too, since the array layouts were different (ie user space
doesn't use an array of 32-bit values for individual capability masks,
but an array of 32-bit slices of multiple masks).

So that marshalling of data is actually simplified too, even if it does
remain somewhat obscure and odd.

This was all triggered by my reaction to the new "cap_isidentical()"
introduced recently.  By just using a saner data structure, it went from

	unsigned __capi;
	CAP_FOR_EACH_U32(__capi) {
		if (a.cap[__capi] != b.cap[__capi])
			return false;
	}
	return true;

to just being

	return a.val == b.val;

instead.  Which is rather more obvious both to humans and to compilers.

Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-01 10:01:22 -08:00
Joanne Koong
66e3a13e7c bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr.
The user must pass in a buffer to store the contents of the data slice
if a direct pointer to the data cannot be obtained.

For skb and xdp type dynptrs, these two APIs are the only way to obtain
a data slice. However, for other types of dynptrs, there is no
difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data.

For skb type dynptrs, the data is copied into the user provided buffer
if any of the data is not in the linear portion of the skb. For xdp type
dynptrs, the data is copied into the user provided buffer if the data is
between xdp frags.

If the skb is cloned and a call to bpf_dynptr_data_rdwr is made, then
the skb will be uncloned (see bpf_unclone_prologue()).

Please note that any bpf_dynptr_write() automatically invalidates any prior
data slices of the skb dynptr. This is because the skb may be cloned or
may need to pull its paged buffer into the head. As such, any
bpf_dynptr_write() will automatically have its prior data slices
invalidated, even if the write is to data in the skb head of an uncloned
skb. Please note as well that any other helper calls that change the
underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
slices of the skb dynptr as well, for the same reasons.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01 09:55:24 -08:00
Joanne Koong
05421aecd4 bpf: Add xdp dynptrs
Add xdp dynptrs, which are dynptrs whose underlying pointer points
to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of xdp->data and xdp->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).

For reads and writes on the dynptr, this includes reading/writing
from/to and across fragments. Data slices through the bpf_dynptr_data
API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() should be used.

For examples of how xdp dynptrs can be used, please see the attached
selftests.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01 09:55:24 -08:00
Joanne Koong
b5964b968a bpf: Add skb dynptrs
Add skb dynptrs, which are dynptrs whose underlying pointer points
to a skb. The dynptr acts on skb data. skb dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of skb->data and skb->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).

For bpf prog types that don't support writes on skb data, the dynptr is
read-only (bpf_dynptr_write() will return an error)

For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write()
interfaces, reading and writing from/to data in the head as well as from/to
non-linear paged buffers is supported. Data slices through the
bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used.

For examples of how skb dynptrs can be used, please see the attached
selftests.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01 09:55:24 -08:00