Commit graph

1236353 commits

Author SHA1 Message Date
Jiri Pirko
ded6f77c05 devlink: extend multicast filtering by port index
Expose the previously introduced notification multicast messages
filtering infrastructure and allow the user to select messages using
port index.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 15:31:40 +01:00
Jiri Pirko
13b127d257 devlink: add a command to set notification filter and use it for multicasts
Currently the user listening on a socket for devlink notifications
gets always all messages for all existing instances, even if he is
interested only in one of those. That may cause unnecessary overhead
on setups with thousands of instances present.

User is currently able to narrow down the devlink objects replies
to dump commands by specifying select attributes.

Allow similar approach for notifications. Introduce a new devlink
NOTIFY_FILTER_SET which the user passes the select attributes. Store
these per-socket and use them for filtering messages
during multicast send.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 15:31:40 +01:00
Jiri Pirko
971b4ad882 genetlink: introduce helpers to do filtered multicast
Currently it is possible for netlink kernel user to pass custom
filter function to broadcast send function netlink_broadcast_filtered().
However, this is not exposed to multicast send and to generic
netlink users.

Extend the api and introduce a netlink helper nlmsg_multicast_filtered()
and a generic netlink helper genlmsg_multicast_netns_filtered()
to allow generic netlink families to specify filter function
while sending multicast messages.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 15:31:40 +01:00
Jiri Pirko
403863e985 netlink: introduce typedef for filter function
Make the code using filter function a bit nicer by consolidating the
filter function arguments using typedef.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 15:31:40 +01:00
Jiri Pirko
a731132424 genetlink: introduce per-sock family private storage
Introduce an xarray for Generic netlink family to store per-socket
private. Initialize this xarray only if family uses per-socket privs.

Introduce genl_sk_priv_get() to get the socket priv pointer for a family
and initialize it in case it does not exist.
Introduce __genl_sk_priv_get() to obtain socket priv pointer for a
family under RCU read lock.

Allow family to specify the priv size, init() and destroy() callbacks.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 15:31:40 +01:00
Jiri Pirko
5648de0b1f devlink: introduce a helper for netlink multicast send
Introduce a helper devlink_nl_notify_send() so each object notification
function does not have to call genlmsg_multicast_netns() with the same
arguments.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 15:31:40 +01:00
Jiri Pirko
cddbff470e devlink: send notifications only if there are listeners
Introduce devlink_nl_notify_need() helper and using it to check at the
beginning of notification functions to avoid overhead of composing
notification messages in case nobody listens.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 15:31:40 +01:00
Jiri Pirko
11280ddeae devlink: introduce __devl_is_registered() helper and use it instead of xa_get_mark()
Introduce __devl_is_registered() which does not assert on devlink
instance lock and use it in notifications which may be called
without devlink instance lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 15:31:40 +01:00
Jiri Pirko
337ad364c4 devlink: use devl_is_registered() helper instead xa_get_mark()
Instead of checking the xarray mark directly using xa_get_mark() helper
use devl_is_registered() helper which wraps it up. Note that there are
couple more users of xa_get_mark() left which are going to be handled
by the next patch.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 15:31:40 +01:00
Paolo Abeni
f7dd48ea76 Merge branch 'add-pf-vf-mailbox-support'
Shinas Rasheed says:

====================
add PF-VF mailbox support

This patchset aims to add PF-VF mailbox support, its related
version support, and relevant control net support for immediate
functionalities such as firmware notifications to VF.

Changes:
V6:
  - Fixed 1/4 patch to apply to top of net-next merged with net fixes

V5: https://lore.kernel.org/all/20231214164536.2670006-1-srasheed@marvell.com/
  - Refactored patches to cut out redundant changes in 1/4 patch.

V4: https://lore.kernel.org/all/20231213035816.2656851-1-srasheed@marvell.com/
  - Included tag [1/4] in subject of first patch of series which was
    lost in V3

V3: https://lore.kernel.org/all/20231211063355.2630028-1-srasheed@marvell.com/
  - Corrected error cleanup logic for PF-VF mbox setup
  - Removed double inclusion of types.h header file in octep_pfvf_mbox.c

V2: https://lore.kernel.org/all/20231209081450.2613561-1-srasheed@marvell.com/
  - Removed unused variable in PATCH 1/4

V1: https://lore.kernel.org/all/20231208070352.2606192-1-srasheed@marvell.com/
====================

Link: https://lore.kernel.org/r/20231215181425.2681426-1-srasheed@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 12:00:55 +01:00
Shinas Rasheed
4ebb86a97c octeon_ep: support firmware notifications for VFs
Notifications from firmware to vf has to pass through PF
control mbox and via PF-VF mailboxes. The notifications have to
be parsed out from the control mbox and passed to the
PF-VF mailbox in order to reach the corresponding VF.
Version compatibility should also be checked before messages
are passed to the mailboxes.

Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 12:00:53 +01:00
Shinas Rasheed
e28db8cbeb octeon_ep: control net framework to support VF offloads
Inquire firmware on supported offloads, as well as convey offloads
enabled dynamically to firmware for the VFs. Implement control net API
to support the same.

Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 12:00:53 +01:00
Shinas Rasheed
c130e589d5 octeon_ep: PF-VF mailbox version support
Add PF-VF mailbox initial version support

Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 12:00:53 +01:00
Shinas Rasheed
cde29af9e6 octeon_ep: add PF-VF mailbox communication
Implement mailbox communication between PF and VFs.
PF-VF mailbox is used for all control commands from VF to PF and
asynchronous notification messages from PF to VF.

Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-19 12:00:52 +01:00
Jakub Kicinski
c49b292d03 netdev
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmWAz2EACgkQ6rmadz2v
 bToqrw/9EwroZCc8GEHOKAlb/fzrMvn92rLo0ZW/cGN84QJPnx4zM6Zo0+fgLaaN
 oqqztwMUwdzGC3uX3FfVXaaLKbJ/MeHeL9BXFZNW8zkRHciw4R7kIBhOdPnHyET7
 uT+rQ4xPe1Mt7e9PjepKlSL5mEsxWfBkdUgsdn19Z2Vjdfr9mZMhYWYMJGcfTCD1
 TwxHKBPhq5fN3IsshmMBB8IrRp1HStUKb65MgZ4dI22LJXxTsFkx5XMFXcmuqvkH
 NhKj8jDcPEEh31bYcb6aG2Z4onw5F2lquygjk1Qyy5cyw45m/ipJKAXKdAyvJG+R
 VZCWOET/9wbRwFSK5wxwihCuKghFiofK52i2PcGtXZh0PCouyZZneSJOKM0yVWKO
 BvuJBxK4ETRnQyN6ZxhuJiEXG3/YMBBhyR2TX1LntVK9ct/k7qFVzATG49J39/sR
 SYMbptBRj4a5oMJ1qn0nFVEDFkg0jTnTDNnsEpcz60Ayt6EsJ1XosO5yz2huf861
 xgRMTKMseyG1/uV45tQ8ZPzbSPpBxjUi9Dl3coYsIm1a+y6clWUXcarONY5KVrpS
 CR98DuFgl+E7dXuisd/Kz2p2KxxSPq8nytsmLlgOvrUqhwiXqB+TKN8EHgIapVOt
 l1A5LrzXFTcGlT9MlaWBqEIy83Bu1nqQqbxrAFOE0k8A5jomXaw=
 =stU2
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-12-18

This PR is larger than usual and contains changes in various parts
of the kernel.

The main changes are:

1) Fix kCFI bugs in BPF, from Peter Zijlstra.

End result: all forms of indirect calls from BPF into kernel
and from kernel into BPF work with CFI enabled. This allows BPF
to work with CONFIG_FINEIBT=y.

2) Introduce BPF token object, from Andrii Nakryiko.

It adds an ability to delegate a subset of BPF features from privileged
daemon (e.g., systemd) through special mount options for userns-bound
BPF FS to a trusted unprivileged application. The design accommodates
suggestions from Christian Brauner and Paul Moore.

Example:
$ sudo mkdir -p /sys/fs/bpf/token
$ sudo mount -t bpf bpffs /sys/fs/bpf/token \
             -o delegate_cmds=prog_load:MAP_CREATE \
             -o delegate_progs=kprobe \
             -o delegate_attachs=xdp

3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei.

 - Complete precision tracking support for register spills
 - Fix verification of possibly-zero-sized stack accesses
 - Fix access to uninit stack slots
 - Track aligned STACK_ZERO cases as imprecise spilled registers.
   It improves the verifier "instructions processed" metric from single
   digit to 50-60% for some programs.
 - Fix verifier retval logic

4) Support for VLAN tag in XDP hints, from Larysa Zaremba.

5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu.

End result: better memory utilization and lower I$ miss for calls to BPF
via BPF trampoline.

6) Fix race between BPF prog accessing inner map and parallel delete,
from Hou Tao.

7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu.

It allows BPF interact with IPSEC infra. The intent is to support
software RSS (via XDP) for the upcoming ipsec pcpu work.
Experiments on AWS demonstrate single tunnel pcpu ipsec reaching
line rate on 100G ENA nics.

8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao.

9) BPF file verification via fsverity, from Song Liu.

It allows BPF progs get fsverity digest.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits)
  bpf: Ensure precise is reset to false in __mark_reg_const_zero()
  selftests/bpf: Add more uprobe multi fail tests
  bpf: Fail uprobe multi link with negative offset
  selftests/bpf: Test the release of map btf
  s390/bpf: Fix indirect trampoline generation
  selftests/bpf: Temporarily disable dummy_struct_ops test on s390
  x86/cfi,bpf: Fix bpf_exception_cb() signature
  bpf: Fix dtor CFI
  cfi: Add CFI_NOSEAL()
  x86/cfi,bpf: Fix bpf_struct_ops CFI
  x86/cfi,bpf: Fix bpf_callback_t CFI
  x86/cfi,bpf: Fix BPF JIT call
  cfi: Flip headers
  selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment
  selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test
  selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment
  bpf: Limit the number of kprobes when attaching program to multiple kprobes
  bpf: Limit the number of uprobes when attaching program to multiple uprobes
  bpf: xdp: Register generic_kfunc_set with XDP programs
  selftests/bpf: utilize string values for delegate_xxx mount options
  ...
====================

Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 16:46:08 -08:00
Jakub Kicinski
0ee28c9ae0 wireless-next patches for v6.8
The second features pull request for v6.8. A bigger one this time with
 changes both to stack and drivers. We have a new Wifi band RFI (WBRF)
 mitigation feature for which we pulled an immutable branch shared with
 other subsystems. And, as always, other new features and bug fixes all
 over.
 
 Major changes:
 
 cfg80211/mac80211
 
 * AMD ACPI based Wifi band RFI (WBRF) mitigation feature
 
 * Basic Service Set (BSS) usage reporting
 
 * TID to link mapping support
 
 * mac80211 hardware flag to disallow puncturing
 
 iwlwifi
 
 * new debugfs file fw_dbg_clear
 
 mt76
 
 * NVMEM EEPROM improvements
 
 * mt7996 Extremely High Throughpu (EHT) improvements
 
 * mt7996 Wireless Ethernet Dispatcher (WED) support
 
 * mt7996 36-bit DMA support
 
 ath12k
 
 * support one MSI vector
 
 * WCN7850: support AP mode
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmWAdRERHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZu0RAf+JtHgfjmUMFb54xcncLgj8ZAN82E0ThE0
 bPewQDhot0QTri4s7i5Kn8PCWjk+eKEmiIK+eARM+JDyZMTlCpXs2Y92cDAGQ8KG
 +LbIMRQkwOUg0HmtX3NysUG3mGAx4QTcIX/y3+GmtMZpKXMFuNy6ODuFvuWFNJrF
 3XTq1qFQNnA0XqUDKHW9uareeCiOMVOsqcxNW2FAi2gqRUfQpKnU1Ukv5iOjkqE9
 i53GHzeAG2WI4/YjXaTEZvibkM3jqrPcquHlul3fVuq05qkKOEuiy2UalDjgDCYp
 u91vbmMpcOjhlf9GIiu2BF6K/muEUCCIjlh5oxob0k9NiKhnPUZLng==
 =6Y8M
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2023-12-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.8

The second features pull request for v6.8. A bigger one this time with
changes both to stack and drivers. We have a new Wifi band RFI (WBRF)
mitigation feature for which we pulled an immutable branch shared with
other subsystems. And, as always, other new features and bug fixes all
over.

Major changes:

cfg80211/mac80211
 * AMD ACPI based Wifi band RFI (WBRF) mitigation feature
 * Basic Service Set (BSS) usage reporting
 * TID to link mapping support
 * mac80211 hardware flag to disallow puncturing

iwlwifi
 * new debugfs file fw_dbg_clear

mt76
 * NVMEM EEPROM improvements
 * mt7996 Extremely High Throughpu (EHT) improvements
 * mt7996 Wireless Ethernet Dispatcher (WED) support
 * mt7996 36-bit DMA support

ath12k
 * support one MSI vector
 * WCN7850: support AP mode

* tag 'wireless-next-2023-12-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (207 commits)
  wifi: mt76: mt7996: Use DECLARE_FLEX_ARRAY() and fix -Warray-bounds warnings
  wifi: ath11k: workaround too long expansion sparse warnings
  Revert "wifi: ath12k: use ATH12K_PCI_IRQ_DP_OFFSET for DP IRQ"
  wifi: rt2x00: remove useless code in rt2x00queue_create_tx_descriptor()
  wifi: rtw89: only reset BB/RF for existing WiFi 6 chips while starting up
  wifi: rtw89: add DBCC H2C to notify firmware the status
  wifi: rtw89: mac: add suffix _ax to MAC functions
  wifi: rtw89: mac: add flags to check if CMAC and DMAC are enabled
  wifi: rtw89: 8922a: add power on/off functions
  wifi: rtw89: add XTAL SI for WiFi 7 chips
  wifi: rtw89: phy: print out RFK log with formatted string
  wifi: rtw89: parse and print out RFK log from C2H events
  wifi: rtw89: add C2H event handlers of RFK log and report
  wifi: rtw89: load RFK log format string from firmware file
  wifi: rtw89: fw: add version field to BB MCU firmware element
  wifi: rtw89: fw: load TX power track tables from fw_element
  wifi: mwifiex: configure BSSID consistently when starting AP
  wifi: mwifiex: add extra delay for firmware ready
  wifi: mac80211: sta_info.c: fix sentence grammar
  wifi: mac80211: rx.c: fix sentence grammar
  ...
====================

Link: https://lore.kernel.org/r/20231218163900.C031DC433C9@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 16:17:34 -08:00
Andrii Nakryiko
8e432e6197 bpf: Ensure precise is reset to false in __mark_reg_const_zero()
It is safe to always start with imprecise SCALAR_VALUE register.
Previously __mark_reg_const_zero() relied on caller to reset precise
mark, but it's very error prone and we already missed it in a few
places. So instead make __mark_reg_const_zero() reset precision always,
as it's a safe default for SCALAR_VALUE. Explanation is basically the
same as for why we are resetting (or rather not setting) precision in
current state. If necessary, precision propagation will set it to
precise correctly.

As such, also remove a big comment about forward precision propagation
in mark_reg_stack_read() and avoid unnecessarily setting precision to
true after reading from STACK_ZERO stack. Again, precision propagation
will correctly handle this, if that SCALAR_VALUE register will ever be
needed to be precise.

Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231218173601.53047-1-andrii@kernel.org
2023-12-18 23:54:21 +01:00
Jakub Kicinski
509afc7452 Merge branch 'tools-net-ynl-add-sub-message-support-to-ynl'
Donald Hunter says:

====================
tools/net/ynl: Add 'sub-message' support to ynl

This patchset adds a 'sub-message' attribute type to the netlink-raw
schema and implements it in ynl. This provides support for kind-specific
options attributes as used in rt_link and tc raw netlink families.

A description of the new 'sub-message' attribute type and the
corresponding sub-message definitions is provided in patch 3.

The patchset includes updates to the rt_link spec and a new tc spec that
make use of the new 'sub-message' attribute type.

As mentioned in patch 4, encode support is not yet implemented in ynl
and support for sub-message selectors at a different nest level from the
key attribute is not yet supported. I plan to work on these in follow-up
patches.

Patches 1 is code cleanup in ynl
Patches 2-4 add sub-message support to the schema and ynl with
documentation updates.
Patch 5 adds binary and pad support to structs in netlink-raw.
Patches 6-8 contain specs that use the sub-message attribute type.
Patches 9-13 update ynl-gen-rst and its make target
====================

Link: https://lore.kernel.org/r/20231215093720.18774-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:40:43 -08:00
Donald Hunter
9b0aa2244d tools/net/ynl-gen-rst: Remove extra indentation from generated docs
The output from ynl-gen-rst.py has extra indentation that causes extra
<blockquote> elements to be generated in the HTML output.

Reduce the indentation so that sphinx doesn't generate unnecessary
<blockquote> elements.

Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-14-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:44 -08:00
Donald Hunter
e9d7c59212 tools/net/ynl-gen-rst: Remove bold from attribute-set headings
The generated .rst for attribute-sets currently uses a sub-sub-heading
for each attribute, with the attribute name in bold. This makes
attributes stand out more than the attribute-set sub-headings they are
part of.

Remove the bold markup from attribute sub-sub-headings.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-13-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:44 -08:00
Donald Hunter
e8c32339cf tools/net/ynl-gen-rst: Sort the index of generated netlink specs
The index of netlink specs was being generated unsorted. Sort the output
before generating the index entries.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-12-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:44 -08:00
Donald Hunter
6235b3d8bc tools/net/ynl-gen-rst: Add sub-messages to generated docs
Add a section for sub-messages to the generated .rst files.

Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-11-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:44 -08:00
Donald Hunter
646158f20c doc/netlink: Regenerate netlink .rst files if ynl-gen-rst changes
Add ynl-gen-rst.py to the dependencies for the netlink .rst files in the
doc Makefile so that the docs get regenerated if the ynl-gen-rst.py
script is modified. Use $(Q) to honour V=1 in the rules that run
ynl-gen-rst.py

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-10-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:44 -08:00
Donald Hunter
a1bcfde836 doc/netlink/specs: Add a spec for tc
This is a work-in-progress spec for tc that covers:
 - most of the qdiscs
 - the flower classifier
 - new, del, get for qdisc, chain, class and filter

Notable omissions:
 - most of the stats attrs are left as binary blobs
 - notifications are not yet implemented

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-9-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:44 -08:00
Donald Hunter
6b4b0754ef doc/netlink/specs: use pad in structs in rt_link
The rt_link spec was using pad1, pad2 attributes in structs which
appears in the ynl output. Replace this with the 'pad' type which
doesn't pollute the output.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-8-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:44 -08:00
Donald Hunter
077b6022d2 doc/netlink/specs: Add sub-message type to rt_link family
Start using sub-message selectors in the rt_link spec for the
link-specific 'data' and 'slave-data' attributes.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-7-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:44 -08:00
Donald Hunter
8b6811d966 tools/net/ynl: Add binary and pad support to structs for tc
The tc netlink-raw family needs binary and pad types for several
qopt C structs. Add support for them to ynl.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-6-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:43 -08:00
Donald Hunter
1769e2be4b tools/net/ynl: Add 'sub-message' attribute decoding to ynl
Implement the 'sub-message' attribute type in ynl.

Encode support is not yet implemented. Support for sub-message selectors
at a different nest level from the key attribute is not yet supported.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-5-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:43 -08:00
Donald Hunter
17ed5c1a9e doc/netlink: Document the sub-message format for netlink-raw
Document the spec format used by netlink-raw families like rt and tc.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-4-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:43 -08:00
Donald Hunter
de2d98743b doc/netlink: Add sub-message support to netlink-raw
Add a 'sub-message' attribute type with a selector that supports
polymorphic attribute formats for raw netlink families like tc.

A sub-message attribute uses the value of another attribute as a
selector key to choose the right sub-message format. For example if the
following attribute has already been decoded:

  { "kind": "gre" }

and we encounter the following attribute spec:

  -
    name: data
    type: sub-message
    sub-message: linkinfo-data-msg
    selector: kind

Then we look for a sub-message definition called 'linkinfo-data-msg' and
use the value of the 'kind' attribute i.e. 'gre' as the key to choose
the correct format for the sub-message:

  sub-messages:
    name: linkinfo-data-msg
    formats:
      -
        value: bridge
        attribute-set: linkinfo-bridge-attrs
      -
        value: gre
        attribute-set: linkinfo-gre-attrs
      -
        value: geneve
        attribute-set: linkinfo-geneve-attrs

This would decode the attribute value as a sub-message with the
attribute-set called 'linkinfo-gre-attrs' as the attribute space.

A sub-message can have an optional 'fixed-header' followed by zero or
more attributes from an attribute-set. For example the following
'tc-options-msg' sub-message defines message formats that use a mixture
of fixed-header, attribute-set or both together:

  sub-messages:
    -
      name: tc-options-msg
      formats:
        -
          value: bfifo
          fixed-header: tc-fifo-qopt
        -
          value: cake
          attribute-set: tc-cake-attrs
        -
          value: netem
          fixed-header: tc-netem-qopt
          attribute-set: tc-netem-attrs

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-3-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:43 -08:00
Donald Hunter
62691b801d tools/net/ynl: Use consistent array index expression formatting
Use expression formatting that conforms to the python style guide.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20231215093720.18774-2-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 14:39:43 -08:00
Andrii Nakryiko
6079ae6376 Merge branch 'bpf-add-check-for-negative-uprobe-multi-offset'
Jiri Olsa says:

====================
bpf: Add check for negative uprobe multi offset

hi,
adding the check for negative offset for uprobe multi link.

v2 changes:
- add more failure checks [Alan]
- move the offset retrieval/check up in the loop to be done earlier [Song]

thanks,
jirka
---
====================

Link: https://lore.kernel.org/r/20231217215538.3361991-1-jolsa@kernel.org
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-12-18 09:52:17 -08:00
Jiri Olsa
f17d1a18a3 selftests/bpf: Add more uprobe multi fail tests
We fail to create uprobe if we pass negative offset. Add more tests
validating kernel-side error checking code.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20231217215538.3361991-3-jolsa@kernel.org
2023-12-18 09:51:50 -08:00
Jiri Olsa
3983c00281 bpf: Fail uprobe multi link with negative offset
Currently the __uprobe_register will return 0 (success) when called with
negative offset. The reason is that the call to register_for_each_vma and
then build_map_info won't return error for negative offset. They just won't
do anything - no matching vma is found so there's no registered breakpoint
for the uprobe.

I don't think we can change the behaviour of __uprobe_register and fail
for negative uprobe offset, because apps might depend on that already.

But I think we can still make the change and check for it on bpf multi
link syscall level.

Also moving the __get_user call and check for the offsets to the top of
loop, to fail early without extra __get_user calls for ref_ctr_offset
and cookie arrays.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20231217215538.3361991-2-jolsa@kernel.org
2023-12-18 09:51:30 -08:00
Hou Tao
e58aac1a9a selftests/bpf: Test the release of map btf
When there is bpf_list_head or bpf_rb_root field in map value, the free
of map btf and the free of map value may run concurrently and there may
be use-after-free problem, so add two test cases to demonstrate it. And
the use-after-free problem can been easily reproduced by using bpf_next
tree and a KASAN-enabled kernel.

The first test case tests the racing between the free of map btf and the
free of array map. It constructs the racing by releasing the array map in
the end after other ref-counter of map btf has been released. To delay
the free of array map and make it be invoked after btf_free_rcu() is
invoked, it stresses system_unbound_wq by closing multiple percpu array
maps before it closes the array map.

The second case tests the racing between the free of map btf and the
free of inner map. Beside using the similar method as the first one
does, it uses bpf_map_delete_elem() to delete the inner map and to defer
the release of inner map after one RCU grace period.

The reason for using two skeletons is to prevent the release of outer
map and inner map in map_in_map_btf.c interfering the release of bpf
map in normal_map_btf.c.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20231216035510.4030605-1-houtao@huaweicloud.com
2023-12-18 18:15:49 +01:00
Alexei Starovoitov
0c970ed2f8 s390/bpf: Fix indirect trampoline generation
The func_addr used to be NULL for indirect trampolines used by struct_ops.
Now func_addr is a valid function pointer.
Hence use BPF_TRAMP_F_INDIRECT flag to detect such condition.

Fixes: 2cd3e3772e ("x86/cfi,bpf: Fix bpf_struct_ops CFI")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20231216004549.78355-1-alexei.starovoitov@gmail.com
2023-12-18 12:00:37 +01:00
David S. Miller
610a689d2a Merge branch 'rtnl-rcu'
Pedro Tammela says:

====================
net: rtnl: introduce rcu_replace_pointer_rtnl

Introduce the rcu_replace_pointer_rtnl helper to lockdep check rtnl lock
rcu replacements, alongside the already existing helpers.

Patch 2 uses the new helper in the rtnl_unregister_* functions.

Originally this change was part of the P4TC series, as it's a recurrent
pattern there, but since it has a use case in mainline we are pushing it
separately.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-18 02:05:45 +00:00
Pedro Tammela
174523479a net: rtnl: use rcu_replace_pointer_rtnl in rtnl_unregister_*
With the introduction of the rcu_replace_pointer_rtnl helper,
cleanup the rtnl_unregister_* functions to use the helper instead
of open coding it.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-18 02:05:45 +00:00
Jamal Hadi Salim
32da0f00dd net: rtnl: introduce rcu_replace_pointer_rtnl
Introduce the rcu_replace_pointer_rtnl helper to lockdep check rtnl lock
rcu replacements, alongside the already existing helpers.

This is a quality of life helper so instead of using:
   rcu_replace_pointer(rp, p, lockdep_rtnl_is_held())
   .. or the open coded..
   rtnl_dereference() / rcu_assign_pointer()
   .. or the lazy check version ..
   rcu_replace_pointer(rp, p, 1)
Use:
   rcu_replace_pointer_rtnl(rp, p)

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-18 02:05:45 +00:00
David S. Miller
54f4c2570a Merge branch 'phy-ackage-addr-mmd-apis'
Christian Marangi says:

====================
net: phy: add PHY package base addr + mmd APIs

This small series is required for the upcoming qca807x PHY that
will make use of PHY package mmd API and the new implementation
with read/write based on base addr.

The MMD PHY package patch currently has no use but it will be
used in the upcoming patch and it does complete what a PHY package
may require in addition to basic read/write to setup global PHY address.

(Changelog for all the revision is present in the single patch)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17 20:10:08 +00:00
Christian Marangi
d63710fc0f net: phy: add support for PHY package MMD read/write
Some PHY in PHY package may require to read/write MMD regs to correctly
configure the PHY package.

Add support for these additional required function in both lock and no
lock variant.

It's assumed that the entire PHY package is either C22 or C45. We use
C22 or C45 way of writing/reading to mmd regs based on the passed phydev
whether it's C22 or C45.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17 20:10:07 +00:00
Christian Marangi
028672bd1d net: phy: restructure __phy_write/read_mmd to helper and phydev user
Restructure phy_write_mmd and phy_read_mmd to implement generic helper
for direct mdiobus access for mmd and use these helper for phydev user.

This is needed in preparation of PHY package API that requires generic
access to the mdiobus and are deatched from phydev struct but instead
access them based on PHY package base_addr and offsets.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17 20:10:07 +00:00
Christian Marangi
9eea577eb1 net: phy: extend PHY package API to support multiple global address
Current API for PHY package are limited to single address to configure
global settings for the PHY package.

It was found that some PHY package (for example the qca807x, a PHY
package that is shipped with a bundle of 5 PHY) requires multiple PHY
address to configure global settings. An example scenario is a PHY that
have a dedicated PHY for PSGMII/serdes calibrarion and have a specific
PHY in the package where the global PHY mode is set and affects every
other PHY in the package.

Change the API in the following way:
- Change phy_package_join() to take the base addr of the PHY package
  instead of the global PHY addr.
- Make __/phy_package_write/read() require an additional arg that
  select what global PHY address to use by passing the offset from the
  base addr passed on phy_package_join().

Each user of this API is updated to follow this new implementation
following a pattern where an enum is defined to declare the offset of the
addr.

We also drop the check if shared is defined as any user of the
phy_package_read/write is expected to use phy_package_join first. Misuse
of this will correctly trigger a kernel panic for NULL pointer
exception.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17 20:10:07 +00:00
Christian Marangi
ebb30ccbbd net: phy: make addr type u8 in phy_package_shared struct
Switch addr type in phy_package_shared struct to u8.

The value is already checked to be non negative and to be less than
PHY_MAX_ADDR, hence u8 is better suited than using int.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17 20:10:07 +00:00
Suman Ghosh
dd78428786 octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs
On some silicon variants the number of available CAM entries are
less. Reserving one entry for each NIX-LF for default DMAC based pkt
forwarding rules will reduce the number of available CAM entries
further. Hence add configurability via devlink to set maximum number of
NIX-LFs needed which inturn frees up some CAM entries.

Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17 20:05:35 +00:00
Kalle Valo
c5a3f56fcd ath.git patches for v6.8.
We have new features only for ath12k but lots of small cleanup for
 ath10k, ath11k and ath12k. And of course smaller fixes to several
 drivers.
 
 Major changes:
 
 ath12k
 
 * support one MSI vector
 
 * WCN7850: support AP mode
 -----BEGIN PGP SIGNATURE-----
 
 iQFLBAABCgA1FiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmV8odsXHHF1aWNfa3Zh
 bG9AcXVpY2luYy5jb20ACgkQbhckVSbrbZu1hgf9H+4qF8EvxpDPqse5ynDfUqeq
 sStOSAWtNvwqTLlCeHHOc5xpuifNo9ff9J59mb2WrRo6gO0PY0meTxQYPgaq1xZF
 lA4tW8yVpcyO+d3CCpFPWUOx0pbJW7UTzFl5A5TDCfGiY1ypFEGUzxnztvHfdY4L
 rIjWGAFEsVRGDNM2QVNsfhGm3DbXIKm+9XzhhNlNspaZ0/GS+E9BcXwSyKP5G77A
 MG9oRTIM++tdkCiJAId8Wt0AwBTGGuik5QnRb8JVsbBeigdWuUU6otXTc3WonzJE
 93OEOcYJvmQgr5KLyk9euXJ7XVmwnn2JYY9jwPRSgZKvU3PkvxAALLYwMxOW8w==
 =5NRj
 -----END PGP SIGNATURE-----

Merge tag 'ath-next-20231215' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath

ath.git patches for v6.8.

We have new features only for ath12k but lots of small cleanup for
ath10k, ath11k and ath12k. And of course smaller fixes to several
drivers.

Major changes:

ath12k

* support one MSI vector

* WCN7850: support AP mode
2023-12-17 13:20:18 +02:00
Gustavo A. R. Silva
40d51f70f0 wifi: mt76: mt7996: Use DECLARE_FLEX_ARRAY() and fix -Warray-bounds warnings
Transform zero-length arrays `rate`, `adm_stat` and `msdu_cnt` into
proper flexible-array members in anonymous union in `struct
mt7996_mcu_all_sta_info_event` via the DECLARE_FLEX_ARRAY()
helper; and fix multiple -Warray-bounds warnings:

drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:544:61: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=]
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:551:58: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=]
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:553:58: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=]
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:530:61: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=]
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:538:66: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=]
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:540:66: warning: array subscript <unknown> is outside array bounds of 'struct <anonymous>[0]' [-Warray-bounds=]
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:520:57: warning: array subscript <unknown> is outside array bounds of 'struct all_sta_trx_rate[0]' [-Warray-bounds=]
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:526:76: warning: array subscript <unknown> is outside array bounds of 'struct all_sta_trx_rate[0]' [-Warray-bounds=]
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:526:76: warning: array subscript <unknown> is outside array bounds of 'struct all_sta_trx_rate[0]' [-Warray-bounds=]
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:526:76: warning: array subscript <unknown> is outside array bounds of 'struct all_sta_trx_rate[0]' [-Warray-bounds=]
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:526:76: warning: array subscript <unknown> is outside array bounds of 'struct all_sta_trx_rate[0]' [-Warray-bounds=]

This results in no differences in binary output, helps with the ongoing
efforts to globally enable -Warray-bounds.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/ZXiU9ayVCslt3qiI@work
2023-12-17 13:17:03 +02:00
David S. Miller
3a3af3aedb Merge branch 'skb-coalescing-page_pool'
Liang Chen says:

====================
skbuff: Optimize SKB coalescing for page pool

The combination of the following condition was excluded from skb coalescing:

from->pp_recycle = 1
from->cloned = 1
to->pp_recycle = 1

With page pool in use, this combination can be quite common(ex.
NetworkMananger may lead to the additional packet_type being registered,
thus the cloning). In scenarios with a higher number of small packets, it
can significantly affect the success rate of coalescing.

This patchset aims to optimize this scenario and enable coalescing of this
particular combination. That also involves supporting multiple users
referencing the same fragment of a pp page to accomondate the need to
increment the "from" SKB page's pp page reference count.

Changes from v10:
- re-number patches to 1/3, 2/3, 3/3

Changes from v9:
- patch 1 was already applied
- imporve description for patch 2
- make sure skb_pp_frag_ref only work for pp aware skbs
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17 10:56:33 +00:00
Liang Chen
f7dc3248dc skbuff: Optimization of SKB coalescing for page pool
In order to address the issues encountered with commit 1effe8ca4e
("skbuff: fix coalescing for page_pool fragment recycling"), the
combination of the following condition was excluded from skb coalescing:

from->pp_recycle = 1
from->cloned = 1
to->pp_recycle = 1

However, with page pool environments, the aforementioned combination can
be quite common(ex. NetworkMananger may lead to the additional
packet_type being registered, thus the cloning). In scenarios with a
higher number of small packets, it can significantly affect the success
rate of coalescing. For example, considering packets of 256 bytes size,
our comparison of coalescing success rate is as follows:

Without page pool: 70%
With page pool: 13%

Consequently, this has an impact on performance:

Without page pool: 2.57 Gbits/sec
With page pool: 2.26 Gbits/sec

Therefore, it seems worthwhile to optimize this scenario and enable
coalescing of this particular combination. To achieve this, we need to
ensure the correct increment of the "from" SKB page's page pool
reference count (pp_ref_count).

Following this optimization, the success rate of coalescing measured in
our environment has improved as follows:

With page pool: 60%

This success rate is approaching the rate achieved without using page
pool, and the performance has also been improved:

With page pool: 2.52 Gbits/sec

Below is the performance comparison for small packets before and after
this optimization. We observe no impact to packets larger than 4K.

packet size     before      after       improved
(bytes)         (Gbits/sec) (Gbits/sec)
128             1.19        1.27        7.13%
256             2.26        2.52        11.75%
512             4.13        4.81        16.50%
1024            6.17        6.73        9.05%
2048            14.54       15.47       6.45%
4096            25.44       27.87       9.52%

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17 10:56:33 +00:00
Liang Chen
8cfa2dee32 skbuff: Add a function to check if a page belongs to page_pool
Wrap code for checking if a page is a page_pool page into a
function for better readability and ease of reuse.

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Mina Almasry <almarsymina@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-17 10:56:33 +00:00