Commit graph

1135586 commits

Author SHA1 Message Date
Jie Meng
77d8f5d47b bpf,x64: use shrx/sarx/shlx when available
BMI2 provides 3 shift instructions (shrx, sarx and shlx) that use VEX
encoding but target general purpose registers [1]. They allow the shift
count in any general purpose register and have the same performance as
non BMI2 shift instructions [2].

Instead of shr/sar/shl that implicitly use %cl (lowest 8 bit of %rcx),
emit their more flexible alternatives provided in BMI2 when advantageous;
keep using the non BMI2 instructions when shift count is already in
BPF_REG_4/%rcx as non BMI2 instructions are shorter.

To summarize, when BMI2 is available:
-------------------------------------------------
            |   arbitrary dst
=================================================
src == ecx  |   shl dst, cl
-------------------------------------------------
src != ecx  |   shlx dst, dst, src
-------------------------------------------------

And no additional register shuffling is needed.

A concrete example between non BMI2 and BMI2 codegen.  To shift %rsi by
%rdi:

Without BMI2:

 ef3:   push   %rcx
        51
 ef4:   mov    %rdi,%rcx
        48 89 f9
 ef7:   shl    %cl,%rsi
        48 d3 e6
 efa:   pop    %rcx
        59

With BMI2:

 f0b:   shlx   %rdi,%rsi,%rsi
        c4 e2 c1 f7 f6

[1] https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
[2] https://www.agner.org/optimize/instruction_tables.pdf

Signed-off-by: Jie Meng <jmeng@fb.com>
Link: https://lore.kernel.org/r/20221007202348.1118830-3-jmeng@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-19 16:53:51 -07:00
Jie Meng
81b35e7cad bpf,x64: avoid unnecessary instructions when shift dest is ecx
x64 JIT produces redundant instructions when a shift operation's
destination register is BPF_REG_4/ecx and this patch removes them.

Specifically, when dest reg is BPF_REG_4 but the src isn't, we
needn't push and pop ecx around shift only to get it overwritten
by r11 immediately afterwards.

In the rare case when both dest and src registers are BPF_REG_4,
a single shift instruction is sufficient and we don't need the
two MOV instructions around the shift.

To summarize using shift left as an example, without patch:
-------------------------------------------------
            |   dst == ecx     |    dst != ecx
=================================================
src == ecx  |   mov r11, ecx   |    shl dst, cl
            |   shl r11, ecx   |
            |   mov ecx, r11   |
-------------------------------------------------
src != ecx  |   mov r11, ecx   |    push ecx
            |   push ecx       |    mov ecx, src
            |   mov ecx, src   |    shl dst, cl
            |   shl r11, cl    |    pop ecx
            |   pop ecx        |
            |   mov ecx, r11   |
-------------------------------------------------

With patch:
-------------------------------------------------
            |   dst == ecx     |    dst != ecx
=================================================
src == ecx  |   shl ecx, cl    |    shl dst, cl
-------------------------------------------------
src != ecx  |   mov r11, ecx   |    push ecx
            |   mov ecx, src   |    mov ecx, src
            |   shl r11, cl    |    shl dst, cl
            |   mov ecx, r11   |    pop ecx
-------------------------------------------------

Signed-off-by: Jie Meng <jmeng@fb.com>
Link: https://lore.kernel.org/r/20221007202348.1118830-2-jmeng@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-19 16:53:51 -07:00
Alexei Starovoitov
7d8d535546 Merge branch 'libbpf: support non-mmap()'able data sections'
Andrii Nakryiko says:

====================

Make libbpf more conservative in using BPF_F_MMAPABLE flag with internal BPF
array maps that are backing global data sections. See patch #2 for full
description and justification.

Changes in this dataset support having bpf_spinlock, kptr, rb_tree nodes and
other "special" variables as global variables. Combining this with libbpf's
existing support for multiple custom .data.* sections allows BPF programs to
utilize multiple spinlock/rbtree_node/kptr variables in a pretty natural way
by just putting all such variables into separate data sections (and thus ARRAY
maps).

v1->v2:
  - address Stanislav's feedback, adds acks.
====================

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-19 16:41:05 -07:00
Andrii Nakryiko
2f968e9f4a libbpf: add non-mmapable data section selftest
Add non-mmapable data section to test_skeleton selftest and make sure it
really isn't mmapable by trying to mmap() it anyways.

Also make sure that libbpf doesn't report BPF_F_MMAPABLE flag to users.

Additional, some more manual testing was performed that this feature
works as intended.

Looking at created map through bpftool shows that flags passed to kernel are
indeed zero:

  $ bpftool map show
  ...
  1782: array  name .data.non_mmapa  flags 0x0
          key 4B  value 16B  max_entries 1  memlock 4096B
          btf_id 1169
          pids test_progs(8311)
  ...

Checking BTF uploaded to kernel for this map shows that zero_key and
zero_value are indeed marked as static, even though zero_key is actually
original global (but STV_HIDDEN) variable:

  $ bpftool btf dump id 1169
  ...
  [51] VAR 'zero_key' type_id=2, linkage=static
  [52] VAR 'zero_value' type_id=7, linkage=static
  ...
  [62] DATASEC '.data.non_mmapable' size=16 vlen=2
          type_id=51 offset=0 size=4 (VAR 'zero_key')
          type_id=52 offset=4 size=12 (VAR 'zero_value')
  ...

And original BTF does have zero_key marked as linkage=global:

  $ bpftool btf dump file test_skeleton.bpf.linked3.o
  ...
  [51] VAR 'zero_key' type_id=2, linkage=global
  [52] VAR 'zero_value' type_id=7, linkage=static
  ...
  [62] DATASEC '.data.non_mmapable' size=16 vlen=2
          type_id=51 offset=0 size=4 (VAR 'zero_key')
          type_id=52 offset=4 size=12 (VAR 'zero_value')

Bpftool didn't require any changes at all because it checks whether internal
map is mmapable already, but just to double-check generated skeleton, we
see that .data.non_mmapable neither sets mmaped pointer nor has
a corresponding field in the skeleton:

  $ grep non_mmapable test_skeleton.skel.h
                  struct bpf_map *data_non_mmapable;
          s->maps[7].name = ".data.non_mmapable";
          s->maps[7].map = &obj->maps.data_non_mmapable;

But .data.read_mostly has all of those things:

  $ grep read_mostly test_skeleton.skel.h
                  struct bpf_map *data_read_mostly;
          struct test_skeleton__data_read_mostly {
                  int read_mostly_var;
          } *data_read_mostly;
          s->maps[6].name = ".data.read_mostly";
          s->maps[6].map = &obj->maps.data_read_mostly;
          s->maps[6].mmaped = (void **)&obj->data_read_mostly;
          _Static_assert(sizeof(s->data_read_mostly->read_mostly_var) == 4, "unexpected size of 'read_mostly_var'");

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221019002816.359650-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-19 16:40:45 -07:00
Andrii Nakryiko
4fcac46c7e libbpf: only add BPF_F_MMAPABLE flag for data maps with global vars
Teach libbpf to not add BPF_F_MMAPABLE flag unnecessarily for ARRAY maps
that are backing data sections, if such data sections don't expose any
variables to user-space. Exposed variables are those that have
STB_GLOBAL or STB_WEAK ELF binding and correspond to BTF VAR's
BTF_VAR_GLOBAL_ALLOCATED linkage.

The overall idea is that if some data section doesn't have any variable that
is exposed through BPF skeleton, then there is no reason to make such
BPF array mmapable. Making BPF array mmapable is not a free no-op
action, because BPF verifier doesn't allow users to put special objects
(such as BPF spin locks, RB tree nodes, linked list nodes, kptrs, etc;
anything that has a sensitive internal state that should not be modified
arbitrarily from user space) into mmapable arrays, as there is no way to
prevent user space from corrupting such sensitive state through direct
memory access through memory-mapped region.

By making sure that libbpf doesn't add BPF_F_MMAPABLE flag to BPF array
maps corresponding to data sections that only have static variables
(which are not supposed to be visible to user space according to libbpf
and BPF skeleton rules), users now can have spinlocks, kptrs, etc in
either default .bss/.data sections or custom .data.* sections (assuming
there are no global variables in such sections).

The only possible hiccup with this approach is the need to use global
variables during BPF static linking, even if it's not intended to be
shared with user space through BPF skeleton. To allow such scenarios,
extend libbpf's STV_HIDDEN ELF visibility attribute handling to
variables. Libbpf is already treating global hidden BPF subprograms as
static subprograms and adjusts BTF accordingly to make BPF verifier
verify such subprograms as static subprograms with preserving entire BPF
verifier state between subprog calls. This patch teaches libbpf to treat
global hidden variables as static ones and adjust BTF information
accordingly as well. This allows to share variables between multiple
object files during static linking, but still keep them internal to BPF
program and not get them exposed through BPF skeleton.

Note, that if the user has some advanced scenario where they absolutely
need BPF_F_MMAPABLE flag on .data/.bss/.rodata BPF array map despite
only having static variables, they still can achieve this by forcing it
through explicit bpf_map__set_map_flags() API.

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20221019002816.359650-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-19 16:40:45 -07:00
Andrii Nakryiko
f33f742d56 libbpf: clean up and refactor BTF fixup step
Refactor libbpf's BTF fixup step during BPF object open phase. The only
functional change is that we now ignore BTF_VAR_GLOBAL_EXTERN variables
during fix up, not just BTF_VAR_STATIC ones, which shouldn't cause any
change in behavior as there shouldn't be any extern variable in data
sections for valid BPF object anyways.

Otherwise it's just collapsing two functions that have no reason to be
separate, and switching find_elf_var_offset() helper to return entire
symbol pointer, not just its offset. This will be used by next patch to
get ELF symbol visibility.

While refactoring, also "normalize" debug messages inside
btf_fixup_datasec() to follow general libbpf style and print out data
section name consistently, where it's available.

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221019002816.359650-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-19 16:40:45 -07:00
Daniel Müller
81bfcc3fcd bpf/docs: Summarize CI system and deny lists
This change adds a brief summary of the BPF continuous integration (CI)
to the BPF selftest documentation. The summary focuses not so much on
actual workings of the CI, as it is maintained outside of the
repository, but aims to document the few bits of it that are sourced
from this repository and that developers may want to adjust as part of
patch submissions: the BPF kernel configuration and the deny list
file(s).

Changelog:
- v1->v2:
  - use s390x instead of s390 for consistency

Signed-off-by: Daniel Müller <deso@posteo.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221018164015.1970862-1-deso@posteo.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-10-19 11:42:01 -07:00
Daniel Müller
2c4d72d66b samples/bpf: Fix typos in README
This change fixes some typos found in the BPF samples README file.

Signed-off-by: Daniel Müller <deso@posteo.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221018163231.1926462-1-deso@posteo.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-10-19 11:30:09 -07:00
Shaomin Deng
01dea9548f samples/bpf: Fix double word in comments
Remove the repeated word "by" in comments.

Signed-off-by: Shaomin Deng <dengshaomin@cdjrlc.com>
Link: https://lore.kernel.org/r/20221017142303.8299-1-dengshaomin@cdjrlc.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-10-19 11:21:49 -07:00
Gerhard Engleder
7a698edf95 samples/bpf: Fix MAC address swapping in xdp2_kern
xdp2_kern rewrites and forwards packets out on the same interface.
Forwarding still works but rewrite got broken when xdp multibuffer
support has been added.

With xdp multibuffer a local copy of the packet has been introduced. The
MAC address is now swapped in the local copy, but the local copy in not
written back.

Fix MAC address swapping be adding write back of modified packet.

Fixes: 7722517422 ("samples/bpf: fixup some tools to be able to support xdp multibuffer")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Link: https://lore.kernel.org/r/20221015213050.65222-1-gerhard@engleder-embedded.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-10-19 11:06:29 -07:00
Gerhard Engleder
05ee658c65 samples/bpf: Fix map iteration in xdp1_user
BPF map iteration in xdp1_user results in endless loop without any
output, because the return value of bpf_map_get_next_key() is checked
against the wrong value.

Other call locations of bpf_map_get_next_key() check for equal 0 for
continuing the iteration. xdp1_user checks against unequal -1. This is
wrong for a function which can return arbitrary negative errno values,
because a return value of e.g. -2 results in an endless loop.

With this fix xdp1_user is printing statistics again:
proto 0:          1 pkt/s
proto 0:          1 pkt/s
proto 17:     107383 pkt/s
proto 17:     881655 pkt/s
proto 17:     882083 pkt/s
proto 17:     881758 pkt/s

Fixes: bd054102a8 ("libbpf: enforce strict libbpf 1.0 behaviors")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20221013200922.17167-1-gerhard@engleder-embedded.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-10-19 10:56:21 -07:00
Alexandru Tachici
a526a3cc9c net: ethernet: adi: adin1110: Fix SPI transfers
No need to use more than one SPI transfer for reads.
Use only one from now as ADIN1110/2111 does not tolerate
CS changes during reads.

The BCM2711/2708 SPI controllers worked fine, but the NXP
IMX8MM could not keep CS lowered during SPI bursts.

This change aims to make the ADIN1110/2111 driver compatible
with both SPI controllers, without any loss of bandwidth/other
capabilities.

Fixes: bc93e19d08 ("net: ethernet: adi: Add ADIN1110 support")
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 14:20:37 +01:00
David S. Miller
ac3208fbac Merge branch 'net-bridge-mc-cleanups'
Ido Schimmel says:

====================
bridge: A few multicast cleanups

Clean up a few issues spotted while working on the bridge multicast code
and running its selftests.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 14:01:08 +01:00
Ido Schimmel
d1942cd47d bridge: mcast: Simplify MDB entry creation
Before creating a new MDB entry, br_multicast_new_group() will call
br_mdb_ip_get() to see if one exists and return it if so.

Therefore, simply call br_multicast_new_group() and omit the call to
br_mdb_ip_get().

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 14:01:08 +01:00
Ido Schimmel
262985fad1 bridge: mcast: Use spin_lock() instead of spin_lock_bh()
IGMPv3 / MLDv2 Membership Reports are only processed from the data path
with softIRQ disabled, so there is no need to call spin_lock_bh(). Use
spin_lock() instead.

This is consistent with how other IGMP / MLD packets are processed.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 14:01:08 +01:00
Ido Schimmel
b526b2ea14 selftests: bridge_igmp: Remove unnecessary address deletion
The test group address is added and removed in v2reportleave_test().
There is no need to delete it again during cleanup as it results in the
following error message:

 # bash -x ./bridge_igmp.sh
 [...]
 + cleanup
 + pre_cleanup
 [...]
 + ip address del dev swp4 239.10.10.10/32
 RTNETLINK answers: Cannot assign requested address
 + h2_destroy

Solve by removing the unnecessary address deletion.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 14:01:08 +01:00
Ido Schimmel
6fb1faa1b9 selftests: bridge_vlan_mcast: Delete qdiscs during cleanup
The qdiscs are added during setup, but not deleted during cleanup,
resulting in the following error messages:

 # ./bridge_vlan_mcast.sh
 [...]
 # ./bridge_vlan_mcast.sh
 Error: Exclusivity flag on, cannot modify.
 Error: Exclusivity flag on, cannot modify.

Solve by deleting the qdiscs during cleanup.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 14:01:08 +01:00
David S. Miller
5cacb2c7c9 Merge branch 'dpaa-phylink'
Sean Anderson says:

====================
net: dpaa: Convert to phylink

This series converts the DPAA driver to phylink.

I have tried to maintain backwards compatibility with existing device
trees whereever possible. However, one area where I was unable to
achieve this was with QSGMII. Please refer to patch 2 for details.

All mac drivers have now been converted. I would greatly appreciate if
anyone has T-series or P-series boards they can test/debug this series
on. I only have an LS1046ARDB. Everything but QSGMII should work without
breakage; QSGMII needs patches 7 and 8. For this reason, the last 4
patches in this series should be applied together (and should not go
through separate trees).

Changes in v7:
- provide phylink_validate_mask_caps() helper
- Fix oops if memac_pcs_create returned -EPROBE_DEFER
- Fix using pcs-names instead of pcs-handle-names
- Fix not checking for -ENODATA when looking for sgmii pcs
- Fix 81-character line
- Simplify memac_validate with phylink_validate_mask_caps

Changes in v6:
- Remove unnecessary $ref from renesas,rzn1-a5psw
- Remove unnecessary type from pcs-handle-names
- Add maxItems to pcs-handle
- Fix 81-character line
- Fix uninitialized variable in dtsec_mac_config

Changes in v5:
- Add Lynx PCS binding

Changes in v4:
- Use pcs-handle-names instead of pcs-names, as discussed
- Don't fail if phy support was not compiled in
- Split off rate adaptation series
- Split off DPAA "preparation" series
- Split off Lynx 10G support
- t208x: Mark MAC1 and MAC2 as 10G
- Add XFI PCS for t208x MAC1/MAC2

Changes in v3:
- Expand pcs-handle to an array
- Add vendor prefix 'fsl,' to rgmii and mii properties.
- Set maxItems for pcs-names
- Remove phy-* properties from example because dt-schema complains and I
  can't be bothered to figure out how to make it work.
- Add pcs-handle as a preferred version of pcsphy-handle
- Deprecate pcsphy-handle
- Remove mii/rmii properties
- Put the PCS mdiodev only after we are done with it (since the PCS
  does not perform a get itself).
- Remove _return label from memac_initialization in favor of returning
  directly
- Fix grabbing the default PCS not checking for -ENODATA from
  of_property_match_string
- Set DTSEC_ECNTRL_R100M in dtsec_link_up instead of dtsec_mac_config
- Remove rmii/mii properties
- Replace 1000Base... with 1000BASE... to match IEEE capitalization
- Add compatibles for QSGMII PCSs
- Split arm and powerpcs dts updates

Changes in v2:
- Better document how we select which PCS to use in the default case
- Move PCS_LYNX dependency to fman Kconfig
- Remove unused variable slow_10g_if
- Restrict valid link modes based on the phy interface. This is easier
  to set up, and mostly captures what I intended to do the first time.
  We now have a custom validate which restricts half-duplex for some SoCs
  for RGMII, but generally just uses the default phylink validate.
- Configure the SerDes in enable/disable
- Properly implement all ethtool ops and ioctls. These were mostly
  stubbed out just enough to compile last time.
- Convert 10GEC and dTSEC as well
- Fix capitalization of mEMAC in commit messages
- Add nodes for QSGMII PCSs
- Add nodes for QSGMII PCSs
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Sean Anderson
4e748b1bd7 arm64: dts: layerscape: Add nodes for QSGMII PCSs
Now that we actually read registers from QSGMII PCSs, it's important
that we have the correct address (instead of hoping that we're the MAC
with all the QSGMII PCSs on its bus). This adds nodes for the QSGMII
PCSs.  The exact mapping of QSGMII to MACs depends on the SoC.

Since the first QSGMII PCSs share an address with the SGMII and XFI
PCSs, we only add new nodes for PCSs 2-4. This avoids address conflicts
on the bus.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Sean Anderson
4e31b808fa powerpc: dts: qoriq: Add nodes for QSGMII PCSs
Now that we actually read registers from QSGMII PCSs, it's important
that we have the correct address (instead of hoping that we're the MAC
with all the QSGMII PCSs on its bus). This adds nodes for the QSGMII
PCSs. They have the same addresses on all SoCs (e.g. if QSGMIIA is
present it's used for MACs 1 through 4).

Since the first QSGMII PCSs share an address with the SGMII and XFI
PCSs, we only add new nodes for PCSs 2-4. This avoids address conflicts
on the bus.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Sean Anderson
36926a7d70 powerpc: dts: t208x: Mark MAC1 and MAC2 as 10G
On the T208X SoCs, MAC1 and MAC2 support XGMII. Add some new MAC dtsi
fragments, and mark the QMAN ports as 10G.

Fixes: da414bb923 ("powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)")
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Sean Anderson
5d93cfcf73 net: dpaa: Convert to phylink
This converts DPAA to phylink. All macs are converted. This should work
with no device tree modifications (including those made in this series),
except for QSGMII (as noted previously).

The mEMAC configuration is one of the tricker areas. I have tried to
capture all the restrictions across the various models. Most of the time,
we assume that if the serdes supports a mode or the phy-interface-mode
specifies it, then we support it. The only place we can't do this is
(RG)MII, since there's no serdes. In that case, we rely on a (new)
devicetree property. There are also several cases where half-duplex is
broken. Unfortunately, only a single compatible is used for the MAC, so we
have to use the board compatible instead.

The 10GEC conversion is very straightforward, since it only supports XAUI.
There is generally nothing to configure.

The dTSEC conversion is broadly similar to mEMAC, but is simpler because we
don't support configuring the SerDes (though this can be easily added) and
we don't have multiple PCSs. From what I can tell, there's nothing
different in the driver or documentation between SGMII and 1000BASE-X
except for the advertising. Similarly, I couldn't find anything about
2500BASE-X. In both cases, I treat them like SGMII. These modes aren't used
by any in-tree boards. Similarly, despite being mentioned in the driver, I
couldn't find any documented SoCs which supported QSGMII.  I have left it
unimplemented for now.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Sean Anderson
a7c2a32e7f net: fman: memac: Use lynx pcs driver
Although not stated in the datasheet, as far as I can tell PCS for mEMACs
is a "Lynx." By reusing the existing driver, we can remove the PCS
management code from the memac driver. This requires calling some PCS
functions manually which phylink would usually do for us, but we will let
it do that soon.

One problem is that we don't actually have a PCS for QSGMII. We pretend
that each mEMAC's MDIO bus has four QSGMII PCSs, but this is not the case.
Only the "base" mEMAC's MDIO bus has the four QSGMII PCSs. This is not an
issue yet, because we never get the PCS state. However, it will be once the
conversion to phylink is complete, since the links will appear to never
come up. To get around this, we allow specifying multiple PCSs in pcsphy.
This breaks backwards compatibility with old device trees, but only for
QSGMII. IMO this is the only reasonable way to figure out what the actual
QSGMII PCS is.

Additionally, we now also support a separate XFI PCS. This can allow the
SerDes driver to set different addresses for the SGMII and XFI PCSs so they
can be accessed at the same time.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Sean Anderson
0fc83bd795 net: fman: memac: Add serdes support
This adds support for using a serdes which has to be configured. This is
primarly in preparation for phylink conversion, which will then change the
serdes mode dynamically.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Russell King (Oracle)
f392a18464 net: phylink: provide phylink_validate_mask_caps() helper
Provide a helper that restricts the link modes according to the
phylink capabilities.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[rebased on net-next/master and added documentation]
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Sean Anderson
045d05018a dt-bindings: net: fman: Add additional interface properties
At the moment, mEMACs are configured almost completely based on the
phy-connection-type. That is, if the phy interface is RGMII, it assumed
that RGMII is supported. For some interfaces, it is assumed that the
RCW/bootloader has set up the SerDes properly. This is generally OK, but
restricts runtime reconfiguration. The actual link state is never
reported.

To address these shortcomings, the driver will need additional
information. First, it needs to know how to access the PCS/PMAs (in
order to configure them and get the link status). The SGMII PCS/PMA is
the only currently-described PCS/PMA. Add the XFI and QSGMII PCS/PMAs as
well. The XFI (and 10GBASE-KR) PCS/PMA is a c45 "phy" which sits on the
same MDIO bus as SGMII PCS/PMA. By default they will have conflicting
addresses, but they are also not enabled at the same time by default.
Therefore, we can let the XFI PCS/PMA be the default when
phy-connection-type is xgmii. This will allow for
backwards-compatibility.

QSGMII, however, cannot work with the current binding. This is because
the QSGMII PCS/PMAs are only present on one MAC's MDIO bus. At the
moment this is worked around by having every MAC write to the PCS/PMA
addresses (without checking if they are present). This only works if
each MAC has the same configuration, and only if we don't need to know
the status. Because the QSGMII PCS/PMA will typically be located on a
different MDIO bus than the MAC's SGMII PCS/PMA, there is no fallback
for the QSGMII PCS/PMA.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Sean Anderson
00af103d06 dt-bindings: net: Add Lynx PCS binding
This binding is fairly bare-bones for now, since the Lynx driver doesn't
parse any properties (or match based on the compatible). We just need it
in order to prevent the PCS nodes from having phy devices attached to
them. This is not really a problem, but it is a bit inefficient.

This binding is really for three separate PCSs (SGMII, QSGMII, and XFI).
However, the driver treats all of them the same. This works because the
SGMII and XFI devices typically use the same address, and the SerDes
driver (or RCW) muxes between them. The QSGMII PCSs have the same
register layout as the SGMII PCSs. To do things properly, we'd probably
do something like

	ethernet-pcs@0 {
		#pcs-cells = <1>;
		compatible = "fsl,lynx-pcs";
		reg = <0>, <1>, <2>, <3>;
	};

but that would add complexity, and we can describe the hardware just
fine using separate PCSs for now.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Sean Anderson
76025ee53b dt-bindings: net: Expand pcs-handle to an array
This allows multiple phandles to be specified for pcs-handle, such as
when multiple PCSs are present for a single MAC. To differentiate
between them, also add a pcs-handle-names property.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
David S. Miller
88a2b3cbb7 Merge branch 'net-marvell-yaml'
Michał Grzelak says:

====================
net: further improvements to marvell,pp2.yaml

This patchset addresses problems with reg ranges and
additional $refs. It also limits phy-mode and aligns examples.

Best regards,
Michał

---
Changelog:
v4->v5
- drop '+' from all patternProperties
- restrict range of patternProperties to [0-2] in top level
- drop the $ref in patternProperties:'^...':properties:reg
- add patternProperties:'^...':properties:reg:maximum:2
- drop $ref in patternProperties:'^...':properties:phys
- add patternProperties:'^...':properties:phys:maxItems:1
- limit phy-mode to the subset found in dts files
- reflect the order of subnodes' properties in subnodes' required:
- restrict range of pattern to [0-2] in marvell,armada-7k-pp22 case
- restrict range of pattern to [0-1] in marvell,armada-375-pp2 case
- align to 4 spaces all examples:
- add specified maximum to allOf:if:then-else:properties:reg

v3->v4
- change commit message of first patch
- move allOf:$ref to patternProperties:'^...':$ref
- deprecate port-id in favour of reg
- move reg to front of properties list in patternProperties
- reflect the order of properties in required list in
  patternProperties
- add unevaluatedProperties: false to patternProperties
- change unevaluated- to additionalProperties at top level
- add property phys: to ports subnode
- extend example binding with additional information about phys and sfp
- hook phys property to phy-consumer.yaml schema

v2->v3
- move 'reg:description' to 'allOf:if:then'
- change '#size-cells: true' and '#address-cells: true'
  to '#size-cells: const: 0' and '#address-cells: const: 1'
- replace all occurences of pattern "^eth\{hex_num}*"
  with "^(ethernet-)?port@[0-9]+$"
- add description in 'patternProperties:^...'
- add 'patternProperties:^...:interrupt-names:minItems: 1'
- add 'patternProperties:^...:reg:description'
- update 'patternProperties:^...:port-id:description'
- add 'patternProperties:^...:required: - reg'
- update '*:description:' to uppercase
- add 'allOf:then:required:marvell,system-controller'
- skip quotation marks from 'allOf:$ref'
- add 'else' schema to match 'allOf:if:then'
- restrict 'clocks' in 'allOf:if:then'
- restrict 'clock-names' in 'allOf:if:then'
- add #address-cells=<1>; #size-cells=<0>; in 'examples:'
- change every "ethX" to "ethernet-port@X" in 'examples:'
- add "reg" and comment in all ports in 'examples:'
- change /ethernet/eth0/phy-mode in examples://Armada-375
  to "rgmii-id"
- replace each cpm_ with cp0_ in 'examples:'
- replace each _syscon0 with _clk0 in 'examples:'
- remove each eth0X label in 'examples:'
- update armada-375.dtsi and armada-cp11x.dtsi to match
  marvell,pp2.yaml

v1->v2
- move 'properties' to the front of the file
- remove blank line after 'properties'
- move 'compatible' to the front of 'properties'
- move 'clocks', 'clock-names' and 'reg' definitions to 'properties'
- substitute all occurences of 'marvell,armada-7k-pp2' with
  'marvell,armada-7k-pp22'
- add properties:#size-cells and properties:#address-cells
- specify list in 'interrupt-names'
- remove blank lines after 'patternProperties'
- remove '^interrupt' and '^#.*-cells$' patterns
- remove blank line after 'allOf'
- remove first 'if-then-else' block from 'allOf'
- negate the condition in allOf:if schema
- delete 'interrupt-controller' from section 'examples'
- delete '#interrupt-cells' from section 'examples'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 09:49:38 +01:00
Marcin Wojtas
844e44988f ARM: dts: armada-375: Update network description to match schema
Update the PP2 ethernet ports subnodes' names to match
schema enforced by the marvell,pp2.yaml contents.

Add new required properties ('reg') which contains information
about the port ID, keeping 'port-id' ones for backward
compatibility.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 09:49:38 +01:00
Marcin Wojtas
2994bf7705 arm64: dts: marvell: Update network description to match schema
Update the PP2 ethernet ports subnodes' names to match
schema enforced by the marvell,pp2.yaml contents.

Add new required properties ('reg') which contains information
about the port ID, keeping 'port-id' ones for backward
compatibility.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 09:49:38 +01:00
Michał Grzelak
c4d175c323 dt-bindings: net: marvell,pp2: convert to json-schema
Convert the marvell,pp2 bindings from text to proper schema.

Move 'marvell,system-controller' and 'dma-coherent' properties from
port up to the controller node, to match what is actually done in DT.

Rename all subnodes to match "^(ethernet-)?port@[0-2]$" and deprecate
port-id in favour of 'reg'.

Signed-off-by: Michał Grzelak <mig@semihalf.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 09:49:38 +01:00
Govindarajulu Varadarajan
e2ac2a00da enic: define constants for legacy interrupts offset
Use macro instead of function calls. These values are constant and will
not change.

Signed-off-by: Govindarajulu Varadarajan <govind.varadar@gmail.com>
Link: https://lore.kernel.org/r/20221018005804.188643-1-govind.varadar@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-18 19:34:28 -07:00
Shenwei Wang
f3d27ae079 net: fec: remove the unused functions
Removed those unused functions since we simplified the driver
by using the page pool to manage RX buffers.

Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Link: https://lore.kernel.org/r/20221017161236.1563975-1-shenwei.wang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-18 19:34:13 -07:00
Arnd Bergmann
a2fd08448f net: remove smc911x driver
This driver was used on Arm and SH machines until 2009, when the
last platforms moved to the smsc911x driver for the same hardware.

Time to retire this version.

Link: https://lore.kernel.org/netdev/1232010482-3744-1-git-send-email-steve.glendinning@smsc.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20221017121900.3520108-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-18 19:33:51 -07:00
Jakub Kicinski
3566a79c9e bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY08JQQAKCRDbK58LschI
 g0M0AQCWGrJcnQFut1qwR9efZUadwxtKGAgpaA/8Smd8+v7c8AD/SeHQuGfkFiD6
 rx18hv1mTfG0HuPnFQy6YZQ98vmznwE=
 =DaeS
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2022-10-18

We've added 33 non-merge commits during the last 14 day(s) which contain
a total of 31 files changed, 874 insertions(+), 538 deletions(-).

The main changes are:

1) Add RCU grace period chaining to BPF to wait for the completion
   of access from both sleepable and non-sleepable BPF programs,
   from Hou Tao & Paul E. McKenney.

2) Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
   values. In the wild we have seen OS vendors doing buggy backports
   where helper call numbers mismatched. This is an attempt to make
   backports more foolproof, from Andrii Nakryiko.

3) Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions,
   from Roberto Sassu.

4) Fix libbpf's BTF dumper for structs with padding-only fields,
   from Eduard Zingerman.

5) Fix various libbpf bugs which have been found from fuzzing with
   malformed BPF object files, from Shung-Hsi Yu.

6) Clean up an unneeded check on existence of SSE2 in BPF x86-64 JIT,
   from Jie Meng.

7) Fix various ASAN bugs in both libbpf and selftests when running
   the BPF selftest suite on arm64, from Xu Kuohai.

8) Fix missing bpf_iter_vma_offset__destroy() call in BPF iter selftest
   and use in-skeleton link pointer to remove an explicit bpf_link__destroy(),
   from Jiri Olsa.

9) Fix BPF CI breakage by pointing to iptables-legacy instead of relying
   on symlinked iptables which got upgraded to iptables-nft,
   from Martin KaFai Lau.

10) Minor BPF selftest improvements all over the place, from various others.

* tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (33 commits)
  bpf/docs: Update README for most recent vmtest.sh
  bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
  bpf: Use rcu_trace_implies_rcu_gp() in local storage map
  bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
  rcu-tasks: Provide rcu_trace_implies_rcu_gp()
  selftests/bpf: Use sys_pidfd_open() helper when possible
  libbpf: Fix null-pointer dereference in find_prog_by_sec_insn()
  libbpf: Deal with section with no data gracefully
  libbpf: Use elf_getshdrnum() instead of e_shnum
  selftest/bpf: Fix error usage of ASSERT_OK in xdp_adjust_tail.c
  selftests/bpf: Fix error failure of case test_xdp_adjust_tail_grow
  selftest/bpf: Fix memory leak in kprobe_multi_test
  selftests/bpf: Fix memory leak caused by not destroying skeleton
  libbpf: Fix memory leak in parse_usdt_arg()
  libbpf: Fix use-after-free in btf_dump_name_dups
  selftests/bpf: S/iptables/iptables-legacy/ in the bpf_nf and xdp_synproxy test
  selftests/bpf: Alphabetize DENYLISTs
  selftests/bpf: Add tests for _opts variants of bpf_*_get_fd_by_id()
  libbpf: Introduce bpf_link_get_fd_by_id_opts()
  libbpf: Introduce bpf_btf_get_fd_by_id_opts()
  ...
====================

Link: https://lore.kernel.org/r/20221018210631.11211-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-18 18:56:43 -07:00
Daniel Müller
6c4e777fbb bpf/docs: Update README for most recent vmtest.sh
Since commit 40b09653b1 ("selftests/bpf: Adjust vmtest.sh to use local
kernel configuration") the vmtest.sh script no longer downloads a kernel
configuration but uses the local, in-repository one.
This change updates the README, which still mentions the old behavior.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221017232458.1272762-1-deso@posteo.net
2022-10-18 21:54:05 +02:00
Alexei Starovoitov
79d878f7ad Merge branch 'Remove unnecessary RCU grace period chaining'
Hou Tao says:

====================
Now bpf uses RCU grace period chaining to wait for the completion of
access from both sleepable and non-sleepable bpf program: calling
call_rcu_tasks_trace() firstly to wait for a RCU-tasks-trace grace
period, then in its callback calls call_rcu() or kfree_rcu() to wait for
a normal RCU grace period.

According to the implementation of RCU Tasks Trace, it inovkes
->postscan_func() to wait for one RCU-tasks-trace grace period and
rcu_tasks_trace_postscan() inovkes synchronize_rcu() to wait for one
normal RCU grace period in turn, so one RCU-tasks-trace grace period
will imply one normal RCU grace period. To codify the implication,
introduces rcu_trace_implies_rcu_gp() in patch #1. And using it in patch
Other two uses of call_rcu_tasks_trace() are unchanged: for
__bpf_prog_put_rcu() there is no gp chain and for
__bpf_tramp_image_put_rcu_tasks() it chains RCU tasks trace GP and RCU
tasks GP.

An alternative way to remove these unnecessary RCU grace period
chainings is using the RCU polling API to check whether or not a normal
RCU grace period has passed (e.g. get_state_synchronize_rcu()). But it
needs an unsigned long space for each free element or each call, and
it is not affordable for local storage element, so as for now always
rcu_trace_implies_rcu_gp().

Comments are always welcome.

Change Log:

v2:
 * codify the implication of RCU Tasks Trace grace period instead of
   assuming for it

v1: https://lore.kernel.org/bpf/20221011071128.3470622-1-houtao@huaweicloud.com

Hou Tao (3):
  bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
  bpf: Use rcu_trace_implies_rcu_gp() in local storage map
  bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
====================

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-18 10:27:05 -07:00
Hou Tao
4835f9ee98 bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
To support both sleepable and normal uprobe bpf program, the freeing of
trace program array chains a RCU-tasks-trace grace period and a normal
RCU grace period one after the other.

With the introduction of rcu_trace_implies_rcu_gp(),
__bpf_prog_array_free_sleepable_cb() can check whether or not a normal
RCU grace period has also passed after a RCU-tasks-trace grace period
has passed. If it is true, it is safe to invoke kfree() directly.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20221014113946.965131-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-18 10:27:02 -07:00
Hou Tao
d39d1445d3 bpf: Use rcu_trace_implies_rcu_gp() in local storage map
Local storage map is accessible for both sleepable and non-sleepable bpf
program, and its memory is freed by using both call_rcu_tasks_trace() and
kfree_rcu() to wait for both RCU-tasks-trace grace period and RCU grace
period to pass.

With the introduction of rcu_trace_implies_rcu_gp(), both
bpf_selem_free_rcu() and bpf_local_storage_free_rcu() can check whether
or not a normal RCU grace period has also passed after a RCU-tasks-trace
grace period has passed. If it is true, it is safe to call kfree()
directly.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20221014113946.965131-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-18 10:27:02 -07:00
Hou Tao
59be91e5e7 bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
The memory free logic in bpf memory allocator chains a RCU Tasks Trace
grace period and a normal RCU grace period one after the other, so it
can ensure that both sleepable and non-sleepable programs have finished.

With the introduction of rcu_trace_implies_rcu_gp(),
__free_rcu_tasks_trace() can check whether or not a normal RCU grace
period has also passed after a RCU Tasks Trace grace period has passed.
If it is true, freeing these elements directly, else freeing through
call_rcu().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20221014113946.965131-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-18 10:27:02 -07:00
Paul E. McKenney
e6c86c513f rcu-tasks: Provide rcu_trace_implies_rcu_gp()
As an accident of implementation, an RCU Tasks Trace grace period also
acts as an RCU grace period.  However, this could change at any time.
This commit therefore creates an rcu_trace_implies_rcu_gp() that currently
returns true to codify this accident.  Code relying on this accident
must call this function to verify that this accident is still happening.

Reported-by: Hou Tao <houtao@huaweicloud.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Link: https://lore.kernel.org/r/20221014113946.965131-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-18 10:27:02 -07:00
Jiapeng Chong
f00909e2e6 net: ip6_gre: Remove the unused function ip6gre_tnl_addr_conflict()
The function ip6gre_tnl_addr_conflict() is defined in the ip6_gre.c file,
but not called elsewhere, so delete this unused function.

net/ipv6/ip6_gre.c:887:20: warning: unused function 'ip6gre_tnl_addr_conflict'.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2419
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221017093540.26806-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-18 11:57:17 +02:00
Hou Tao
62c69e89e8 selftests/bpf: Use sys_pidfd_open() helper when possible
SYS_pidfd_open may be undefined for old glibc, so using sys_pidfd_open()
helper defined in task_local_storage_helpers.h instead to fix potential
build failure.

And according to commit 7615d9e178 ("arch: wire-up pidfd_open()"), the
syscall number of pidfd_open is always 434 except for alpha architure,
so update the definition of __NR_pidfd_open accordingly.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221011071249.3471760-1-houtao@huaweicloud.com
2022-10-13 12:09:19 -07:00
Andrii Nakryiko
e94e0a2d37 Merge branch 'libbpf: fix fuzzer-reported issues'
Shung-Hsi Yu says:

====================

Hi, this patch set fixes several fuzzer-reported issues of libbpf when
dealing with (malformed) BPF object file:

- patch #1 fix out-of-bound heap write reported by oss-fuzz (currently
  incorrectly marked as fixed)

- patch #2 and #3 fix null-pointer dereference found by locally-run
  fuzzer.

v2:
- Rebase to bpf-next
- Move elf_getshdrnum() closer to where it's result is used in patch #1, as
  suggested by Andrii
  - Touch up the comment in bpf_object__elf_collect(), replacing mention of
    e_shnum with elf_getshdrnum()
- Minor wording change in commit message of patch #1 to for better readability
- Remove extra note that comes after commit message in patch #1

v1: https://lore.kernel.org/bpf/20221007174816.17536-1-shung-hsi.yu@suse.com/
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-10-13 10:53:35 -07:00
Andrii Nakryiko
6e73e683b6 Merge branch 'Fix bugs found by ASAN when running selftests'
Xu Kuohai says:

====================

From: Xu Kuohai <xukuohai@huawei.com>

This series fixes bugs found by ASAN when running bpf selftests on arm64.

v4:
- Address Andrii's suggestions

v3: https://lore.kernel.org/bpf/5311e154-c2d4-91a5-ccb8-f5adede579ed@huawei.com
- Fix error failure of case test_xdp_adjust_tail_grow exposed by this series

v2: https://lore.kernel.org/bpf/20221010070454.577433-1-xukuohai@huaweicloud.com
- Rebase and fix conflict

v1: https://lore.kernel.org/bpf/20221009131830.395569-1-xukuohai@huaweicloud.com
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-10-13 10:53:34 -07:00
Shung-Hsi Yu
d0d382f95a libbpf: Fix null-pointer dereference in find_prog_by_sec_insn()
When there are no program sections, obj->programs is left unallocated,
and find_prog_by_sec_insn()'s search lands on &obj->programs[0] == NULL,
and will cause null-pointer dereference in the following access to
prog->sec_idx.

Guard the search with obj->nr_programs similar to what's being done in
__bpf_program__iter() to prevent null-pointer access from happening.

Fixes: db2b8b0642 ("libbpf: Support CO-RE relocations for multi-prog sections")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221012022353.7350-4-shung-hsi.yu@suse.com
2022-10-13 10:53:34 -07:00
Shung-Hsi Yu
35a855509e libbpf: Deal with section with no data gracefully
ELF section data pointer returned by libelf may be NULL (if section has
SHT_NOBITS), so null check section data pointer before attempting to
copy license and kversion section.

Fixes: cb1e5e9619 ("bpf tools: Collect version and license from ELF sections")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221012022353.7350-3-shung-hsi.yu@suse.com
2022-10-13 10:53:34 -07:00
Shung-Hsi Yu
51deedc9b8 libbpf: Use elf_getshdrnum() instead of e_shnum
This commit replace e_shnum with the elf_getshdrnum() helper to fix two
oss-fuzz-reported heap-buffer overflow in __bpf_object__open. Both
reports are incorrectly marked as fixed and while still being
reproducible in the latest libbpf.

  # clusterfuzz-testcase-minimized-bpf-object-fuzzer-5747922482888704
  libbpf: loading object 'fuzz-object' from buffer
  libbpf: sec_cnt is 0
  libbpf: elf: section(1) .data, size 0, link 538976288, flags 2020202020202020, type=2
  libbpf: elf: section(2) .data, size 32, link 538976288, flags 202020202020ff20, type=1
  =================================================================
  ==13==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000c0 at pc 0x0000005a7b46 bp 0x7ffd12214af0 sp 0x7ffd12214ae8
  WRITE of size 4 at 0x6020000000c0 thread T0
  SCARINESS: 46 (4-byte-write-heap-buffer-overflow-far-from-bounds)
      #0 0x5a7b45 in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3414:24
      #1 0x5733c0 in bpf_object_open /src/libbpf/src/libbpf.c:7223:16
      #2 0x5739fd in bpf_object__open_mem /src/libbpf/src/libbpf.c:7263:20
      ...

The issue lie in libbpf's direct use of e_shnum field in ELF header as
the section header count. Where as libelf implemented an extra logic
that, when e_shnum == 0 && e_shoff != 0, will use sh_size member of the
initial section header as the real section header count (part of ELF
spec to accommodate situation where section header counter is larger
than SHN_LORESERVE).

The above inconsistency lead to libbpf writing into a zero-entry calloc
area. So intead of using e_shnum directly, use the elf_getshdrnum()
helper provided by libelf to retrieve the section header counter into
sec_cnt.

Fixes: 0d6988e16a ("libbpf: Fix section counting logic")
Fixes: 25bbbd7a44 ("libbpf: Remove assumptions about uniqueness of .rodata/.data/.bss maps")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=40868
Link: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=40957
Link: https://lore.kernel.org/bpf/20221012022353.7350-2-shung-hsi.yu@suse.com
2022-10-13 10:53:34 -07:00
Xu Kuohai
cbc1c998da selftest/bpf: Fix error usage of ASSERT_OK in xdp_adjust_tail.c
xdp_adjust_tail.c calls ASSERT_OK() to check the return value of
bpf_prog_test_load(), but the condition is not correct. Fix it.

Fixes: 791cad0250 ("bpf: selftests: Get rid of CHECK macro in xdp_adjust_tail.c")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20221011120108.782373-7-xukuohai@huaweicloud.com
2022-10-13 10:53:30 -07:00