Commit graph

326 commits

Author SHA1 Message Date
Vladimir Oltean
a00da30c05 net: ethtool: fix __ethtool_dev_mm_supported() implementation
The MAC Merge layer is supported when ops->get_mm() returns 0.
The implementation was changed during review, and in this process, a bug
was introduced.

Link: https://lore.kernel.org/netdev/20230111161706.1465242-5-vladimir.oltean@nxp.com/
Fixes: 04692c9020 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Link: https://lore.kernel.org/all/20230220122343.1156614-2-vladimir.oltean@nxp.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21 09:05:01 -08:00
Bo Liu
7ec077744a ethtool: pse-pd: Fix double word in comments
Remove the repeated word "for" in comments.

Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20230221083036.2414-1-liubo03@inspur.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21 09:02:13 -08:00
Shannon Nelson
5b4e9a7a71 net: ethtool: extend ringparam set/get APIs for rx_push
Similar to what was done for TX_PUSH, add an RX_PUSH concept
to the ethtool interfaces.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13 11:05:12 +00:00
Vladimir Oltean
ca8e4cbff6 ethtool: mm: fix get_mm() return code not propagating to user space
If ops->get_mm() returns a non-zero error code, we goto out_complete,
but there, we return 0. Fix that to propagate the "ret" variable to the
caller. If ops->get_mm() succeeds, it will always return 0.

Fixes: 2b30f8291a ("net: ethtool: add support for MAC Merge layer")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230206094932.446379-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-07 15:39:06 +01:00
Jakub Kicinski
04007961bf ethtool: netlink: convert commands to common SET
Convert all SET commands where new common code is applicable.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-27 12:24:32 +00:00
Jakub Kicinski
99132b6eb7 ethtool: netlink: handle SET intro/outro in the common code
Most ethtool SET callbacks follow the same general structure.

  ethnl_parse_header_dev_get()
  rtnl_lock()
  ethnl_ops_begin()

  ... do stuff ...

  ethtool_notify()
  ethnl_ops_complete()
  rtnl_unlock()
  ethnl_parse_header_dev_put()

This leads to a lot of copy / pasted code an bugs when people
mis-handle the error path.

Add a generic implementation of this pattern with a .set callback
in struct ethnl_request_ops called to "do stuff".

Also add an optional .set_validate which is called before
ethnl_ops_begin() -- a lot of implementations do basic request
capability / sanity checking at that point.

Because we want to avoid generating the notification when
no change happened - adopt a slightly hairy return values:
 - 0 means nothing to do (no notification)
 - 1 means done / continue
 - negative error codes on error

Reuse .hdr_attr from struct ethnl_request_ops, GET and SET
use the same attr spaces in all cases.

Convert pause as an example (and to avoid unused function warnings).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-27 12:24:31 +00:00
Vladimir Oltean
f5be9caf7b net: ethtool: fix NULL pointer dereference in pause_prepare_data()
In the following call path:

ethnl_default_dumpit
-> ethnl_default_dump_one
   -> ctx->ops->prepare_data
      -> pause_prepare_data

struct genl_info *info will be passed as NULL, and pause_prepare_data()
dereferences it while getting the extended ack pointer.

To avoid that, just set the extack to NULL if "info" is NULL, since the
netlink extack handling messages know how to deal with that.

The pattern "info ? info->extack : NULL" is present in quite a few other
"prepare_data" implementations, so it's clear that it's a more general
problem to be dealt with at a higher level, but the code should have at
least adhered to the current conventions to avoid the NULL dereference.

Fixes: 04692c9020 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reported-by: syzbot+9d44aae2720fc40b8474@syzkaller.appspotmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25 09:57:41 +00:00
Vladimir Oltean
c96de13632 net: ethtool: fix NULL pointer dereference in stats_prepare_data()
In the following call path:

ethnl_default_dumpit
-> ethnl_default_dump_one
   -> ctx->ops->prepare_data
      -> stats_prepare_data

struct genl_info *info will be passed as NULL, and stats_prepare_data()
dereferences it while getting the extended ack pointer.

To avoid that, just set the extack to NULL if "info" is NULL, since the
netlink extack handling messages know how to deal with that.

The pattern "info ? info->extack : NULL" is present in quite a few other
"prepare_data" implementations, so it's clear that it's a more general
problem to be dealt with at a higher level, but the code should have at
least adhered to the current conventions to avoid the NULL dereference.

Fixes: 04692c9020 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25 09:56:31 +00:00
David S. Miller
dc0b98a175 ethtool: Add and use ethnl_update_bool.
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23 13:57:39 +00:00
Vladimir Oltean
449c545964 net: ethtool: add helpers for aggregate statistics
When a pMAC exists but the driver is unable to atomically query the
aggregate eMAC+pMAC statistics, the user should be given back at least
the sum of eMAC and pMAC counters queried separately.

This is a generic problem, so add helpers in ethtool to do this
operation, if the driver doesn't have a better way to report aggregate
stats. Do this in a way that does not require changes to these functions
when new stats are added (basically treat the structures as an array of
u64 values, except for the first element which is the stats source).

In include/linux/ethtool.h, there is already a section where helper
function prototypes should be placed. The trouble is, this section is
too early, before the definitions of struct ethtool_eth_mac_stats et.al.
Move that section at the end and append these new helpers to it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23 12:44:18 +00:00
Vladimir Oltean
04692c9020 net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)
IEEE 802.3-2018 clause 99 defines a MAC Merge sublayer which contains an
Express MAC and a Preemptible MAC. Both MACs are hidden to higher and
lower layers and visible as a single MAC (packet classification to eMAC
or pMAC on TX is done based on priority; classification on RX is done
based on SFD).

For devices which support a MAC Merge sublayer, it is desirable to
retrieve individual packet counters from the eMAC and the pMAC, as well
as aggregate statistics (their sum).

Introduce a new ETHTOOL_A_STATS_SRC attribute which is part of the
policy of ETHTOOL_MSG_STATS_GET and, and an ETHTOOL_A_PAUSE_STATS_SRC
which is part of the policy of ETHTOOL_MSG_PAUSE_GET (accepted when
ETHTOOL_FLAG_STATS is set in the common ethtool header). Both of these
take values from enum ethtool_mac_stats_src, defaulting to "aggregate"
in the absence of the attribute.

Existing drivers do not need to pay attention to this enum which was
added to all driver-facing structures, just the ones which report the
MAC merge layer as supported.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23 12:44:18 +00:00
Vladimir Oltean
2b30f8291a net: ethtool: add support for MAC Merge layer
The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2
specifications (the other being Frame Preemption; IEEE 802.1Q-2018
clause 6.7.2), which work together to minimize latency caused by frame
interference at TX. The overall goal of TSN is for normal traffic and
traffic with a bounded deadline to be able to cohabitate on the same L2
network and not bother each other too much.

The standards achieve this (partly) by introducing the concept of
preemptible traffic, i.e. Ethernet frames that have a custom value for
the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented
and reassembled at L2 on a link-local basis. The non-preemptible frames
are called express traffic, they are transmitted using a normal SFD, and
they can preempt preemptible frames, therefore having lower latency,
which can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo
frames around 9K). Preemption is not recursive, i.e. a P frame cannot
preempt another P frame. Preemption also does not depend upon priority,
or otherwise said, an E frame with prio 0 will still preempt a P frame
with prio 7.

In terms of implementation, the standards talk about the presence of an
express MAC (eMAC) which handles express traffic, and a preemptible MAC
(pMAC) which handles preemptible traffic, and these MACs are multiplexed
on the same MII by a MAC merge layer.

To support frame preemption, the definition of the SFD was generalized
to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an
Ethernet frame fragment, or a complete frame. Stations unaware of an SMD
value different from the standard SFD will treat P frames as error
frames. To prevent that from happening, a negotiation process is
defined.

On RX, packets are dispatched to the eMAC or pMAC after being filtered
by their SMD. On TX, the eMAC/pMAC classification decision is taken by
the 802.1Q spec, based on packet priority (each of the 8 user priority
values may have an admin-status of preemptible or express).

The MAC Merge layer and the Frame Preemption parameters have some degree
of independence in terms of how software stacks are supposed to deal
with them. The activation of the MM layer is supposed to be controlled
by an LLDP daemon (after it has been communicated that the link partner
also supports it), after which a (hardware-based or not) verification
handshake takes place, before actually enabling the feature. So the
process is intended to be relatively plug-and-play. Whereas FP settings
are supposed to be coordinated across a network using something
approximating NETCONF.

The support contained here is exclusively for the 802.3 (MAC Merge)
portions and not for the 802.1Q (Frame Preemption) parts. This API is
sufficient for an LLDP daemon to do its job. The FP adminStatus variable
from 802.1Q is outside the scope of an LLDP daemon.

I have taken a few creative licenses and augmented the Linux kernel UAPI
compared to the standard managed objects recommended by IEEE 802.3.
These are:

- ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive
  Processing state diagram, a MAC Merge layer is always supposed to be
  able to receive P frames. However, this implies keeping the pMAC
  powered on, which will consume needless power in applications where FP
  will never be used. If LLDP is used, the reception of an Additional
  Ethernet Capabilities TLV from the link partner is sufficient
  indication that the pMAC should be enabled. So my proposal is that in
  Linux, we keep the pMAC turned off by default and that user space
  turns it on when needed.

- ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called
  aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in
  the boolean netlink attributes offered, so this is also positive here.
  Other than the meaning being reversed, they correspond to the same
  thing.

- ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for a LLDP
  daemon to maximize the verifyTime variable (delay between SMD-V
  transmissions), to maximize its chances that the LP replies. IEEE says
  that the verifyTime can range between 1 and 128 ms, but the NXP ENETC
  stupidly keeps this variable in a 7 bit register, so the maximum
  supported value is 127 ms. I could have chosen to hardcode this in the
  LLDP daemon to a lower value, but why not let the kernel expose its
  supported range directly.

- ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called
  aMACMergeAddFragSize, and expresses the "additional" fragment size
  (on top of ETH_ZLEN), whereas this expresses the absolute value of the
  fragment size.

- ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to exist a managed
  object mandated by the standard, but user space clearly needs to know
  what is the minimum supported fragment size of our local receiver,
  since LLDP must advertise a value no lower than that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23 12:44:18 +00:00
Jakub Kicinski
b3c588cd55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ipa/ipa_interrupt.c
drivers/net/ipa/ipa_interrupt.h
  9ec9b2a308 ("net: ipa: disable ipa interrupt during suspend")
  8e461e1f09 ("net: ipa: introduce ipa_interrupt_enable()")
  d50ed35587 ("net: ipa: enable IPA interrupt handlers separate from registration")
https://lore.kernel.org/all/20230119114125.5182c7ab@canb.auug.org.au/
https://lore.kernel.org/all/79e46152-8043-a512-79d9-c3b905462774@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20 12:28:23 -08:00
Piergiorgio Beruto
28dbf774bc plca.c: fix obvious mistake in checking retval
Revert a wrong fix that was done during the review process. The
intention was to substitute "if(ret < 0)" with "if(ret)".
Unfortunately, the intended fix did not meet the code.
Besides, after additional review, it was decided that "if(ret < 0)"
was actually the right thing to do.

Fixes: 8580e16c28 ("net/ethtool: add netlink interface for the PLCA RS")
Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/f2277af8951a51cfee2fb905af8d7a812b7beaf4.1673616357.git.piergiorgio.beruto@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13 21:42:53 -08:00
Daniele Palmas
31de284239 ethtool: add tx aggregation parameters
Add the following ethtool tx aggregation parameters:

ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES
Maximum size in bytes of a tx aggregated block of frames.

ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES
Maximum number of frames that can be aggregated into a block.

ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS
Time in usecs after the first packet arrival in an aggregated
block for the block to be sent.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13 10:23:52 +00:00
Sudheer Mogilappagari
ea22f4319c ethtool: add netlink attr in rss get reply only if value is not null
Current code for RSS_GET ethtool command includes netlink attributes
in reply message to user space even if they are null. Added checks
to include netlink attribute in reply message only if a value is
received from driver. Drivers might return null for RSS indirection
table or hash key. Instead of including attributes with empty value
in the reply message, add netlink attribute only if there is content.

Fixes: 7112a04664 ("ethtool: add netlink based get rss support")
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/20230111235607.85509-1-sudheer.mogilappagari@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-12 21:52:46 -08:00
Piergiorgio Beruto
16178c8ef5 drivers/net/phy: add the link modes for the 10BASE-T1S Ethernet PHY
This patch adds the link modes for the IEEE 802.3cg Clause 147 10BASE-T1S
Ethernet PHY. According to the specifications, the 10BASE-T1S supports
Point-To-Point Full-Duplex, Point-To-Point Half-Duplex and/or
Point-To-Multipoint (AKA Multi-Drop) Half-Duplex operations.

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11 08:35:02 +00:00
Piergiorgio Beruto
8580e16c28 net/ethtool: add netlink interface for the PLCA RS
Add support for configuring the PLCA Reconciliation Sublayer on
multi-drop PHYs that support IEEE802.3cg-2019 Clause 148 (e.g.,
10BASE-T1S). This patch adds the appropriate netlink interface
to ethtool.

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11 08:35:02 +00:00
Daniil Tatianin
201ed315f9 net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers
So that it's easier to follow and make sense of the branching and
various conditions.

Stats retrieval has been split into two separate functions
ethtool_get_phy_stats_phydev & ethtool_get_phy_stats_ethtool.
The former attempts to retrieve the stats using phydev & phy_ops, while
the latter uses ethtool_ops.

Actual n_stats validation & array allocation has been moved into a new
ethtool_vzalloc_stats_array helper.

This also fixes a potential NULL dereference of
ops->get_ethtool_phy_stats where it was getting called in an else branch
unconditionally without making sure it was actually present.

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 11:55:24 +00:00
Daniil Tatianin
fd4778581d net/ethtool/ioctl: remove if n_stats checks from ethtool_get_phy_stats
Now that we always early return if we don't have any stats we can remove
these checks as they're no longer necessary.

Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 11:55:24 +00:00
Daniil Tatianin
9deb1e9fb8 net/ethtool/ioctl: return -EOPNOTSUPP if we have no phy stats
It's not very useful to copy back an empty ethtool_stats struct and
return 0 if we didn't actually have any stats. This also allows for
further simplification of this function in the future commits.

Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28 11:55:24 +00:00
Willem de Bruijn
b534dc46c8 net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP
Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from
write_seq sockets instead of snd_una.

This should have been the behavior from the start. Because processes
may now exist that rely on the established behavior, do not change
behavior of the existing option, but add the right behavior with a new
flag. It is encouraged to always set SOF_TIMESTAMPING_OPT_ID_TCP on
stream sockets along with the existing SOF_TIMESTAMPING_OPT_ID.

Intuitively the contract is that the counter is zero after the
setsockopt, so that the next write N results in a notification for
the last byte N - 1.

On idle sockets snd_una == write_seq and this holds for both. But on
sockets with data in transmission, snd_una records the unacked offset
in the stream. This depends on the ACK response from the peer. A
process cannot learn this in a race free manner (ioctl SIOCOUTQ is one
racy approach).

write_seq records the offset at the last byte written by the process.
This is a better starting point. It matches the intuitive contract in
all circumstances, unaffected by external behavior.

The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
Move the field in struct sock to avoid growing the socket (for some
common CONFIG variants). The UAPI interface so_timestamping.flags is
already int, so 32 bits wide.

Reported-by: Sotirios Delimanolis <sotodel@meta.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20221207143701.29861-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 19:49:21 -08:00
Sudheer Mogilappagari
7112a04664 ethtool: add netlink based get rss support
Add netlink based support for "ethtool -x <dev> [context x]"
command by implementing ETHTOOL_MSG_RSS_GET netlink message.
This is equivalent to functionality provided via ETHTOOL_GRSSH
in ioctl path. It sends RSS table, hash key and hash function
of an interface to user space.

This patch implements existing functionality available
in ioctl path and enables addition of new RSS context
based parameters in future.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Link: https://lore.kernel.org/r/20221202002555.241580-1-sudheer.mogilappagari@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-05 17:25:00 -08:00
Maxim Korotkov
64a8f8f712 ethtool: avoiding integer overflow in ethtool_phys_id()
The value of an arithmetic expression "n * id.data" is subject
to possible overflow due to a failure to cast operands to a larger data
type before performing arithmetic. Used macro for multiplication instead
operator for avoiding overflow.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Maxim Korotkov <korotkov.maxim.s@gmail.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221122122901.22294-1-korotkov.maxim.s@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-23 20:03:51 -08:00
Vincent Mailhol
edaf5df22c ethtool: ethtool_get_drvinfo: populate drvinfo fields even if callback exits
If ethtool_ops::get_drvinfo() callback isn't set,
ethtool_get_drvinfo() will fill the ethtool_drvinfo::name and
ethtool_drvinfo::bus_info fields.

However, if the driver provides the callback function, those two
fields are not touched. This means that the driver has to fill these
itself.

Allow the driver to leave those two fields empty and populate them in
such case. This way, the driver can rely on the default values for the
name and the bus_info. If the driver provides values, do nothing.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20221108035754.2143-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 09:01:58 -08:00
Gal Pressman
47f3ecf476 ethtool: Fail number of channels change when it conflicts with rxnfc
Similar to what we do with the hash indirection table [1], when network
flow classification rules are forwarding traffic to channels greater
than the requested number of channels, fail the operation.
Without this, traffic could be directed to channels which no longer
exist (dropped) after changing number of channels.

[1] commit d4ab428627 ("ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}")

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20221106123127.522985-1-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-08 12:08:44 +01:00
Jakub Kicinski
9a0f830f80 ethtool: linkstate: add a statistic for PHY down events
The previous attempt to augment carrier_down (see Link)
was not met with much enthusiasm so let's do the simple
thing of exposing what some devices already maintain.
Add a common ethtool statistic for link going down.
Currently users have to maintain per-driver mapping
to extract the right stat from the vendor-specific ethtool -S
stats. carrier_down does not fit the bill because it counts
a lot of software related false positives.

Add the statistic to the extended link state API to steer
vendors towards implementing all of it.

Implement for bnxt and all Linux-controlled PHYs. mlx5 and (possibly)
enic also have a counter for this but I leave the implementation
to their maintainers.

Link: https://lore.kernel.org/r/20220520004500.2250674-1-kuba@kernel.org
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20221104190125.684910-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-08 10:36:54 +01:00
Jiri Pirko
8eba37f7e9 net: devlink: use devlink_port pointer instead of ndo_get_devlink_port
Use newly introduced devlink_port pointer instead of getting it calling
to ndo_get_devlink_port op.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03 20:48:36 -07:00
Jakub Kicinski
31f1aa4f74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
  2871edb32f ("can: kvaser_usb: Fix possible completions during init_completion")
  abb8670938 ("can: kvaser_usb_leaf: Ignore stale bus-off after start")
  8d21f5927a ("can: kvaser_usb_leaf: Fix improved state not being reported")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 16:56:36 -07:00
Xin Long
9d9effca9d ethtool: eeprom: fix null-deref on genl_info in dump
The similar fix as commit 46cdedf2a0 ("ethtool: pse-pd: fix null-deref on
genl_info in dump") is also needed for ethtool eeprom.

Fixes: c781ff12a2 ("ethtool: Allow network drivers to dump arbitrary EEPROM data")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/5575919a2efc74cd9ad64021880afc3805c54166.1666362167.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-24 19:08:07 -07:00
Jakub Kicinski
96917bb3a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/linux/net.h
  a5ef058dc4 ("net: introduce and use custom sockopt socket flag")
  e993ffe3da ("net: flag sockets supporting msghdr originated zerocopy")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-24 13:44:11 -07:00
Amit Cohen
404c76783f ethtool: Add support for 800Gbps link modes
Add support for 800Gbps speed, link modes of 100Gbps per lane.
As mentioned in slide 21 in IEEE documentation [1], all adopted 802.3df
copper and optical PMDs baselines using 100G/lane will be supported.

Add the relevant PMDs which are mentioned in slide 5 in IEEE
documentation [1] and were approved on 10-2022 [2]:
BP - KR8
Cu Cable - CR8
MMF 50m - VR8
MMF 100m - SR8
SMF 500m - DR8
SMF 2km - DR8-2

[1]: https://www.ieee802.org/3/df/public/22_10/22_1004/shrikhande_3df_01a_221004.pdf
[2]: https://ieee802.org/3/df/KeyMotions_3df_221005.pdf

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:43:39 +01:00
Jakub Kicinski
46cdedf2a0 ethtool: pse-pd: fix null-deref on genl_info in dump
ethnl_default_dump_one() passes NULL as info.

It's correct not to set extack during dump, as we should just
silently skip interfaces which can't provide the information.

Reported-by: syzbot+81c4b4bbba6eea2cfcae@syzkaller.appspotmail.com
Fixes: 18ff0bcda6 ("ethtool: add interface to interact with Ethernet Power Equipment")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-21 13:18:05 +01:00
Oleksij Rempel
18ff0bcda6 ethtool: add interface to interact with Ethernet Power Equipment
Add interface to support Power Sourcing Equipment. At current step it
provides generic way to address all variants of PSE devices as defined
in IEEE 802.3-2018 but support only objects specified for IEEE 802.3-2018 104.4
PoDL Power Sourcing Equipment (PSE).

Currently supported and mandatory objects are:
IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus
IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState
IEEE 802.3-2018 30.15.1.2.1 acPoDLPSEAdminControl

This is minimal interface needed to control PSE on each separate
ethernet port but it provides not all mandatory objects specified in
IEEE 802.3-2018.

Since "PoDL PSE" and "PSE" have similar names, but some different values
I decide to not merge them and keep separate naming schema. This should
allow as to be as close to IEEE 802.3 spec as possible and avoid name
conflicts in the future.

This implementation is connected to PHYs instead of MACs because PSE
auto classification can potentially interfere with PHY auto negotiation.
So, may be some extra PHY related initialization will be needed.

With WIP version of ethtools interaction with PSE capable link looks
as following:

$ ip l
...
5: t1l1@eth0: <BROADCAST,MULTICAST> ..
...

$ ethtool --show-pse t1l1
PSE attributs for t1l1:
PoDL PSE Admin State: disabled
PoDL PSE Power Detection Status: disabled

$ ethtool --set-pse t1l1 podl-pse-admin-control enable
$ ethtool --show-pse t1l1
PSE attributs for t1l1:
PoDL PSE Admin State: enabled
PoDL PSE Power Detection Status: delivering power

Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03 17:33:57 -07:00
Sean Anderson
0c3e10cb44 net: phy: Add support for rate matching
This adds support for rate matching (also known as rate adaptation) to
the phy subsystem. The general idea is that the phy interface runs at
one speed, and the MAC throttles the rate at which it sends packets to
the link speed. There's a good overview of several techniques for
achieving this at [1]. This patch adds support for three: pause-frame
based (such as in Aquantia phys), CRS-based (such as in 10PASS-TS and
2BASE-TL), and open-loop-based (such as in 10GBASE-W).

This patch makes a few assumptions and a few non assumptions about the
types of rate matching available. First, it assumes that different phys
may use different forms of rate matching. Second, it assumes that phys
can use rate matching for any of their supported link speeds (e.g. if a
phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T).
Third, it does not assume that all interface modes will use the same
form of rate matching. Fourth, it does not assume that all phy devices
will support rate matching (even if some do). Relaxing or strengthening
these (non-)assumptions could result in a different API. For example, if
all interface modes were assumed to use the same form of rate matching,
then a bitmask of interface modes supportting rate matching would
suffice.

For some better visibility into the process, the current rate matching
mode is exposed as part of the ethtool ksettings. For the moment, only
read access is supported. I'm not sure what userspace might want to
configure yet (disable it altogether, disable just one mode, specify the
mode to use, etc.). For the moment, since only pause-based rate
adaptation support is added in the next few commits, rate matching can
be disabled altogether by adjusting the advertisement.

802.3 calls this feature "rate adaptation" in clause 49 (10GBASE-R) and
"rate matching" in clause 61 (10PASS-TL and 2BASE-TS). Aquantia also calls
this feature "rate adaptation". I chose "rate matching" because it is
shorter, and because Russell doesn't think "adaptation" is correct in this
context.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:55:35 +01:00
Li Zhong
05cd823863 ethtool: tunnels: check the return value of nla_nest_start()
Check the return value of nla_nest_start(). When starting the entry
level nested attributes, if the tailroom of socket buffer is
insufficient to store the attribute header and payload, the return value
will be NULL.

There is, however, no real bug here since if the skb is full
nla_put_be16() will fail as well and we'll error out.

Signed-off-by: Li Zhong <floridsleeves@gmail.com>
Link: https://lore.kernel.org/r/20220921181716.1629541-1-floridsleeves@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 19:28:10 -07:00
Jakub Kicinski
4f5059e629 ethtool: report missing header via ext_ack in the default handler
The actual presence check for the header is in
ethnl_parse_header_dev_get() but it's a few layers in,
and already has a ton of arguments so let's just pick
the low hanging fruit and check for missing header in
the default request handler.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-30 12:20:43 +02:00
Jakub Kicinski
08d1d0e784 ethtool: strset: report missing ETHTOOL_A_STRINGSET_ID via ext_ack
Strset needs ETHTOOL_A_STRINGSET_ID, use it as an example of
reporting attrs missing in nests.

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-30 12:20:43 +02:00
Jakub Kicinski
9c5d03d362 genetlink: start to validate reserved header bytes
We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.

One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.

To make sure that new families do the right thing by default
put the onus of opting out of validation on existing families.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-29 12:47:15 +01:00
Wolfram Sang
a71af8902b ethtool: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20220818210218.8443-1-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-22 18:06:55 -07:00
William Dean
aa246499bb net: delete extra space and tab in blank line
delete extra space and tab in blank line, there is no functional change.

Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: William Dean <williamsukatube@gmail.com>
Link: https://lore.kernel.org/r/20220723073222.2961602-1-williamsukatube@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25 19:38:31 -07:00
Jakub Kicinski
93817be8b6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-23 12:33:24 -07:00
Ivan Vecera
a3bb7b6381 ethtool: Fix get module eeprom fallback
Function fallback_set_params() checks if the module type returned
by a driver is ETH_MODULE_SFF_8079 and in this case it assumes
that buffer returns a concatenated content of page  A0h and A2h.
The check is wrong because the correct type is ETH_MODULE_SFF_8472.

Fixes: 96d971e307 ("ethtool: Add fallback to get_module_eeprom from netlink command")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220616160856.3623273-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17 20:22:16 -07:00
Marco Bonelli
19d62f5eea ethtool: Fix and simplify ethtool_convert_link_mode_to_legacy_u32()
Fix the implementation of ethtool_convert_link_mode_to_legacy_u32(), which
is supposed to return false if src has bits higher than 31 set. The current
implementation uses the complement of bitmap_fill(ext, 32) to test high
bits of src, which is wrong as bitmap_fill() fills _with long granularity_,
and sizeof(long) can be > 4. No users of this function currently check the
return value, so the bug was dormant.

Also remove the check for __ETHTOOL_LINK_MODE_MASK_NBITS > 32, as the enum
ethtool_link_mode_bit_indices contains far beyond 32 values. Using
find_next_bit() to test the src bitmask works regardless of this anyway.

Signed-off-by: Marco Bonelli <marco@mebeim.net>
Link: https://lore.kernel.org/r/20220609134900.11201-1-marco@mebeim.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-13 23:11:35 -07:00
Jakub Kicinski
d62607c3fe net: rename reference+tracking helpers
Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.

Rename:
 dev_hold_track()    -> netdev_hold()
 dev_put_track()     -> netdev_put()
 dev_replace_track() -> netdev_ref_replace()

Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:52:55 -07:00
Alexandru Tachici
3254e0b9eb ethtool: Add 10base-T1L link mode entry
Add entry for the 10base-T1L full duplex mode.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-01 17:45:35 +01:00
Jie Wang
bde292c07b net: ethtool: move checks before rtnl_lock() in ethnl_set_rings
Currently these two checks in ethnl_set_rings are added after rtnl_lock()
which will do useless works if the request is invalid.

So this patch moves these checks before the rtnl_lock() to avoid these
costs.

Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15 11:41:45 -07:00
Jie Wang
4dc84c06a3 net: ethtool: extend ringparam set/get APIs for tx_push
Currently tx push is a standard driver feature which controls use of a fast
path descriptor push. So this patch extends the ringparam APIs and data
structures to support set/get tx push by ethtool -G/g.

Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15 11:41:35 -07:00
Subbaraya Sundeep
1241e329ce ethtool: add support to set/get completion queue event size
Add support to set completion queue event size via ethtool -G
parameter and get it via ethtool -g parameter.

~ # ./ethtool -G eth0 cqe-size 512
~ # ./ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:             1048576
RX Mini:        n/a
RX Jumbo:       n/a
TX:             1048576
Current hardware settings:
RX:             256
RX Mini:        n/a
RX Jumbo:       n/a
TX:             4096
RX Buf Len:             2048
CQE Size:                128

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-23 20:33:05 -08:00
Jakub Kicinski
9690ae6042 ethtool: add header/data split indication
For applications running on a mix of platforms it's useful
to have a clear indication whether host's NIC supports the
geometry requirements of TCP zero-copy. TCP zero-copy Rx
requires data to be neatly placed into memory pages.
Most NICs can't do that.

This patch is adding GET support only, since the NICs
I work with either always have the feature enabled or
enable it whenever MTU is set to jumbo. In other words
I don't need SET. But adding set should be trivial.
(The only note on SET is that we will likely want
the setting to be "sticky" and use 0 / `unknown`
to reset it back to driver default.)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-28 14:43:47 +00:00
Tom Rix
ccd21ec5b8 ethtool: use phydev variable
In ethtool_get_phy_stats(), the phydev varaible is set to
dev->phydev but dev->phydev is still used.  Replace
dev->phydev uses with phydev.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-06 12:33:35 +00:00
Eric Dumazet
2d6ec25539 netlink: do not allocate a device refcount tracker in ethnl_default_notify()
As reported by Johannes, the tracker allocated in
ethnl_default_notify() is not really needed, as this
function is not expected to change a device reference count.

Fixes: e4b8954074 ("netlink: add net device refcount tracker to struct ethnl_req_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Link: https://lore.kernel.org/r/20220105170849.2610470-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05 09:50:06 -08:00
David S. Miller
e63a023489 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31 14:35:40 +00:00
luo penghao
c09f103e89 ethtool: Remove redundant ret assignments
The assignment here will be overwritten, so it should be deleted

The clang_analyzer complains as follows:

net/ethtool/netlink.c:

Value stored to 'ret' is never read

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-30 13:29:14 +00:00
Jakub Kicinski
b6459415b3 net: Don't include filter.h from net/sock.h
sock.h is pretty heavily used (5k objects rebuilt on x86 after
it's touched). We can drop the include of filter.h from it and
add a forward declaration of struct sk_filter instead.
This decreases the number of rebuilt objects when bpf.h
is touched from ~5k to ~1k.

There's a lot of missing includes this was masking. Primarily
in networking tho, this time.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-12-29 08:48:14 -08:00
Jakub Kicinski
3bc14ea0d1 ethtool: always write dev in ethnl_parse_header_dev_get
Commit 0976b888a1 ("ethtool: fix null-ptr-deref on ref tracker")
made the write to req_info.dev conditional, but as Eric points out
in a different follow up the structure is often allocated on the
stack and not kzalloc()'d so seems safer to always write the dev,
in case it's garbage on input.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:09:24 +00:00
Eric Dumazet
34ac17ecbf ethtool: use ethnl_parse_header_dev_put()
It seems I missed that most ethnl_parse_header_dev_get() callers
declare an on-stack struct ethnl_req_info, and that they simply call
dev_put(req_info.dev) when about to return.

Add ethnl_parse_header_dev_put() helper to properly untrack
reference taken by ethnl_parse_header_dev_get().

Fixes: e4b8954074 ("netlink: add net device refcount tracker to struct ethnl_req_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 10:27:47 +00:00
Jakub Kicinski
0976b888a1 ethtool: fix null-ptr-deref on ref tracker
dev can be a NULL here, not all requests set require_dev.

Fixes: e4b8954074 ("netlink: add net device refcount tracker to struct ethnl_req_info")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14 12:35:56 +00:00
Jakub Kicinski
3150a73366 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09 13:23:02 -08:00
Eric Dumazet
e4b8954074 netlink: add net device refcount tracker to struct ethnl_req_info
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:59 -08:00
Antoine Tenart
dde91ccfa2 ethtool: do not perform operations on net devices being unregistered
There is a short period between a net device starts to be unregistered
and when it is actually gone. In that time frame ethtool operations
could still be performed, which might end up in unwanted or undefined
behaviours[1].

Do not allow ethtool operations after a net device starts its
unregistration. This patch targets the netlink part as the ioctl one
isn't affected: the reference to the net device is taken and the
operation is executed within an rtnl lock section and the net device
won't be found after unregister.

[1] For example adding Tx queues after unregister ends up in NULL
    pointer exceptions and UaFs, such as:

      BUG: KASAN: use-after-free in kobject_get+0x14/0x90
      Read of size 1 at addr ffff88801961248c by task ethtool/755

      CPU: 0 PID: 755 Comm: ethtool Not tainted 5.15.0-rc6+ #778
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/014
      Call Trace:
       dump_stack_lvl+0x57/0x72
       print_address_description.constprop.0+0x1f/0x140
       kasan_report.cold+0x7f/0x11b
       kobject_get+0x14/0x90
       kobject_add_internal+0x3d1/0x450
       kobject_init_and_add+0xba/0xf0
       netdev_queue_update_kobjects+0xcf/0x200
       netif_set_real_num_tx_queues+0xb4/0x310
       veth_set_channels+0x1c3/0x550
       ethnl_set_channels+0x524/0x610

Fixes: 041b1c5d4a ("ethtool: helper functions for netlink interface")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://lore.kernel.org/r/20211203101318.435618-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:53:32 -08:00
Eric Dumazet
5ae2195088 net: add net device refcount tracker to ethtool_phys_id()
This helper might hold a netdev reference for a long time,
lets add reference tracking.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:05:10 -08:00
Christophe JAILLET
72a2ff567f ethtool: netlink: Slightly simplify 'ethnl_features_to_bitmap()'
The 'dest' bitmap is fully initialized by the 'for' loop, so there is no
need to explicitly reset it.

This also makes this function in line with 'ethnl_features_to_bitmap32()'
which does not clear the destination before writing it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/17fca158231c6f03689bd891254f0dd1f4e84cb8.1638091829.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29 20:16:46 -08:00
Tonghao Zhang
bde3b0fd80 net: ethtool: set a default driver name
The netdev (e.g. ifb, bareudp), which not support ethtool ops
(e.g. .get_drvinfo), we can use the rtnl kind as a default name.

ifb netdev may be created by others prefix, not ifbX.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Hao Chen <chenhao288@hisilicon.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Danielle Ratson <danieller@nvidia.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20211125163049.84970-1-xiangxia.m.yue@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26 16:44:53 -08:00
Jakub Kicinski
93d5404e89 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ipa/ipa_main.c
  8afc7e471a ("net: ipa: separate disabling setup from modem stop")
  76b5fbcd6b ("net: ipa: kill ipa_modem_init()")

Duplicated include, drop one.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26 13:45:19 -08:00
Julian Wiedmann
0276af2176 ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
ethtool_set_coalesce() now uses both the .get_coalesce() and
.set_coalesce() callbacks. But the check for their availability is
buggy, so changing the coalesce settings on a device where the driver
provides only _one_ of the callbacks results in a NULL pointer
dereference instead of an -EOPNOTSUPP.

Fix the condition so that the availability of both callbacks is
ensured. This also matches the netlink code.

Note that reproducing this requires some effort - it only affects the
legacy ioctl path, and needs a specific combination of driver options:
- have .get_coalesce() and .coalesce_supported but no
 .set_coalesce(), or
- have .set_coalesce() but no .get_coalesce(). Here eg. ethtool doesn't
  cause the crash as it first attempts to call ethtool_get_coalesce()
  and bails out on error.

Fixes: f3ccfda193 ("ethtool: extend coalesce setting uAPI with CQE mode")
Cc: Yufeng Mo <moyufeng@huawei.com>
Cc: Huazhong Tan <tanhuazhong@huawei.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Link: https://lore.kernel.org/r/20211126175543.28000-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26 11:17:47 -08:00
Hao Chen
7462494408 ethtool: extend ringparam setting/getting API with rx_buf_len
Add two new parameters kernel_ringparam and extack for
.get_ringparam and .set_ringparam to extend more ring params
through netlink.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22 12:31:49 +00:00
Hao Chen
0b70c256eb ethtool: add support to set/get rx buf len via ethtool
Add support to set rx buf len via ethtool -G parameter and get
rx buf len via ethtool -g parameter.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22 12:31:48 +00:00
Hao Chen
448f413a8b ethtool: add support to set/get tx copybreak buf size via ethtool
Add support for ethtool to set/get tx copybreak buf size.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22 12:31:47 +00:00
Kees Cook
812ad3d270 ethtool: stats: Use struct_group() to clear all stats at once
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add struct_group() to mark region of struct stats_reply_data that should
be initialized, which can now be done in a single memset() call.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-19 11:53:02 +00:00
Jakub Kicinski
1aabe578dd ethtool: fix ethtool msg len calculation for pause stats
ETHTOOL_A_PAUSE_STAT_MAX is the MAX attribute id,
so we need to subtract non-stats and add one to
get a count (IOW -2+1 == -1).

Otherwise we'll see:

  ethnl cmd 21: calculated reply length 40, but consumed 52

Fixes: 9a27a33027 ("ethtool: add standard pause stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-03 11:20:45 +00:00
Jakub Kicinski
1af0a0948e ethtool: don't drop the rtnl_lock half way thru the ioctl
devlink compat code needs to drop rtnl_lock to take
devlink->lock to ensure correct lock ordering.

This is problematic because we're not strictly guaranteed
that the netdev will not disappear after we re-lock.
It may open a possibility of nested ->begin / ->complete
calls.

Instead of calling into devlink under rtnl_lock take
a ref on the devlink instance and make the call after
we've dropped rtnl_lock.

We (continue to) assume that netdevs have an implicit
reference on the devlink returned from ndo_get_devlink_port

Note that ndo_get_devlink_port will now get called
under rtnl_lock. That should be fine since none of
the drivers seem to be taking serious locks inside
ndo_get_devlink_port.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:26:07 +00:00
Jakub Kicinski
095cfcfe13 ethtool: handle info/flash data copying outside rtnl_lock
We need to increase the lifetime of the data for .get_info
and .flash_update beyond their handlers inside rtnl_lock.

Allocate a union on the heap and use it instead.

Note that we now copy the ethcmd before we lookup dev,
hopefully there is no crazy user space depending on error
codes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:26:07 +00:00
Jakub Kicinski
f49deaa64a ethtool: push the rtnl_lock into dev_ethtool()
Don't take the lock in net/core/dev_ioctl.c,
we'll have things to do outside rtnl_lock soon.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:26:07 +00:00
Sean Anderson
4973056cce net: convert users of bitmap_foo() to linkmode_foo()
This converts instances of
	bitmap_foo(args..., __ETHTOOL_LINK_MODE_MASK_NBITS)
to
	linkmode_foo(args...)

I manually fixed up some lines to prevent them from being excessively
long. Otherwise, this change was generated with the following semantic
patch:

// Generated with
// echo linux/linkmode.h > includes
// git grep -Flf includes include/ | cut -f 2- -d / | cat includes - \
// | sort | uniq | tee new_includes | wc -l && mv new_includes includes
// and repeating until the number stopped going up
@i@
@@

(
 #include <linux/acpi_mdio.h>
|
 #include <linux/brcmphy.h>
|
 #include <linux/dsa/loop.h>
|
 #include <linux/dsa/sja1105.h>
|
 #include <linux/ethtool.h>
|
 #include <linux/ethtool_netlink.h>
|
 #include <linux/fec.h>
|
 #include <linux/fs_enet_pd.h>
|
 #include <linux/fsl/enetc_mdio.h>
|
 #include <linux/fwnode_mdio.h>
|
 #include <linux/linkmode.h>
|
 #include <linux/lsm_audit.h>
|
 #include <linux/mdio-bitbang.h>
|
 #include <linux/mdio.h>
|
 #include <linux/mdio-mux.h>
|
 #include <linux/mii.h>
|
 #include <linux/mii_timestamper.h>
|
 #include <linux/mlx5/accel.h>
|
 #include <linux/mlx5/cq.h>
|
 #include <linux/mlx5/device.h>
|
 #include <linux/mlx5/driver.h>
|
 #include <linux/mlx5/eswitch.h>
|
 #include <linux/mlx5/fs.h>
|
 #include <linux/mlx5/port.h>
|
 #include <linux/mlx5/qp.h>
|
 #include <linux/mlx5/rsc_dump.h>
|
 #include <linux/mlx5/transobj.h>
|
 #include <linux/mlx5/vport.h>
|
 #include <linux/of_mdio.h>
|
 #include <linux/of_net.h>
|
 #include <linux/pcs-lynx.h>
|
 #include <linux/pcs/pcs-xpcs.h>
|
 #include <linux/phy.h>
|
 #include <linux/phy_led_triggers.h>
|
 #include <linux/phylink.h>
|
 #include <linux/platform_data/bcmgenet.h>
|
 #include <linux/platform_data/xilinx-ll-temac.h>
|
 #include <linux/pxa168_eth.h>
|
 #include <linux/qed/qed_eth_if.h>
|
 #include <linux/qed/qed_fcoe_if.h>
|
 #include <linux/qed/qed_if.h>
|
 #include <linux/qed/qed_iov_if.h>
|
 #include <linux/qed/qed_iscsi_if.h>
|
 #include <linux/qed/qed_ll2_if.h>
|
 #include <linux/qed/qed_nvmetcp_if.h>
|
 #include <linux/qed/qed_rdma_if.h>
|
 #include <linux/sfp.h>
|
 #include <linux/sh_eth.h>
|
 #include <linux/smsc911x.h>
|
 #include <linux/soc/nxp/lpc32xx-misc.h>
|
 #include <linux/stmmac.h>
|
 #include <linux/sunrpc/svc_rdma.h>
|
 #include <linux/sxgbe_platform.h>
|
 #include <net/cfg80211.h>
|
 #include <net/dsa.h>
|
 #include <net/mac80211.h>
|
 #include <net/selftests.h>
|
 #include <rdma/ib_addr.h>
|
 #include <rdma/ib_cache.h>
|
 #include <rdma/ib_cm.h>
|
 #include <rdma/ib_hdrs.h>
|
 #include <rdma/ib_mad.h>
|
 #include <rdma/ib_marshall.h>
|
 #include <rdma/ib_pack.h>
|
 #include <rdma/ib_pma.h>
|
 #include <rdma/ib_sa.h>
|
 #include <rdma/ib_smi.h>
|
 #include <rdma/ib_umem.h>
|
 #include <rdma/ib_umem_odp.h>
|
 #include <rdma/ib_verbs.h>
|
 #include <rdma/iw_cm.h>
|
 #include <rdma/mr_pool.h>
|
 #include <rdma/opa_addr.h>
|
 #include <rdma/opa_port_info.h>
|
 #include <rdma/opa_smi.h>
|
 #include <rdma/opa_vnic.h>
|
 #include <rdma/rdma_cm.h>
|
 #include <rdma/rdma_cm_ib.h>
|
 #include <rdma/rdmavt_cq.h>
|
 #include <rdma/rdma_vt.h>
|
 #include <rdma/rdmavt_qp.h>
|
 #include <rdma/rw.h>
|
 #include <rdma/tid_rdma_defs.h>
|
 #include <rdma/uverbs_ioctl.h>
|
 #include <rdma/uverbs_named_ioctl.h>
|
 #include <rdma/uverbs_std_types.h>
|
 #include <rdma/uverbs_types.h>
|
 #include <soc/mscc/ocelot.h>
|
 #include <soc/mscc/ocelot_ptp.h>
|
 #include <soc/mscc/ocelot_vcap.h>
|
 #include <trace/events/ib_mad.h>
|
 #include <trace/events/rdma_core.h>
|
 #include <trace/events/rdma.h>
|
 #include <trace/events/rpcrdma.h>
|
 #include <uapi/linux/ethtool.h>
|
 #include <uapi/linux/ethtool_netlink.h>
|
 #include <uapi/linux/mdio.h>
|
 #include <uapi/linux/mii.h>
)

@depends on i@
expression list args;
@@

(
- bitmap_zero(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_zero(args)
|
- bitmap_copy(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_copy(args)
|
- bitmap_and(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_and(args)
|
- bitmap_or(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_or(args)
|
- bitmap_empty(args, ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_empty(args)
|
- bitmap_andnot(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_andnot(args)
|
- bitmap_equal(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_equal(args)
|
- bitmap_intersects(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_intersects(args)
|
- bitmap_subset(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_subset(args)
)

Add missing linux/mii.h include to mellanox. -DaveM

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-24 13:58:52 +01:00
Ido Schimmel
353407d917 ethtool: Add ability to control transceiver modules' power mode
Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and
'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver
modules parameters and retrieve their status.

The first parameter to control is the power mode of the module. It is
only relevant for paged memory modules, as flat memory modules always
operate in low power mode.

When a paged memory module is in low power mode, its power consumption
is reduced to the minimum, the management interface towards the host is
available and the data path is deactivated.

User space can choose to put modules that are not currently in use in
low power mode and transition them to high power mode before putting the
associated ports administratively up. This is useful for user space that
favors reduced power consumption and lower temperatures over reduced
link up times. In QSFP-DD modules the transition from low power mode to
high power mode can take a few seconds and this transition is only
expected to get longer with future / more complex modules.

User space can control the power mode of the module via the power mode
policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible
values:

* high: Module is always in high power mode.

* auto: Module is transitioned by the host to high power mode when the
  first port using it is put administratively up and to low power mode
  when the last port using it is put administratively down.

The operational power mode of the module is available to user space via
the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not
reported to user space when a module is not plugged-in.

The user API is designed to be generic enough so that it could be used
for modules with different memory maps (e.g., SFF-8636, CMIS).

The only implementation of the device driver API in this series is for a
MAC driver (mlxsw) where the module is controlled by the device's
firmware, but it is designed to be generic enough so that it could also
be used by implementations where the module is controlled by the CPU.

CMIS testing
============

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x03 (ModuleReady)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : Off

The module is not in low power mode, as it is not forced by hardware
(LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off).

The power mode can be queried from the kernel. In case
LowPwrAllowRequestHW was on, the kernel would need to take into account
the state of the LowPwrRequestHW signal, which is not visible to user
space.

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy high
 power-mode high

Change the power mode policy to 'auto':

 # ethtool --set-module swp11 power-mode-policy auto

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x01 (ModuleLowPwr)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : On

Put the associated port administratively up which will instruct the host
to transition the module to high power mode:

 # ip link set dev swp11 up

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy auto
 power-mode high

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x03 (ModuleReady)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : Off

Put the associated port administratively down which will instruct the
host to transition the module to low power mode:

 # ip link set dev swp11 down

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x01 (ModuleLowPwr)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : On

SFF-8636 testing
================

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) enabled
 Power set                                 : Off
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.7733 mW / -1.12 dBm
 Transmit avg optical power (Channel 2)    : 0.7649 mW / -1.16 dBm
 Transmit avg optical power (Channel 3)    : 0.7790 mW / -1.08 dBm
 Transmit avg optical power (Channel 4)    : 0.7837 mW / -1.06 dBm
 Rcvr signal avg optical power(Channel 1)  : 0.9302 mW / -0.31 dBm
 Rcvr signal avg optical power(Channel 2)  : 0.9079 mW / -0.42 dBm
 Rcvr signal avg optical power(Channel 3)  : 0.8993 mW / -0.46 dBm
 Rcvr signal avg optical power(Channel 4)  : 0.8778 mW / -0.57 dBm

The module is not in low power mode, as it is not forced by hardware
(Power override is on) or by software (Power set is off).

The power mode can be queried from the kernel. In case Power override
was off, the kernel would need to take into account the state of the
LPMode signal, which is not visible to user space.

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy high
 power-mode high

Change the power mode policy to 'auto':

 # ethtool --set-module swp13 power-mode-policy auto

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) not enabled
 Power set                                 : On
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 1)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm

Put the associated port administratively up which will instruct the host
to transition the module to high power mode:

 # ip link set dev swp13 up

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy auto
 power-mode high

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) enabled
 Power set                                 : Off
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.7934 mW / -1.01 dBm
 Transmit avg optical power (Channel 2)    : 0.7859 mW / -1.05 dBm
 Transmit avg optical power (Channel 3)    : 0.7885 mW / -1.03 dBm
 Transmit avg optical power (Channel 4)    : 0.7985 mW / -0.98 dBm
 Rcvr signal avg optical power(Channel 1)  : 0.9325 mW / -0.30 dBm
 Rcvr signal avg optical power(Channel 2)  : 0.9034 mW / -0.44 dBm
 Rcvr signal avg optical power(Channel 3)  : 0.9086 mW / -0.42 dBm
 Rcvr signal avg optical power(Channel 4)  : 0.8885 mW / -0.51 dBm

Put the associated port administratively down which will instruct the
host to transition the module to low power mode:

 # ip link set dev swp13 down

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) not enabled
 Power set                                 : On
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 1)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-06 17:47:49 -07:00
Gustavo A. R. Silva
ed717613f9 ethtool: ioctl: Use array_size() helper in copy_{from,to}_user()
Use array_size() helper instead of the open-coded version in
copy_{from,to}_user().  These sorts of multiplication factors
need to be wrapped in array_size().

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29 11:32:14 +01:00
Heiner Kallweit
b9bbc4c1de ethtool: prevent endless loop if eeprom size is smaller than announced
It shouldn't happen, but can happen that readable eeprom size is smaller
than announced. Then we would be stuck in an endless loop here because
after reaching the actual end reads return eeprom.len = 0. I faced this
issue when making a mistake in driver development. Detect this scenario
and return an error.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14 14:26:47 +01:00
Yufeng Mo
f3ccfda193 ethtool: extend coalesce setting uAPI with CQE mode
In order to support more coalesce parameters through netlink,
add two new parameter kernel_coal and extack for .set_coalesce
and .get_coalesce, then some extra info can return to user with
the netlink API.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24 07:38:29 -07:00
Yufeng Mo
029ee6b143 ethtool: add two coalesce attributes for CQE mode
Currently, there are many drivers who support CQE mode configuration,
some configure it as a fixed when initialized, some provide an
interface to change it by ethtool private flags. In order to make it
more generic, add two new 'ETHTOOL_A_COALESCE_USE_CQE_TX' and
'ETHTOOL_A_COALESCE_USE_CQE_RX' coalesce attributes, then these
parameters can be accessed by ethtool netlink coalesce uAPI.

Also add an new structure kernel_ethtool_coalesce, then the
new parameter can be added into this struct.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24 07:38:28 -07:00
Heiner Kallweit
596690e9f4 ethtool: return error from ethnl_ops_begin if dev is NULL
Julian reported that after d43c65b05b Coverity complains about a
missing check whether dev is NULL in ethnl_ops_complete().
There doesn't seem to be any valid case where dev could be NULL when
calling ethnl_ops_begin(), therefore return an error if dev is NULL.

Fixes: d43c65b05b ("ethtool: runtime-resume netdev parent in ethnl_ops_begin")
Reported-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06 10:39:04 +01:00
Yajun Deng
1160dfa178 net: Remove redundant if statements
The 'if (dev)' statement already move into dev_{put , hold}, so remove
redundant if statements.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05 13:27:50 +01:00
Heiner Kallweit
d43c65b05b ethtool: runtime-resume netdev parent in ethnl_ops_begin
If a network device is runtime-suspended then:
- network device may be flagged as detached and all ethtool ops (even if not
  accessing the device) will fail because netif_device_present() returns
  false
- ethtool ops may fail because device is not accessible (e.g. because being
  in D3 in case of a PCI device)

It may not be desirable that userspace can't use even simple ethtool ops
that not access the device if interface or link is down. To be more friendly
to userspace let's ensure that device is runtime-resumed when executing the
respective ethtool op in kernel.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03 12:58:22 +01:00
Heiner Kallweit
41107ac22f ethtool: move netif_device_present check from ethnl_parse_header_dev_get to ethnl_ops_begin
If device is runtime-suspended and not accessible then it may be
flagged as not present. If checking whether device is present is
done too early then we may bail out before we have the chance to
runtime-resume the device. Therefore move this check to
ethnl_ops_begin(). This is in preparation of a follow-up patch
that tries to runtime-resume the device before executing ethtool
ops.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03 12:58:22 +01:00
Heiner Kallweit
c5ab51df03 ethtool: move implementation of ethnl_ops_begin/complete to netlink.c
In preparation of subsequent extensions to both functions move the
implementations from netlink.h to netlink.c.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03 12:58:22 +01:00
Heiner Kallweit
f32a213765 ethtool: runtime-resume netdev parent before ethtool ioctl ops
If a network device is runtime-suspended then:
- network device may be flagged as detached and all ethtool ops (even if not
  accessing the device) will fail because netif_device_present() returns
  false
- ethtool ops may fail because device is not accessible (e.g. because being
  in D3 in case of a PCI device)

It may not be desirable that userspace can't use even simple ethtool ops
that not access the device if interface or link is down. To be more friendly
to userspace let's ensure that device is runtime-resumed when executing the
respective ethtool op in kernel.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03 12:58:22 +01:00
Arnd Bergmann
a554bf96b4 dev_ioctl: pass SIOCDEVPRIVATE data separately
The compat handlers for SIOCDEVPRIVATE are incorrect for any driver that
passes data as part of struct ifreq rather than as an ifr_data pointer, or
that passes data back this way, since the compat_ifr_data_ioctl() helper
overwrites the ifr_data pointer and does not copy anything back out.

Since all drivers using devprivate commands are now converted to the
new .ndo_siocdevprivate callback, fix this by adding the missing piece
and passing the pointer separately the whole way.

This further unifies the native and compat logic for socket ioctls,
as the new code now passes the correct pointer as well as the correct
data for both native and compat ioctls.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27 20:11:44 +01:00
Saeed Mahameed
9b29a161ef ethtool: Fix rxnfc copy to user buffer overflow
In the cited commit, copy_to_user() got called with the wrong pointer,
instead of passing the actual buffer ptr to copy from, a pointer to
the pointer got passed, which causes a buffer overflow calltrace to pop
up when executing "ethtool -x ethX".

Fix ethtool_rxnfc_copy_to_user() to use the rxnfc pointer as passed
to the function, instead of a pointer to it.

This fixes below call trace:
[   15.533533] ------------[ cut here ]------------
[   15.539007] Buffer overflow detected (8 < 192)!
[   15.544110] WARNING: CPU: 3 PID: 1801 at include/linux/thread_info.h:200 copy_overflow+0x15/0x20
[   15.549308] Modules linked in:
[   15.551449] CPU: 3 PID: 1801 Comm: ethtool Not tainted 5.14.0-rc2+ #1058
[   15.553919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[   15.558378] RIP: 0010:copy_overflow+0x15/0x20
[   15.560648] Code: e9 7c ff ff ff b8 a1 ff ff ff eb c4 66 0f 1f 84 00 00 00 00 00 55 48 89 f2 89 fe 48 c7 c7 88 55 78 8a 48 89 e5 e8 06 5c 1e 00 <0f> 0b 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55
[   15.565114] RSP: 0018:ffffad49c0523bd0 EFLAGS: 00010286
[   15.566231] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
[   15.567616] RDX: 0000000000000001 RSI: ffffffff8a7912e7 RDI: 00000000ffffffff
[   15.569050] RBP: ffffad49c0523bd0 R08: ffffffff8ab2ae28 R09: 00000000ffffdfff
[   15.570534] R10: ffffffff8aa4ae40 R11: ffffffff8aa4ae40 R12: 0000000000000000
[   15.571899] R13: 00007ffd4cc2a230 R14: ffffad49c0523c00 R15: 0000000000000000
[   15.573584] FS:  00007f538112f740(0000) GS:ffff96d5bdd80000(0000) knlGS:0000000000000000
[   15.575639] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.577092] CR2: 00007f5381226d40 CR3: 0000000013542000 CR4: 00000000001506e0
[   15.578929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   15.580695] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   15.582441] Call Trace:
[   15.582970]  ethtool_rxnfc_copy_to_user+0x30/0x46
[   15.583815]  ethtool_get_rxnfc.cold+0x23/0x2b
[   15.584584]  dev_ethtool+0x29c/0x25f0
[   15.585286]  ? security_netlbl_sid_to_secattr+0x77/0xd0
[   15.586728]  ? do_set_pte+0xc4/0x110
[   15.587349]  ? _raw_spin_unlock+0x18/0x30
[   15.588118]  ? __might_sleep+0x49/0x80
[   15.588956]  dev_ioctl+0x2c1/0x490
[   15.589616]  sock_ioctl+0x18e/0x330
[   15.591143]  __x64_sys_ioctl+0x41c/0x990
[   15.591823]  ? irqentry_exit_to_user_mode+0x9/0x20
[   15.592657]  ? irqentry_exit+0x33/0x40
[   15.593308]  ? exc_page_fault+0x32f/0x770
[   15.593877]  ? exit_to_user_mode_prepare+0x3c/0x130
[   15.594775]  do_syscall_64+0x35/0x80
[   15.595397]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   15.596037] RIP: 0033:0x7f5381226d4b
[   15.596492] Code: 0f 1e fa 48 8b 05 3d b1 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d b1 0c 00 f7 d8 64 89 01 48
[   15.598743] RSP: 002b:00007ffd4cc2a1f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   15.599804] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5381226d4b
[   15.600795] RDX: 00007ffd4cc2a350 RSI: 0000000000008946 RDI: 0000000000000003
[   15.601712] RBP: 00007ffd4cc2a340 R08: 00007ffd4cc2a350 R09: 0000000000000001
[   15.602751] R10: 00007f538128a990 R11: 0000000000000246 R12: 0000000000000000
[   15.603882] R13: 00007ffd4cc2a350 R14: 00007ffd4cc2a4b0 R15: 0000000000000000
[   15.605042] ---[ end trace 325cf185e2795048 ]---

Fixes: dd98d2895d ("ethtool: improve compat ioctl handling")
Reported-by: Shannon Nelson <snelson@pensando.io>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tested-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27 11:09:43 +01:00
Arnd Bergmann
dd98d2895d ethtool: improve compat ioctl handling
The ethtool compat ioctl handling is hidden away in net/socket.c,
which introduces a couple of minor oddities:

- The implementation may end up diverging, as seen in the RXNFC
  extension in commit 84a1d9c482 ("net: ethtool: extend RXNFC
  API to support RSS spreading of filter matches") that does not work
  in compat mode.

- Most architectures do not need the compat handling at all
  because u64 and compat_u64 have the same alignment.

- On x86, the conversion is done for both x32 and i386 user space,
  but it's actually wrong to do it for x32 and cannot work there.

- On 32-bit Arm, it never worked for compat oabi user space, since
  that needs to do the same conversion but does not.

- It would be nice to get rid of both compat_alloc_user_space()
  and copy_in_user() throughout the kernel.

None of these actually seems to be a serious problem that real
users are likely to encounter, but fixing all of them actually
leads to code that is both shorter and more readable.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23 14:20:25 +01:00
Yangbo Lu
d463126e23 net: sock: extend SO_TIMESTAMPING for PHC binding
Since PTP virtual clock support is added, there can be
several PTP virtual clocks based on one PTP physical
clock for timestamping.

This patch is to extend SO_TIMESTAMPING API to support
PHC (PTP Hardware Clock) binding by adding a new flag
SOF_TIMESTAMPING_BIND_PHC. When PTP virtual clocks are
in use, user space can configure to bind one for
timestamping, but PTP physical clock is not supported
and not needed to bind.

This patch is preparation for timestamp conversion from
raw timestamp to a specific PTP virtual clock time in
core net.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01 13:08:18 -07:00
Yangbo Lu
c156174a67 ethtool: add a new command for getting PHC virtual clocks
Add an interface for getting PHC (PTP Hardware Clock)
virtual clocks, which are based on PHC physical clock
providing hardware timestamp to network packets.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01 13:08:18 -07:00
Ido Schimmel
88f9a87afe ethtool: Validate module EEPROM offset as part of policy
Validate the offset to read from module EEPROM as part of the netlink
policy and remove the corresponding check from the code.

This also makes it possible to query the offset range from user space:

 $ genl ctrl policy name ethtool
 ...
 ID: 0x14  policy[32]:attr[2]: type=U32 range:[0,255]
 ...

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:40:54 -07:00
Ido Schimmel
0dc7dd02ba ethtool: Validate module EEPROM length as part of policy
Validate the number of bytes to read from the module EEPROM as part of
the netlink policy and remove the corresponding check from the code.

This also makes it possible to query the length range from user space:

 $ genl ctrl policy name ethtool
 ...
 ID: 0x14  policy[32]:attr[3]: type=U32 range:[1,128]
 ...

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:40:54 -07:00
Ido Schimmel
f5fe211d13 ethtool: Decrease size of module EEPROM get policy array
The 'ETHTOOL_A_MODULE_EEPROM_DATA' attribute is not part of the get
request.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:40:54 -07:00
Jakub Kicinski
adc2e56ebe Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh

scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-18 19:47:02 -07:00
Jakub Kicinski
4d1fb7cde0 ethtool: add a stricter length check
There has been a few errors in the ethtool reply size calculations,
most of those are hard to trigger during basic testing because of
skb size rounding up and netdev names being shorter than max.
Add a more precise check.

This change will affect the value of payload length displayed in
case of -EMSGSIZE but that should be okay, "payload length" isn't
a well defined term here.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:40:44 -07:00
Jakub Kicinski
e175aef902 ethtool: strset: fix message length calculation
Outer nest for ETHTOOL_A_STRSET_STRINGSETS is not accounted for.
This may result in ETHTOOL_MSG_STRSET_GET producing a warning like:

    calculated message payload length (684) not sufficient
    WARNING: CPU: 0 PID: 30967 at net/ethtool/netlink.c:369 ethnl_default_doit+0x87a/0xa20

and a splat.

As usually with such warnings three conditions must be met for the warning
to trigger:
 - there must be no skb size rounding up (e.g. reply_size of 684);
 - string set must be per-device (so that the header gets populated);
 - the device name must be at least 12 characters long.

all in all with current user space it looks like reading priv flags
is the only place this could potentially happen. Or with syzbot :)

Reported-by: syzbot+59aa77b92d06cd5a54f2@syzkaller.appspotmail.com
Fixes: 71921690f9 ("ethtool: provide string sets with STRSET_GET request")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14 12:14:24 -07:00
Austin Kim
80ec82e3d2 net: ethtool: clear heap allocations for ethtool function
Several ethtool functions leave heap uncleared (potentially) by
drivers. This will leave the unused portion of heap unchanged and
might copy the full contents back to userspace.

Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-09 13:53:31 -07:00
Ido Schimmel
51c96a561f ethtool: Fix NULL pointer dereference during module EEPROM dump
When get_module_eeprom_by_page() is not implemented by the driver, NULL
pointer dereference can occur [1].

Fix by testing if get_module_eeprom_by_page() is implemented instead of
get_module_info().

[1]
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 [...]
 CPU: 0 PID: 251 Comm: ethtool Not tainted 5.13.0-rc3-custom-00940-g3822d0670c9d #989
 Call Trace:
  eeprom_prepare_data+0x101/0x2d0
  ethnl_default_doit+0xc2/0x290
  genl_family_rcv_msg_doit+0xdc/0x140
  genl_rcv_msg+0xd7/0x1d0
  netlink_rcv_skb+0x49/0xf0
  genl_rcv+0x1f/0x30
  netlink_unicast+0x1f6/0x2c0
  netlink_sendmsg+0x1f9/0x400
  __sys_sendto+0xe1/0x130
  __x64_sys_sendto+0x1b/0x20
  do_syscall_64+0x3a/0x70
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: c97a31f66e ("ethtool: wire in generic SFP module access")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07 13:10:34 -07:00
Zheng Yongjun
b676c7f1c3 ethtool: Fix a typo
atribute  ==> attribute

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-02 14:01:55 -07:00