Since
commit e6e771b3d8 ("s390/qeth: detach netdevice while card is offline")
there has been a timing window during recovery where qeth_query_card_info
could be sent to the card before it was ready for it, leading to a
failing card recovery. There is evidence that this window was hit, as
not all callers of get_link_ksettings() check for netif_device_present().
Use cached values in qeth_get_link_ksettings(), instead of calling
qeth_query_card_info() and falling back to default values in case it
fails. Link info is already updated when the card goes online, e.g. after
STARTLAN (physical link up). Set the link info to default values, when the
card goes offline or at STOPLAN (physical link down). A follow-on patch
will improve the values reported for link down.
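A minimal sketch of the cached approach (struct layout and names are
illustrative, not the actual qeth definitions):

    /* reset to defaults when the card goes offline / at STOPLAN: */
    static void qeth_set_initial_link_info(struct qeth_card *card)
    {
        card->info.link_info.speed = SPEED_UNKNOWN;
        card->info.link_info.duplex = DUPLEX_UNKNOWN;
        card->info.link_info.port = PORT_OTHER;
    }

    static int qeth_get_link_ksettings(struct net_device *netdev,
                                       struct ethtool_link_ksettings *cmd)
    {
        struct qeth_card *card = netdev->ml_priv;

        /* no cmd IO to the card, just report the cached state: */
        cmd->base.speed = card->info.link_info.speed;
        cmd->base.duplex = card->info.link_info.duplex;
        cmd->base.port = card->info.link_info.port;
        return 0;
    }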
Fixes: e6e771b3d8 ("s390/qeth: detach netdevice while card is offline")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
Link: https://lore.kernel.org/r/20220805155714.59609-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add two new parameters, kernel_ringparam and extack, to .get_ringparam
and .set_ringparam, so that more ring parameters can be exposed through
the netlink API.
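The resulting ethtool_ops prototypes:

    void (*get_ringparam)(struct net_device *,
                          struct ethtool_ringparam *,
                          struct kernel_ethtool_ringparam *,
                          struct netlink_ext_ack *);
    int  (*set_ringparam)(struct net_device *,
                          struct ethtool_ringparam *,
                          struct kernel_ethtool_ringparam *,
                          struct netlink_ext_ack *);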
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to support more coalesce parameters through netlink, add two
new parameters, kernel_coal and extack, to .set_coalesce and
.get_coalesce, so that extra info can be returned to user space via the
netlink API.
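The resulting ethtool_ops prototypes:

    int (*get_coalesce)(struct net_device *,
                        struct ethtool_coalesce *,
                        struct kernel_ethtool_coalesce *,
                        struct netlink_ext_ack *);
    int (*set_coalesce)(struct net_device *,
                        struct ethtool_coalesce *,
                        struct kernel_ethtool_coalesce *,
                        struct netlink_ext_ack *);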
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit fb64de1bc3 ("s390/qeth: phase out OSN support") spelled out
why the OSN support in qeth is in a bad shape, and put any remaining
interested parties on notice to speak up before it gets ripped out.
It's 2021 now, so make good on that promise and remove all the
OSN-specific parts from qeth. This also means that we no longer need to
export various parts of the cmd & data path internals to the L2 driver.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a recently introduced helper to fill our ethtool stats strings.
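Assuming the helper in question is ethtool_sprintf(), the fill loop
shrinks to something like this (the array name is illustrative):

    for (i = 0; i < ARRAY_SIZE(stats_strings); i++)
        ethtool_sprintf(&data, "%s", stats_strings[i]);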
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While the qdio layer already tracks the number of HW interrupts for a
device, there's value in understanding how many of them have been
raised due to our TX completion logic.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The link mode is a combination of port speed and port mode. But we
currently only consider the speed, and then typically select the
corresponding TP-based link mode. For 1G and 10G Fibre links this means
we display the wrong link modes.
Move the SPEED_* switch statements inside the PORT_* cases, and only
consider valid combinations where we can select the corresponding
link mode. Add the relevant link modes (1000baseX, 10000baseSR and
1000baseLR) that were introduced back with
commit 5711a98221 ("net: ethtool: add support for 1000BaseX and missing 10G link modes").
To differentiate between 10000baseSR and 10000baseLR, use the detailed
media_type information that QUERY OAT provides.
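A sketch of the restructured selection, showing only the fibre case
(the media_type test is illustrative):

    case PORT_FIBRE:
        switch (speed) {
        case SPEED_1000:
            ethtool_link_ksettings_add_link_mode(cmd, supported,
                                                 1000baseX_Full);
            break;
        case SPEED_10000:
            if (media_type == MEDIA_TYPE_LONGWAVE)
                ethtool_link_ksettings_add_link_mode(cmd, supported,
                                                     10000baseLR_Full);
            else
                ethtool_link_ksettings_add_link_mode(cmd, supported,
                                                     10000baseSR_Full);
            break;
        }
        break;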
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the default case for PORT_* and SPEED_* in our ethtool code.
The only time these could be hit is if qeth_init_link_info() was unable
to determine the port type from an OSA adapter's link_type.
We already emit a message in this case, so reduce the noise and don't
report bad data (i.e. it's much more likely that any future link_type
will represent a PORT_FIBRE link ...).
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hard-code the minimal link info at initialization time, after we
obtained the link_type. qeth_get_link_ksettings() can still override
this with more accurate data from QUERY CARD INFO later on.
Don't set arbitrary defaults for unknown OSA link types, they
certainly won't match any future type.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move all the HW reply data parsing into qeth_query_card_info_cb(), and
use common ethtool enums for transporting the information back to the
caller.
Also only look at the .port_speed field when we couldn't determine the
speed from the .card_type field, and introduce some 'default' cases for
SPEED_UNKNOWN, PORT_OTHER and DUPLEX_UNKNOWN.
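A condensed sketch of the parsing (type constants abbreviated;
qeth_speed_from_port_speed() is a stand-in for the .port_speed
fallback):

    switch (card_info->card_type) {
    case CARD_INFO_TYPE_1G_COPPER_A:
        link_info->port = PORT_TP;
        link_info->speed = SPEED_1000;
        break;
    case CARD_INFO_TYPE_10G_FIBRE_A:
        link_info->port = PORT_FIBRE;
        link_info->speed = SPEED_10000;
        break;
    default:
        link_info->port = PORT_OTHER;
        link_info->speed = SPEED_UNKNOWN;
        break;
    }

    /* only consult .port_speed when .card_type didn't tell us: */
    if (link_info->speed == SPEED_UNKNOWN)
        link_info->speed = qeth_speed_from_port_speed(card_info->port_speed);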
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By the time that our .get_link_ksettings() code issues a QUERY CARD INFO
cmd to get link-related information, we already set up a good amount of
static link data.
Return this data when the cmd fails, same as when the cmd is not
supported.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For OSA devices that are _not_ configured in prio-queue mode, give users
the option of selecting the number of active TX queues.
This requires setting up the HW queues with a reasonable default QoS
value in the QIB's PQUE parm area.
As with the other device types, we bring up the device with a minimal
number of TX queues for compatibility reasons.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When re-initializing a device, we can hit a situation where
qeth_osa_set_output_queues() detects that it supports more or less
HW TX queues than before. Right now we adjust dev->real_num_tx_queues
right there, but
1. it's getting more & more complicated to cover all cases, and
2. we can't re-enable the actually expected number of TX queues later
because we lost the needed information.
So keep track of the wanted TX queue count (on initial setup, and
whenever it's changed via .set_channels), and later use that information
when
re-enabling the netdevice.
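A minimal sketch of the bookkeeping (the wanted_txqs field is
illustrative):

    /* in .set_channels, remember what the user asked for: */
    priv->wanted_txqs = channels->tx_count;
    rc = netif_set_real_num_tx_queues(dev, channels->tx_count);

    /* when re-enabling the netdevice, re-apply that count instead of
     * whatever qeth_osa_set_output_queues() last left behind:
     */
    rc = netif_set_real_num_tx_queues(dev, priv->wanted_txqs);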
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since IQD devices complete (most of) their transmissions synchronously,
they don't offer TX completion IRQs and have no HW coalescing controls.
But we can fake the easy parts in SW, and give the user some control
over how often the TX NAPI code should be triggered to process the TX
completions.
Having per-queue controls can in particular help the dedicated mcast
queue, as it likely benefits from different fine-tuning than what the
ucast queues need.
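These controls plug into ethtool's existing per-queue ops; the relevant
ethtool_ops members are:

    int (*get_per_queue_coalesce)(struct net_device *, u32,
                                  struct ethtool_coalesce *);
    int (*set_per_queue_coalesce)(struct net_device *, u32,
                                  struct ethtool_coalesce *);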
CC: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Count the number of TX doorbells we issue to the qdio layer.
Also count the number of actual frames in a TX buffer, and then
use this data along with the byte count during TX completion.
We'll make additional use of the frame count in a subsequent patch.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Versions are meaningless for an in-kernel driver.
Instead use the UTS_RELEASE that is set by ethtool_get_drvinfo().
Cc: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for SOF_TIMESTAMPING_TX_SOFTWARE.
No support for non-IQD devices, since they orphan the skb in their xmit
path.
To play nice with TX bulking, set the timestamp when the buffer that
contains the skb(s) is actually flushed out to HW.
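Sketch of the flush-time stamping; buf->skb_list holds all skbs that
were bulked into the buffer:

    struct sk_buff *skb;

    /* stamp each skb right before the buffer is flushed out to HW: */
    skb_queue_walk(&buf->skb_list, skb)
        skb_tx_timestamp(skb);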
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For ucast traffic, qeth_iqd_select_queue() falls back to
netdev_pick_tx(). This will potentially use skb_tx_hash() to distribute
the flow over all active TX queues - so txq 0 is a valid selection, and
qeth_iqd_select_queue() needs to check for this and put it on some other
queue. As a result, the distribution for ucast flows is unbalanced and
hits QETH_IQD_MIN_UCAST_TXQ heavier than the other queues.
Open-coding a custom variant of skb_tx_hash() isn't an option, since
netdev_pick_tx() also gives us e.g. access to XPS. But we can pull a
little trick: add a single TC class that excludes the mcast txq, and
thus encourage skb_tx_hash() to not pick the mcast txq.
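Roughly (assuming the mcast txq sits at index 0, with ucast txqs
starting at QETH_IQD_MIN_UCAST_TXQ = 1):

    /* one TC class that spans only the ucast txqs, so that
     * skb_tx_hash() never picks the mcast txq:
     */
    netdev_set_num_tc(dev, 1);
    netdev_set_tc_queue(dev, 0, count - 1, QETH_IQD_MIN_UCAST_TXQ);
    rc = netif_set_real_num_tx_queues(dev, count);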
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to the support for z/VM NICs, but we need to take extra care
about the dedicated mcast queue:
1. netdev_pick_tx() is unaware of this limitation and might select the
mcast txq. Catch this.
2. Require at least _two_ TX queues - one for ucast, one for mcast.
3. When reducing the number of TX queues, there's a potential race
where netdev_cap_txqueue() overrules the selected txq index and
falls back to index 0. This would place ucast traffic on the mcast
queue, and result in TX errors.
So for IQD, reject a reduction while the interface is running (see the
sketch below).
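A condensed sketch of the IQD-specific checks in .set_channels (the
minimum-queue constant and error codes are illustrative):

    static int qeth_set_channels(struct net_device *dev,
                                 struct ethtool_channels *channels)
    {
        /* need one ucast + one mcast queue at minimum: */
        if (channels->tx_count < 2)
            return -EINVAL;

        /* avoid the netdev_cap_txqueue() race described above: */
        if (netif_running(dev) &&
            channels->tx_count < dev->real_num_tx_queues)
            return -EPERM;

        return netif_set_real_num_tx_queues(dev, channels->tx_count);
    }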
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for ETHTOOL_SCHANNELS to change the count of active
TX queues.
Since all TX queue structs are pre-allocated and -registered, we just
need to trivially adjust dev->real_num_tx_queues.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the ethtool hooks for the ETHTOOL_RX_COPYBREAK tunable.
The copybreak is stored into netdev_priv, so that we automatically go
back to the default value if the netdev is re-allocated.
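A sketch of the get side (the qeth_priv layout is illustrative); the
set side validates and stores the value symmetrically:

    static int qeth_get_tunable(struct net_device *dev,
                                const struct ethtool_tunable *tuna,
                                void *data)
    {
        struct qeth_priv *priv = netdev_priv(dev);

        switch (tuna->id) {
        case ETHTOOL_RX_COPYBREAK:
            *(u32 *)data = priv->rx_copybreak;
            return 0;
        default:
            return -EOPNOTSUPP;
        }
    }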
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Depending on a packet's type, the RX path needs to access fields in the
packet headers and thus requires a minimum packet length.
Enforce this length when building the skb.
On the other hand, a single runt packet is no reason to drop the whole
RX buffer. So just skip it, and continue processing with the next
packet, as sketched below:
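    if (len < qeth_rx_min_len(hdr)) {    /* helper name illustrative */
        QETH_CARD_STAT_INC(card, rx_dropped_runt);
        continue;    /* skip this packet, keep processing the buffer */
    }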
Fixes: 4a71df5004 ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Where available, use the fine-grained counters in rtnl_link_stats64 to
indicate different RX error causes. For drop reasons, use driver-private
ethtool counters.
In particular this patch allows us to keep track of driver-side drops due
to unknown/unsupported HW descriptor format.
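Illustrative only, the exact counter names differ:

    /* driver-side drop, accounted via a private ethtool counter: */
    QETH_CARD_STAT_INC(card, rx_dropped_notsupp);

    /* RX error causes, reported through .ndo_get_stats64: */
    stats->rx_errors = card->stats.rx_errors;
    stats->rx_length_errors = card->stats.rx_length_errors;
    stats->rx_frame_errors = card->stats.rx_frame_errors;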
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to their large MTU and potentially low utilization of TX buffers,
IQD devices in particular require fast TX recycling. This makes them
a prime candidate for a TX NAPI path in qeth.
qeth_tx_poll() uses the recently introduced qdio_inspect_queue() helper
to poll the TX queue for completed buffers. To avoid hogging the CPU for
too long, we yield to the stack after completing an entire queue's worth
of buffers.
While IQD is expected to transfer its buffers synchronously (and thus
doesn't support TX interrupts), a timer covers for the odd case where a
TX buffer doesn't complete synchronously. Currently this timer should
only ever fire for
(1) the mcast queue,
(2) the occasional race, where the NAPI poll code observes an update to
queue->used_buffers while the TX doorbell hasn't been issued yet.
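A condensed sketch of the poll function (qeth_napi_to_out_queue() and
qeth_cleanup_tx_buffers() are stand-ins for the real lookup and
completion-handling code):

    static int qeth_tx_poll(struct napi_struct *napi, int budget)
    {
        struct qeth_qdio_out_q *queue = qeth_napi_to_out_queue(napi);
        unsigned int start, error, work_done = 0;
        int completed;

        while (1) {
            completed = qdio_inspect_queue(CARD_DDEV(queue->card),
                                           queue->queue_no, false,
                                           &start, &error);
            if (completed <= 0) {
                /* caught up with the device, stop polling: */
                napi_complete(napi);
                return 0;
            }

            qeth_cleanup_tx_buffers(queue, start, completed, error);
            work_done += completed;

            /* yield to the stack after an entire queue's worth: */
            if (work_done >= QDIO_MAX_BUFFERS_PER_Q)
                return budget;
        }
    }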
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current xmit code only stops the txq after attempting to fill an
IO buffer that hasn't been TX-completed yet. In many-connection
scenarios, this can result in frequent rejected TX attempts, requeuing
of skbs with NETDEV_TX_BUSY and extra overhead.
Now that we have a proper 1-to-1 relation between stack-side txqs and
our HW Queues, overhaul the stop/wake logic so that the xmit code
stops the txq as needed.
Given that we might map multiple skbs into a single buffer, it's crucial
to ensure that the queue always provides an _entirely_ empty IO buffer.
Otherwise large skbs (e.g. TSO) might not fit into the last available
buffer. So whenever qeth_do_send_packet() first utilizes an _empty_
buffer, it updates & checks the used_buffers count.
This now ensures that an skb passed to qeth_xmit() can always be mapped
into an IO buffer, so remove all of the -EBUSY roll-back handling in the
TX path. We preserve the minimal safety-checks ("Is this IO buffer
really available?"), just in case some nasty future bug ever attempts to
corrupt an in-use buffer.
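A simplified sketch of the new stop/wake pairing (the real accounting
handles the stop/completion race more carefully):

    /* xmit path: stop once the last free buffer gets used: */
    if (atomic_inc_return(&queue->used_buffers) >= QDIO_MAX_BUFFERS_PER_Q)
        netif_tx_stop_queue(txq);

    /* TX completion: wake once buffers are available again: */
    if (netif_tx_queue_stopped(txq) &&
        atomic_read(&queue->used_buffers) < QDIO_MAX_BUFFERS_PER_Q)
        netif_tx_wake_queue(txq);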
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth has been supporting multiple HW Output Queues for a long time. But
rather than exposing those queues to the stack, it uses its own queue
selection logic in .ndo_start_xmit... with all the drawbacks that
entails.
Start off by switching IQD devices over to a proper mqs net_device,
and converting all the netdev_queue management code.
One oddity with IQD devices is the requirement to place all mcast
traffic on the _highest_ established HW queue. Doing so via
.ndo_select_queue seems straightforward - but that won't work if only
some of the HW queues are active
(i.e. when dev->real_num_tx_queues < dev->num_tx_queues), since
netdev_cap_txqueue() will not allow us to put skbs on the higher queues.
To make this work, we
1. let .ndo_select_queue() map all mcast traffic to netdev_queue 0, and
2. later re-map the netdev_queue and HW queue indices in
.ndo_start_xmit and the TX completion handler (sketched below).
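A sketch of that re-mapping helper (constant name illustrative; mcast
lives on netdev_queue 0 stack-side, but on the highest HW queue):

    static u16 qeth_iqd_translate_txq(struct net_device *dev, u16 txq)
    {
        /* swap index 0 and the highest HW queue index, both ways: */
        if (txq == QETH_IQD_MCAST_TXQ)
            return dev->num_tx_queues - 1;
        if (txq == dev->num_tx_queues - 1)
            return QETH_IQD_MCAST_TXQ;
        return txq;
    }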
With this patch we default to a fixed set of 1 ucast and 1 mcast queue.
Support for dynamic reconfiguration is added at a later time.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement a trivial callback that exposes the queue sizes.
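Roughly, with QDIO's fixed 128-buffer queue depth:

    static void qeth_get_ringparam(struct net_device *dev,
                                   struct ethtool_ringparam *param)
    {
        param->rx_max_pending = QDIO_MAX_BUFFERS_PER_Q;
        param->rx_pending = QDIO_MAX_BUFFERS_PER_Q;
        param->tx_max_pending = QDIO_MAX_BUFFERS_PER_Q;
        param->tx_pending = QDIO_MAX_BUFFERS_PER_Q;
    }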
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Accumulate per-TX queue statistics, and increase their size to 64 bit.
Don't bother with enabling/disabling the statistics, the overhead is
negligible.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>