Commit graph

966497 commits

Author SHA1 Message Date
Xie He
14b20704a1 net: hdlc_fr: Change the use of "dev" in fr_rx to make the code cleaner
The eth_type_trans function is called when we receive frames carrying
Ethernet frames. This function expects a non-NULL pointer as an argument,
and assigns it directly to skb->dev.

However, the code handling other types of frames first assigns the pointer
to "dev", and then at the end checks whether the value is NULL, and if it
is not NULL, assigns it to skb->dev.

The two flows are different. Mixing them in this function makes the code
messy. It's better that we convert the second flow to align with how
eth_type_trans does things.

So this patch changes the code to: first make sure the pointer is not
NULL, then assign it directly to skb->dev. "dev" is no longer needed until
the end where we use it to update stats.

Cc: Krzysztof Halasa <khc@pm.waw.pl>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-03 15:19:09 -08:00
Xie He
583d5333ed net: hdlc_fr: Simpify fr_rx by using "goto rx_drop" to drop frames
When the fr_rx function drops a received frame (because the protocol type
is not supported, or because the PVC virtual device that corresponds to
the DLCI number and the protocol type doesn't exist), the function frees
the skb and returns.

The code for freeing the skb and returning is repeated several times, this
patch uses "goto rx_drop" to replace them so that the code looks cleaner.

Cc: Krzysztof Halasa <khc@pm.waw.pl>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-03 15:19:09 -08:00
Lijun Pan
16b5f5ce35 ibmvnic: merge do_change_param_reset into do_reset
Commit b27507bb59 ("net/ibmvnic: unlock rtnl_lock in reset so
linkwatch_event can run") introduced do_change_param_reset function to
solve the rtnl lock issue. Majority of the code in do_change_param_reset
duplicates do_reset. Also, we can handle the rtnl lock issue in do_reset
itself. Hence merge do_change_param_reset back into do_reset to clean up
the code.

Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201031094645.17255-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-03 15:08:14 -08:00
Guillaume Nault
0992d67bc2 mpls: drop skb's dst in mpls_forward()
Commit 394de110a7 ("net: Added pointer check for
dst->ops->neigh_lookup in dst_neigh_lookup_skb") added a test in
dst_neigh_lookup_skb() to avoid a NULL pointer dereference. The root
cause was the MPLS forwarding code, which doesn't call skb_dst_drop()
on incoming packets. That is, if the packet is received from a
collect_md device, it has a metadata_dst attached to it that doesn't
implement any dst_ops function.

To align the MPLS behaviour with IPv4 and IPv6, let's drop the dst in
mpls_forward(). This way, dst_neigh_lookup_skb() doesn't need to test
->neigh_lookup any more. Let's keep a WARN condition though, to
document the precondition and to ease detection of such problems in the
future.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/f8c2784c13faa54469a2aac339470b1049ca6b63.1604102750.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-03 12:55:53 -08:00
Jakub Kicinski
6d89076e6e Merge branch 'net-mac80211-kernel-enable-kcov-remote-coverage-collection-for-802-11-frame-handling'
Aleksandr Nogikh says:

====================
net, mac80211, kernel: enable KCOV remote coverage collection for 802.11 frame handling

This patch series enables remote KCOV coverage collection during
802.11 frames processing. These changes make it possible to perform
coverage-guided fuzzing in search of remotely triggerable bugs.

Normally, KCOV collects coverage information for the code that is
executed inside the system call context. It is easy to identify where
that coverage should go and whether it should be collected at all by
looking at the current process. If KCOV was enabled on that process,
coverage will be stored in a buffer specific to that process.
Howerever, it is not always enough as handling can happen elsewhere
(e.g. in separate kernel threads).

When it is impossible to infer KCOV-related info just by looking at
the currently running process, one needs to manually pass some
information to the code that should be instrumented. The information
takes the form of 64 bit integers (KCOV remote handles). Zero is the
special value that corresponds to an empty handle. More details on
KCOV and remote coverage collection can be found in
Documentation/dev-tools/kcov.rst.

The series consists of three commits.
1. Apply a minor fix to kcov_common_handle() so that it returns a
valid handle (zero) when called in an interrupt context.
2. Take the remote handle from KCOV and attach it to newly allocated
SKBs as an skb extension. If the allocation happens inside a system
call context, the SKB will be tied to the process that issued the
syscall (if that process is interested in remote coverage collection).
3. Annotate the code that processes incoming 802.11 frames with
kcov_remote_start()/kcov_remote_stop().

v5:
* Collecting remote coverate at ieee80211_rx_list() instead of
  ieee80211_rx()

v4:
https://lkml.kernel.org/r/20201028182018.1780842-1-aleksandrnogikh@gmail.com
* CONFIG_SKB_EXTENSIONS is now automatically selected by CONFIG_KCOV.
* Elaborated on a minor optimization in skb_set_kcov_handle().

v3:
https://lkml.kernel.org/r/20201026150851.528148-1-aleksandrnogikh@gmail.com
* kcov_handle is now stored in skb extensions instead of sk_buff
  itself.
* Updated the cover letter.

v2:
https://lkml.kernel.org/r/20201009170202.103512-1-a.nogikh@gmail.com
* Moved KCOV annotations from ieee80211_tasklet_handler to
  ieee80211_rx.
* Updated kcov_common_handle() to return 0 if it is called in
  interrupt context.
* Updated the cover letter.

v1:
https://lkml.kernel.org/r/20201007101726.3149375-1-a.nogikh@gmail.com
====================

Link: https://lore.kernel.org/r/20201029173620.2121359-1-aleksandrnogikh@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 18:01:46 -08:00
Aleksandr Nogikh
261e411bb2 mac80211: add KCOV remote annotations to incoming frame processing
Add KCOV remote annotations to ieee80211_iface_work() and
ieee80211_rx_list(). This will enable coverage-guided fuzzing of
mac80211 code that processes incoming 802.11 frames.

Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 18:01:45 -08:00
Aleksandr Nogikh
6370cc3bbd net: add kcov handle to skb extensions
Remote KCOV coverage collection enables coverage-guided fuzzing of the
code that is not reachable during normal system call execution. It is
especially helpful for fuzzing networking subsystems, where it is
common to perform packet handling in separate work queues even for the
packets that originated directly from the user space.

Enable coverage-guided frame injection by adding kcov remote handle to
skb extensions. Default initialization in __alloc_skb and
__build_skb_around ensures that no socket buffer that was generated
during a system call will be missed.

Code that is of interest and that performs packet processing should be
annotated with kcov_remote_start()/kcov_remote_stop().

An alternative approach is to determine kcov_handle solely on the
basis of the device/interface that received the specific socket
buffer. However, in this case it would be impossible to distinguish
between packets that originated during normal background network
processes or were intentionally injected from the user space.

Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 18:01:34 -08:00
Aleksandr Nogikh
b08e84da20 kernel: make kcov_common_handle consider the current context
kcov_common_handle is a method that is used to obtain a "default" KCOV
remote handle of the current process. The handle can later be passed
to kcov_remote_start in order to collect coverage for the processing
that is initiated by one process, but done in another. For details see
Documentation/dev-tools/kcov.rst and comments in kernel/kcov.c.

Presently, if kcov_common_handle is called in an IRQ context, it will
return a handle for the interrupted process. This may lead to
unreliable and incorrect coverage collection.

Adjust the behavior of kcov_common_handle in the following way. If it
is called in a task context, return the common handle for the
currently running task. Otherwise, return 0.

Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 18:00:20 -08:00
Tom Rix
0e8c266c59 net: dsa: mt7530: remove unneeded semicolon
A semicolon is not needed after a switch statement.

Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20201031153047.2147341-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:51:28 -08:00
Tom Rix
c568db7fd0 net/mlx4_core : remove unneeded semicolon
A semicolon is not needed after a switch statement.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20201101140528.2279424-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:51:17 -08:00
Tom Rix
1c5825e664 net: stmmac: dwmac-meson8b: remove unneeded semicolon
A semicolon is not needed after a switch statement.

Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20201101140720.2280013-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:51:10 -08:00
Tom Rix
5d867245c4 net: core: remove unneeded semicolon
A semicolon is not needed after a switch statement.

Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20201101153647.2292322-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:51:02 -08:00
Tom Rix
9d253c02ac ethtool: remove unneeded semicolon
A semicolon is not needed after a switch statement.

Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20201101155601.2294374-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:50:53 -08:00
Tom Rix
f2219c322f tipc: remove unneeded semicolon
A semicolon is not needed after a switch statement.

Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20201101155822.2294856-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:50:43 -08:00
Jakub Kicinski
0b6f164d5a Merge branch 'generic-tx-reallocation-for-dsa'
Vladimir Oltean says:

====================
Generic TX reallocation for DSA

Christian has reported buggy usage of skb_put() in tag_ksz.c, which is
only triggerable in real life using his not-yet-published patches for
IEEE 1588 timestamping on Micrel KSZ switches.

The concrete problem there is that the driver can end up calling
skb_put() and exceed the end of the skb data area, because even though
it had reallocated the frame once before, it hadn't reallocated it large
enough. Christian explained it in more detail here:

https://lore.kernel.org/netdev/20201014161719.30289-1-ceggers@arri.de/
https://lore.kernel.org/netdev/20201016200226.23994-1-ceggers@arri.de/

But actually there's a bigger problem, which is that some taggers which
get more rarely tested tend to do some shenanigans which are uncaught
for the longest time, and in the meanwhile, their code gets copy-pasted
into other taggers, creating a mess. For example, the tail tagging
driver for Marvell 88E6060 currently reallocates _every_single_frame_ on
TX. Is that an obvious indication that nobody is using it? Sure. Is it a
good model to follow when developing a new tail tagging driver? No.

DSA has all the information it needs in order to simplify the job of a
tagger on TX. It knows whether it's a normal or a tail tagger, and what
is the protocol overhead it incurs. So this series performs the
reallocation centrally.

Changes in v3:
- Use dev_kfree_skb_any due to potential hardirq context in xmit path.

Changes in v2:
- Dropped the tx_realloc counters for now, since the patch was pretty
  controversial and I lack the time at the moment to introduce new UAPI
  for that.
- Do padding for tail taggers irrespective of whether they need to
  reallocate the skb or not.
====================

Link: https://lore.kernel.org/r/20201101191620.589272-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:20 -08:00
Vladimir Oltean
86c4ad9a78 net: dsa: tag_ar9331: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: Per Forlin <per.forlin@axis.com>
Cc: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Oleksij Rempel <linux@rempel-privat.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:17 -08:00
Vladimir Oltean
9b9826ae11 net: dsa: tag_gswip: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

This one is interesting, the DSA tag is 8 bytes on RX and 4 bytes on TX.
Because DSA is unaware of asymmetrical tag lengths, the overhead/needed
headroom is declared as 8 bytes and therefore 4 bytes larger than it
needs to be. If this becomes a problem, and the GSWIP driver can't be
converted to a uniform header length, we might need to make DSA aware of
separate RX/TX overhead values.

Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean
952a063450 net: dsa: tag_dsa: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Similar to the EtherType DSA tagger, the old Marvell tagger can
transform an 802.1Q header if present into a DSA tag, so there is no
headroom required in that case. But we are ensuring that it exists,
regardless (practically speaking, the headroom must be 4 bytes larger
than it needs to be).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean
2f0d030c5f net: dsa: tag_brcm: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean
c6c4e1237d net: dsa: tag_edsa: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Note that the VLAN code path needs a smaller extra headroom than the
regular EtherType DSA path. That isn't a problem, because this tagger
declares the larger tag length (8 bytes vs 4) as the protocol overhead,
so we are covered in both cases.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean
6ed94135f5 net: dsa: tag_lan9303: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean
941f66beb7 net: dsa: tag_mtk: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean
9c5c3bd005 net: dsa: tag_ocelot: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean
9bbda29ae1 net: dsa: tag_qca: let DSA core deal with TX reallocation
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: John Crispin <john@phrozen.org>
Cc: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Christian Eggers
ef3f72fee2 net: dsa: trailer: don't allocate additional memory for padding/tagging
The caller (dsa_slave_xmit) guarantees that the frame length is at least
ETH_ZLEN and that enough memory for tail tagging is available.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Christian Eggers
88fda8eefd net: dsa: tag_ksz: don't allocate additional memory for padding/tagging
The caller (dsa_slave_xmit) guarantees that the frame length is at least
ETH_ZLEN and that enough memory for tail tagging is available.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:16 -08:00
Vladimir Oltean
a3b0b64797 net: dsa: implement a central TX reallocation procedure
At the moment, taggers are left with the task of ensuring that the skb
headers are writable (which they aren't, if the frames were cloned for
TX timestamping, for flooding by the bridge, etc), and that there is
enough space in the skb data area for the DSA tag to be pushed.

Moreover, the life of tail taggers is even harder, because they need to
ensure that short frames have enough padding, a problem that normal
taggers don't have.

The principle of the DSA framework is that everything except for the
most intimate hardware specifics (like in this case, the actual packing
of the DSA tag bits) should be done inside the core, to avoid having
code paths that are very rarely tested.

So provide a TX reallocation procedure that should cover the known needs
of DSA today.

Note that this patch also gives the network stack a good hint about the
headroom/tailroom it's going to need. Up till now it wasn't doing that.
So the reallocation procedure should really be there only for the
exceptional cases, and for cloned packets which need to be unshared.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Christian Eggers <ceggers@arri.de> # For tail taggers only
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:41:15 -08:00
YueHaibing
92f9e238c9 openvswitch: Use IS_ERR instead of IS_ERR_OR_NULL
Fix smatch warning:

net/openvswitch/meter.c:427 ovs_meter_cmd_set() warn: passing zero to 'PTR_ERR'

dp_meter_create() never returns NULL, use IS_ERR
instead of IS_ERR_OR_NULL to fix this.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Link: https://lore.kernel.org/r/20201031060153.39912-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:34:26 -08:00
YueHaibing
36ed77cd05 net: hns3: Remove duplicated include
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20201031024940.29716-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:32:22 -08:00
YueHaibing
0b833eef92 liquidio: cn68xx: Remove duplicated include
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20201031024744.39020-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:32:13 -08:00
Yuchung Cheng
7e901ee7b6 tcp: avoid slow start during fast recovery on new losses
During TCP fast recovery, the congestion control in charge is by
default the Proportional Rate Reduction (PRR) unless the congestion
control module specified otherwise (e.g. BBR).

Previously when tcp_packets_in_flight() is below snd_ssthresh PRR
would slow start upon receiving an ACK that
   1) cumulatively acknowledges retransmitted data
   and
   2) does not detect further lost retransmission

Such conditions indicate the repair is in good steady progress
after the first round trip of recovery. Otherwise PRR adopts the
packet conservation principle to send only the amount that was
newly delivered (indicated by this ACK).

This patch generalizes the previous design principle to include
also the newly sent data beside retransmission: as long as
the delivery is making good progress, both retransmission and
new data should be accounted to make PRR more cautious in slow
starting.

Suggested-by: Matt Mathis <mattmathis@google.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201031013412.1973112-1-ycheng@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:17:40 -08:00
Jakub Kicinski
51e4082c7c Merge branch 'vlan-improvements-for-ocelot-switch'
Vladimir Oltean says:

====================
VLAN improvements for Ocelot switch

The main reason why I started this work is that deleting the bridge mdb
entries fails when the bridge is deleted, as described here:
https://lore.kernel.org/netdev/20201015173355.564934-1-vladimir.oltean@nxp.com/

In short, that happens because the bridge mdb entries are added with a
vid of 1, but deletion is attempted with a vid of 0. So the deletion
code fails to find the mdb entries.

The solution is to make ocelot use a pvid of 0 when it is under a bridge
with vlan_filtering 0. When vlan_filtering is 1, the pvid of the bridge
is what is programmed into the hardware.

The patch series also uncovers more bugs and does some more cleanup, but
the above is the main idea behind it.
====================

Link: https://lore.kernel.org/r/20201031102916.667619-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:09:10 -08:00
Vladimir Oltean
9a72068080 net: dsa: felix: improve the workaround for multiple native VLANs on NPI port
After the good discussion with Florian from here:
https://lore.kernel.org/netdev/20200911000337.htwr366ng3nc3a7d@skbuf/

I realized that the VLAN settings on the NPI port (the hardware "CPU port",
in DSA parlance) don't actually make any difference, because that port
is hardcoded in hardware to use what mv88e6xxx would call "unmodified"
egress policy for VLANs.

So earlier patch 183be6f967 ("net: dsa: felix: send VLANs on CPU port
as egress-tagged") was incorrect in the sense that it didn't actually
make the VLANs be sent on the NPI port as egress-tagged. It only made
ocelot_port_set_native_vlan shut up.

Now that we have moved the check from ocelot_port_set_native_vlan to
ocelot_vlan_prepare, we can simply shunt ocelot_vlan_prepare from DSA,
and avoid calling it. This is the correct way to deal with things,
because the NPI port configuration is DSA-specific, so the ocelot switch
library should not have the check for multiple native VLANs refined in
any way, it is correct the way it is.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:09:07 -08:00
Vladimir Oltean
2f0402fedf net: mscc: ocelot: deny changing the native VLAN from the prepare phase
Put the preparation phase of switchdev VLAN objects to some good use,
and move the check we already had, for preventing the existence of more
than one egress-untagged VLAN per port, to the preparation phase of the
addition.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:09:07 -08:00
Vladimir Oltean
be0576fed6 net: mscc: ocelot: move the logic to drop 802.1p traffic to the pvid deletion
Currently, the ocelot_port_set_native_vlan() function starts dropping
untagged and prio-tagged traffic when the native VLAN is removed?

What is the native VLAN? It is the only egress-untagged VLAN that ocelot
supports on a port. If the port is a trunk with 100 VLANs, one of those
VLANs can be transmitted as egress-untagged, and that's the native VLAN.

Is it wrong to drop untagged and prio-tagged traffic if there's no
native VLAN? Yes and no.

In this case, which is more typical, it's ok to apply that drop
configuration:
$ bridge vlan add dev swp0 vid 1 pvid untagged <- this is the native VLAN
$ bridge vlan add dev swp0 vid 100
$ bridge vlan add dev swp0 vid 101
$ bridge vlan del dev swp0 vid 1 <- delete the native VLAN
But only because the pvid and the native VLAN have the same ID.

In this case, it isn't:
$ bridge vlan add dev swp0 vid 1 pvid
$ bridge vlan add dev swp0 vid 100 untagged <- this is the native VLAN
$ bridge vlan del dev swp0 vid 101
$ bridge vlan del dev swp0 vid 100 <- delete the native VLAN

It's wrong, because the switch will drop untagged and prio-tagged
traffic now, despite having a valid pvid of 1.

The confusion seems to stem from the fact that the native VLAN is an
egress setting, while the PVID is an ingress setting. It would be
correct to drop untagged and prio-tagged traffic only if there was no
pvid on the port. So let's do just that.

Background:
https://lore.kernel.org/netdev/CA+h21hrRMrLH-RjBGhEJSTZd6_QPRSd3RkVRQF-wNKkrgKcRSA@mail.gmail.com/#t

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:09:06 -08:00
Vladimir Oltean
e2b2e83e52 net: mscc: ocelot: add a "valid" boolean to struct ocelot_vlan
Currently we are checking in some places whether the port has a native
VLAN on egress or not, by comparing the ocelot_port->vid value with zero.

That works, because VID 0 can never be a native VLAN configured by the
bridge, but now we want to make similar checks for the pvid. That won't
work, because there are cases when we do have the pvid set to 0 (not by
the bridge, by ourselves, but still.. it's confusing). And we can't
encode a negative value into an u16, so add a bool to the structure.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:09:06 -08:00
Vladimir Oltean
c3e58a750e net: mscc: ocelot: transform the pvid and native vlan values into a structure
This is a mechanical patch only.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:09:06 -08:00
Vladimir Oltean
110e847ca7 net: mscc: ocelot: don't reset the pvid to 0 when deleting it
I have no idea why this code is here, but I have 2 hypotheses:

1.
A desperate attempt to keep untagged traffic working when the bridge
deletes the pvid on a port.

There was a fairly okay discussion here:
https://lore.kernel.org/netdev/CA+h21hrRMrLH-RjBGhEJSTZd6_QPRSd3RkVRQF-wNKkrgKcRSA@mail.gmail.com/#t
which established that in vlan_filtering=1 mode, the absence of a pvid
should denote that the ingress port should drop untagged and priority
tagged traffic. While in vlan_filtering=0 mode, nothing should change.

So in vlan_filtering=1 mode, we should simply let things happen, and not
attempt to save the day. And in vlan_filtering=0 mode, the pvid is 0
anyway, no need to do anything.

2.
The driver encodes the native VLAN (ocelot_port->vid) value of 0 as
special, meaning "not valid". There are checks based on that. But there
are no such checks for the ocelot_port->pvid value of 0. In fact, that's
a perfectly valid value, which is used in standalone mode. Maybe there
was some confusion and the author thought that 0 means "invalid" here as
well.

In conclusion, delete the code*.

*in fact we'll add it back later, in a slightly different form, but for
an entirely different reason than the one for which this exists now.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:09:06 -08:00
Vladimir Oltean
75e5a554c8 net: mscc: ocelot: use the pvid of zero when bridged with vlan_filtering=0
Currently, mscc_ocelot ports configure pvid=0 in standalone mode, and
inherit the pvid from the bridge when one is present.

When the bridge has vlan_filtering=0, the software semantics are that
packets should be received regardless of whether there's a pvid
configured on the ingress port or not. However, ocelot does not observe
those semantics today.

Moreover, changing the PVID is also a problem with vlan_filtering=0.
We are privately remapping the VID of FDB, MDB entries to the port's
PVID when those are VLAN-unaware (i.e. when the VID of these entries
comes to us as 0). But we have no logic of adjusting that remapping when
the user changes the pvid and vlan_filtering is 0. So stale entries
would be left behind, and untagged traffic will stop matching on them.

And even if we were to solve that, there's an even bigger problem. If
swp0 has pvid 1, and swp1 has pvid 2, and both are under a vlan_filtering=0
bridge, they should be able to forward traffic between one another.
However, with ocelot they wouldn't do that.

The simplest way of fixing this is to never configure the pvid based on
what the bridge is asking for, when vlan_filtering is 0. Only if there
was a VLAN that the bridge couldn't mangle, that we could use as pvid....
So, turns out, there's 0 just for that. And for a reason: IEEE
802.1Q-2018, page 247, Table 9-2-Reserved VID values says:

	The null VID. Indicates that the tag header contains only
	priority information; no VID is present in the frame.
	This VID value shall not be configured as a PVID or a member
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	of a VID Set, or configured in any FDB entry, or used in any
	Management operation.

So, aren't we doing exactly what 802.1Q says not to? Well, in a way, but
what we're doing here is just driver-level bookkeeping, all for the
better. The fact that we're using a pvid of 0 is not observable behavior
from the outside world: the network stack does not see the classified
VLAN that the switch uses, in vlan_filtering=0 mode. And we're also more
consistent with the standalone mode now.

And now that we use the pvid of 0 in this mode, there's another advantage:
we don't need to perform any VID remapping for FDB and MDB entries either,
we can just use the VID of 0 that the bridge is passing to us.

The only gotcha is that every time we change the vlan_filtering setting,
we need to reapply the pvid (either to 0, or to the value from the bridge).
A small side-effect visible in the patch is that ocelot_port_set_pvid
needs to be moved above ocelot_port_vlan_filtering, so that it can be
called from there without forward-declarations.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:09:06 -08:00
Jakub Kicinski
802dcb4340 Merge branch 'net-ethernet-ti-am65-cpsw-add-multi-port-support-in-mac-only-mode'
Grygorii Strashko says:

====================
net: ethernet: ti: am65-cpsw: add multi port support in mac-only mode

This series adds multi-port support in mac-only mode (multi MAC mode) to TI
AM65x CPSW driver in preparation for enabling support for multi-port devices,
like Main CPSW0 on K3 J721E SoC or future CPSW3g on K3 AM64x SoC.

The multi MAC mode is implemented by configuring every enabled port in "mac-only"
mode (all ingress packets are sent only to the Host port and egress packets
directed to target Ext. Port) and creating separate net_device for
every enabled Ext. port.

This series does not affect on existing CPSW2g one Ext. Port devices and xmit
path changes are done only for multi-port devices by splitting xmit path for
one-port and multi-port devices.

Patches 1-3: Preparation patches to improve K3 CPSW configuration depending on DT
Patches 4-5: Fix VLAN offload for multi MAC mode
Patch 6: Fixes CPTS context lose issue during PM runtime transition
Patch 7: Fixes TX csum offload for multi MAC mode
Patches 8-9: add multi-port support to TI AM65x CPSW
Patch 10: handle deferred probe with new dev_err_probe() API

changes in v3:
 - rebased
 - added Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
 - added Patch 10 which is minor optimization

changes in v2:
- patch 8: xmit path split for one-port and multi-port devices to avoid
  performance losses
- patch 9: fixed the case when Port 1 is disabled
- Patch 7: added fix for TX csum offload

v2: https://lore.kernel.org/patchwork/cover/1321608/
v1: https://lore.kernel.org/patchwork/cover/1315766/
====================

Link: https://lore.kernel.org/r/20201030200707.24294-1-grygorii.strashko@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 16:41:10 -08:00
Grygorii Strashko
8fbc2f9edc net: ethernet: ti: am65-cpsw: handle deferred probe with dev_err_probe()
Use new dev_err_probe() API to handle deferred probe properly and simplify
the code.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 16:41:07 -08:00
Grygorii Strashko
84b4aa4932 net: ethernet: ti: am65-cpsw: add multi port support in mac-only mode
This patch adds final multi-port support to TI AM65x CPSW driver path in
preparation for adding support for multi-port devices, like Main CPSW0 on
K3 J721E SoC or future CPSW3g on K3 AM64x SoC.
- the separate netdev is created for every enabled external Port;
- DMA channels are common/shared for all external Ports and the RX/TX NAPI
and DMA processing assigned to first available netdev;
- external Ports are configured in mac-only mode, which is similar to TI
"dual-mac" mode for legacy TI CPSW - packets are sent to the Host port only
in ingress and directly to the Port on egress. No packet switching between
external ports happens.
- every port supports the same features as current AM65x CPSW on external
device.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 16:41:07 -08:00
Grygorii Strashko
a9e60cf0b4 net: ethernet: ti: am65-cpsw: prepare xmit/rx path for multi-port devices in mac-only mode
This patch adds multi-port support to TI AM65x CPSW driver xmit/rx path in
preparation for adding support for multi-port devices, like Main CPSW0 on
K3 J721E SoC or future CPSW3g on K3 AM64x SoC.
Hence DMA channels are common/shared for all ext Ports and the RX/TX NAPI
and DMA processing going to be assigned to first available netdev this patch:
 - ensures all RX descriptors fields are initialized;
 - adds synchronization for TX DMA push/pop operation (locking) as
Networking core locks are not enough any more;
 - updates TX bql processing for every packet in
am65_cpsw_nuss_tx_compl_packets() as every completed TX skb can have
different ndev assigned (come from different netdevs).

To avoid performance issues for existing one-port CPSW2g devices the above
changes are done only for multi-port devices by splitting xmit path for
one-port and multi-port devices.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 16:41:07 -08:00
Grygorii Strashko
97067aaf12 net: ethernet: ti: am65-cpsw: fix tx csum offload for multi mac mode
The current implementation uses .ndo_set_features() callback to track
NETIF_F_HW_CSUM feature changes and update generic
CPSW_P0_CONTROL_REG.RX_CHECKSUM_EN option accordingly. It's not going to
work in case of multi-port devices as TX csum offload can be changed per
netdev.

On K3 CPSWxG devices TX csum offload enabled in the following way:

 - the CPSW_P0_CONTROL_REG.RX_CHECKSUM_EN option enables TX csum offload in
generic and affects all TX DMA channels and packets;

 - corresponding fields in TX DMA descriptor have to be filed properly when
upper layer wants to offload TX csum (skb->ip_summed == CHECKSUM_PARTIAL)
and it's per-packet option.

The Linux Network core is expected to never request TX csum offload if
netdev NETIF_F_HW_CSUM feature is disabled, and, as result, TX DMA
descriptors should not be modified, and per-packet TX csum offload will be
disabled (or enabled) on per-netdev basis. Which, in turn, makes it safe to
enable the CPSW_P0_CONTROL_REG.RX_CHECKSUM_EN option unconditionally.

Hence, fix TX csum offload for multi-port devices by:
 - enabling the CPSW_P0_CONTROL_REG.RX_CHECKSUM_EN option in
am65_cpsw_nuss_common_open() unconditionally
 - and removing .ndo_set_features() callback implementation, which was used
only NETIF_F_HW_CSUM feature update purposes

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 16:41:07 -08:00
Grygorii Strashko
a9c7470072 net: ethernet: ti: am65-cpsw: keep active if cpts enabled
Some K3 CPSW NUSS instances can lose context after PM runtime ON->OFF->ON
transition depending on integration (including all submodules: CPTS, MDIO,
etc), like J721E Main CPSW (CPSW9G).

In case CPTS is enabled it's initialized during probe and does not expect
to be reset. Hence, keep K3 CPSW active by forbidding PM runtime if CPTS is
enabled.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 16:41:07 -08:00
Grygorii Strashko
2d64a03432 net: ethernet: ti: am65-cpsw: fix vlan offload for multi mac mode
The VLAN offload for AM65x CPSW2G is implemented using existing ALE APIs,
which are also used by legacy CPSW drivers.
So, now it always adds current Ext. Port and Host as VLAN members when VLAN
is added by 8021Q core (.ndo_vlan_rx_add_vid) and forcibly removes VLAN
from ALE table in .ndo_vlan_rx_kill_vid(). This works as for AM65x CPSW2G
(which has only one Ext. Port) as for legacy CPSW devices (which can't
support same VLAN on more then one Port in multi mac (dual-mac) mode). But
it doesn't work for the new J721E and AM64x multi port CPSWxG versions
doesn't have such restrictions and allow to offload the same VLAN on any
number of ports.

Now the attempt to add same VLAN on two (or more) K3 CPSWxG Ports will
cause:
 - VLAN members mask overwrite when VLAN is added
 - VLAN removal from ALE table when any Port removes VLAN

This patch fixes an issue by:
 - switching to use cpsw_ale_vlan_add_modify() instead of
   cpsw_ale_add_vlan() when VLAN is added to ALE table, so VLAN members
   mask will not be overwritten;
 - Updates cpsw_ale_del_vlan() as:
     if more than one ext. Port is in VLAN member mask
     then remove only current port from VLAN member mask
     else remove VLAN ALE entry

 Example:
  add: P1 | P0 (Host) -> members mask: P1 | P0
  add: P2 | P0        -> members mask: P2 | P1 | P0
  rem: P1 | P0        -> members mask: P2 | P0
  rem: P2 | P0        -> members mask: -

The VLAN is forcibly removed if port_mask=0 passed to cpsw_ale_del_vlan()
to preserve existing legacy CPSW drivers functionality.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 16:41:07 -08:00
Grygorii Strashko
82882bd56a net: ethernet: ti: cpsw_ale: add cpsw_ale_vlan_del_modify()
Add/export cpsw_ale_vlan_del_modify() and use it in cpsw_switchdev instead
of generic cpsw_ale_del_vlan() to avoid mixing 8021Q and switchdev VLAN
offload. This is preparation patch equired by follow up changes.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 16:41:07 -08:00
Grygorii Strashko
6a40e2890e net: ethernet: ti: am65-cpsw: use cppi5_desc_is_tdcm()
Use cppi5_desc_is_tdcm() helper for teardown indicator detection instead of
hard-coded value.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 16:41:07 -08:00
Grygorii Strashko
c6275c02a0 net: ethernet: ti: am65-cpsw: move free desc queue mode selection in pdata
In preparation of adding more multi-port K3 CPSW versions move free
descriptor queue mode selection in am65_cpsw_pdata, so it can be selected
basing on DT compatibility property.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 16:41:07 -08:00
Grygorii Strashko
7747d4b72f net: ethernet: ti: am65-cpsw: move ale selection in pdata
In preparation of adding more multi-port K3 CPSW versions move ALE
selection in am65_cpsw_pdata, so it can be selected basing on DT
compatibility property.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 16:41:07 -08:00