Commit graph

1138964 commits

Author SHA1 Message Date
Michal Wilczynski
04d674f04e devlink: Allow for devlink-rate nodes parent reassignment
Currently it's not possible to reassign the parent of the node using one
command. As the previous commit introduced a way to export entire
hierarchy from the driver, being able to modify and reassign parents
become important. This way user might easily change QoS settings without
interrupting traffic.

Example command:
devlink port function rate set pci/0000:4b:00.0/1 parent node_custom_1

This reassigns leaf node parent to node_custom_1.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:41:26 -08:00
Michal Wilczynski
caba177d7f devlink: Enable creation of the devlink-rate nodes from the driver
Intel 100G card internal firmware hierarchy for Hierarchicial QoS is very
rigid and can't be easily removed. This requires an ability to export
default hierarchy to allow user to modify it. Currently the driver is
only able to create the 'leaf' nodes, which usually represent the vport.
This is not enough for HQoS implemented in Intel hardware.

Introduce new function devl_rate_node_create() that allows for creation
of the devlink-rate nodes from the driver.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:41:26 -08:00
Michal Wilczynski
6e2d7e84fc devlink: Introduce new attribute 'tx_weight' to devlink-rate
To fully utilize offload capabilities of Intel 100G card QoS capabilities
new attribute 'tx_weight' needs to be introduced. This attribute allows
for usage of Weighted Fair Queuing arbitration scheme among siblings.
This arbitration scheme can be used simultaneously with the strict
priority.

Introduce new attribute in devlink-rate that will allow for configuration
of Weighted Fair Queueing. New attribute is optional.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:41:26 -08:00
Michal Wilczynski
cd50223683 devlink: Introduce new attribute 'tx_priority' to devlink-rate
To fully utilize offload capabilities of Intel 100G card QoS capabilities
new attribute 'tx_priority' needs to be introduced. This attribute allows
for usage of strict priority arbiter among siblings. This arbitration
scheme attempts to schedule nodes based on their priority as long as the
nodes remain within their bandwidth limit.

Introduce new attribute in devlink-rate that will allow for configuration
of strict priority. New attribute is optional.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:41:25 -08:00
Jakub Kicinski
4ab45e973c Merge branch 'autoload-dsa-tagging-driver-when-dynamically-changing-protocol'
Vladimir Oltean says:

====================
Autoload DSA tagging driver when dynamically changing protocol

This patch set solves the issue reported by Michael and Heiko here:
https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/
making full use of Michael's suggestion of having two modaliases: one
gets used for loading the tagging protocol when it's the default one
reported by the switch driver, the other gets loaded at user's request,
by name.

  # modinfo tag_ocelot
  filename:       /lib/modules/6.1.0-rc4+/kernel/net/dsa/tag_ocelot.ko
  license:        GPL v2
  alias:          dsa_tag:seville
  alias:          dsa_tag:id-21
  alias:          dsa_tag:ocelot
  alias:          dsa_tag:id-15
  depends:        dsa_core
  intree:         Y
  name:           tag_ocelot
  vermagic:       6.1.0-rc4+ SMP preempt mod_unload modversions aarch64

Tested on NXP LS1028A-RDB with the following device tree addition:

&mscc_felix_port4 {
	dsa-tag-protocol = "ocelot-8021q";
};

&mscc_felix_port5 {
	dsa-tag-protocol = "ocelot-8021q";
};

CONFIG_NET_DSA and everything that depends on it is built as module.
Everything auto-loads, and "cat /sys/class/net/eno2/dsa/tagging" shows
"ocelot-8021q". Traffic works as well. Furthermore, "echo ocelot-8021q"
into the aforementioned sysfs file now auto-loads the driver for it.
====================

Link: https://lore.kernel.org/r/20221115011847.2843127-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:16:47 -08:00
Vladimir Oltean
0184c07a11 net: dsa: autoload tag driver module on tagging protocol change
Issue a request_module() call when an attempt to change the tagging
protocol is made, either by sysfs or by device tree. In the case of
ocelot (the only driver for which the default and the alternative
tagging protocol are compiled as different modules), the user is now no
longer required to insert tag_ocelot_8021q.ko manually.

In the particular case of ocelot, this solves a problem where
tag_ocelot_8021q.ko is built as module, and this is present in the
device tree:

&mscc_felix_port4 {
	dsa-tag-protocol = "ocelot-8021q";
};

&mscc_felix_port5 {
	dsa-tag-protocol = "ocelot-8021q";
};

Because no one attempts to load the module into the kernel at boot time,
the switch driver will fail to probe (actually forever defer) until
someone manually inserts tag_ocelot_8021q.ko. This is now no longer
necessary and happens automatically.

Rename dsa_find_tagger_by_name() to denote the change in functionality:
there is now feature parity with dsa_tag_driver_get_by_id(), i.o.w. we
also load the module if it's missing.

Link: https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/
Suggested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:16:42 -08:00
Vladimir Oltean
54c087e839 net: dsa: rename dsa_tag_driver_get() to dsa_tag_driver_get_by_id()
A future patch will introduce one more way of getting a reference on a
tagging protocl driver (by name). Rename the current method to "by_id".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:16:42 -08:00
Vladimir Oltean
e8666130b9 net: dsa: strip sysfs "tagging" string of trailing newline
Currently, dsa_find_tagger_by_name() uses sysfs_streq() which works both
with strings that contain \n at the end (echo ocelot > .../dsa/tagging)
and with strings that don't (printf ocelot > .../dsa/tagging).

There will be a problem once we'll want to construct the modalias string
based on which we auto-load the protocol kernel module. If the sysfs
buffer ends in a newline, we need to strip it first. This is a
preparatory patch specifically for that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:16:42 -08:00
Vladimir Oltean
94793a56b3 net: dsa: provide a second modalias to tag proto drivers based on their name
Currently, tagging protocol drivers have a modalias of
"dsa_tag:id-<number>", where the number is one of DSA_TAG_PROTO_*_VALUE.

This modalias makes it possible for the request_module() call in
dsa_tag_driver_get() to work, given the input it has - an integer
returned by ds->ops->get_tag_protocol().

It is also possible to change tagging protocols at (pseudo-)runtime, via
sysfs or via device tree, and this works via the name string of the
tagging protocol rather than via its id (DSA_TAG_PROTO_*_VALUE).

In the latter case, there is no request_module() call, because there is
no association that the DSA core has between the string name and the ID,
to construct the modalias. The module is simply assumed to have been
inserted. This is actually slightly problematic when the tagging
protocol change should take place at probe time, since it's expected
that the dependency module should get autoloaded.

For this purpose, let's introduce a second modalias, so that the DSA
core can call request_module() by name. There is no reason to make the
modalias by name optional, so just modify the MODULE_ALIAS_DSA_TAG_DRIVER()
macro to take both the ID and the name as arguments, and generate two
modaliases behind the scenes.

Suggested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:16:41 -08:00
Vladimir Oltean
2610937d7e net: dsa: rename tagging protocol driver modalias
It's autumn cleanup time, and today's target are modaliases.

Michael says that for users of modinfo, "dsa_tag-20" is not the most
suggestive name, and recommends a change to "dsa_tag-id-20".

Andrew points out that other modaliases have a prefix delimited by
colons, so he recommends "dsa_tag:20" instead of "dsa_tag-20".

To satisfy both proposals, Florian recommends "dsa_tag:id-20".

The modaliases are not stable ABI, and the essential information
(protocol ID) is still conveyed in the new string, which
request_module() must be adapted to form.

Link: 20221027210830.3577793-1-vladimir.oltean@nxp.com
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Suggested-by: Michael Walle <michael@walle.cc>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:16:41 -08:00
Vladimir Oltean
9999f85ba3 net: dsa: stop exposing tag proto module helpers to the world
The DSA tagging protocol driver macros are in the public include/net/dsa.h
probably because that's also where the DSA_TAG_PROTO_*_VALUE macros are
(MODULE_ALIAS_DSA_TAG_DRIVER hinges on those macro definitions).

But there is no reason to expose these helpers to <net/dsa.h>. That
header is shared between switch drivers (drivers/net/dsa/), tagging
protocol drivers (net/dsa/tag_*.c), the DSA core (net/dsa/ sans tag_*.c),
and the rest of the world (DSA master drivers, network stack, etc).
Too much exposure.

On the other hand, net/dsa/dsa_priv.h is included only by the DSA core
and by DSA tagging protocol drivers (or IOW, "friend" modules). Also a
bit too much exposure - I've contemplated creating a new header which is
only included by tagging protocol drivers, but completely separating a
new dsa_tag_proto.h from dsa_priv.h is not immediately trivial - for
example dsa_slave_to_port() is used both from the fast path and from the
control path.

So for now, move these definitions to dsa_priv.h which at least hides
them from the world.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:16:40 -08:00
Robert Marko
4a8c14384f dt-bindings: net: ipq4019-mdio: document required clock-names
IPQ5018, IPQ6018 and IPQ8074 require clock-names to be set as driver is
requesting the clock based on it and not index, so document that and make
it required for the listed SoC-s.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-4-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:14:38 -08:00
Robert Marko
e50c50367d dt-bindings: net: ipq4019-mdio: require and validate clocks
Now that we can match the platforms requiring clocks by compatible start
using those to allow clocks per compatible and make them required.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-3-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:14:37 -08:00
Robert Marko
05c1cbb96f dt-bindings: net: ipq4019-mdio: add IPQ8074 compatible
Allow using IPQ8074 specific compatible along with the fallback IPQ4019
one in order to be able to specify which compatibles require clocks to
be able to validate them via schema.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-2-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:14:37 -08:00
Robert Marko
cbe5f7c0fb dt-bindings: net: ipq4019-mdio: document IPQ6018 compatible
Document IPQ6018 compatible that is already being used in the DTS along
with the fallback IPQ4019 compatible as driver itself only gets probed
on IPQ4019 and IPQ5018 compatibles.

This is also required in order to specify which platform require clock to
be defined and validate it in schema.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-1-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 21:14:36 -08:00
Jakub Kicinski
2ffff1449d Merge branch 'net-dsa-use-more-appropriate-net_name_-constants-for-user-ports'
Rasmus Villemoes says:

====================
net: dsa: use more appropriate NET_NAME_* constants for user ports

The intention of commit 685343fc3b ("net: add name_assign_type
netdev attribute") was clearly that drivers be switched over one by
one to select appropriate NET_NAME_* constants instead of
NET_NAME_UNKNOWN. This small series attempts to do that for DSA user
ports.

This is obviously and intentionally user-visible changes, so there's a
small chance that it could lead to a regression. To make it easy to
revert either of the "label in DT" and "fallback to eth%d" changes,
this is done as a refactoring which shouldn't introduce any functional
change (but by itself adds code which looks a little odd, with the two
identical assignments in the two branches), followed by changing the
constant used in each case in two different patches.
====================

Link: https://lore.kernel.org/r/20221116105205.1127843-1-linux@rasmusvillemoes.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 19:40:15 -08:00
Rasmus Villemoes
b8790661d9 net: dsa: set name_assign_type to NET_NAME_ENUM for enumerated user ports
When a user port does not have a label in device tree, and we thus
fall back to the eth%d scheme, the proper constant to use is
NET_NAME_ENUM. See also commit e9f656b7a2 ("net: ethernet: set
default assignment identifier to NET_NAME_ENUM"), which in turn quoted
commit 685343fc3b ("net: add name_assign_type netdev attribute"):

    ... when the kernel has given the interface a name using global
    device enumeration based on order of discovery (ethX, wlanY, etc)
    ... are labelled NET_NAME_ENUM.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.faineli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 19:40:08 -08:00
Rasmus Villemoes
6fdb038420 net: dsa: use NET_NAME_PREDICTABLE for user ports with name given in DT
When a user port has a label in device tree, the corresponding
netdevice is, to quote include/uapi/linux/netdevice.h, "predictably
named by the kernel". This is also explicitly one of the intended use
cases for NET_NAME_PREDICTABLE, quoting 685343fc3b ("net: add
name_assign_type netdev attribute"):

  NET_NAME_PREDICTABLE:
    The ifname has been assigned by the kernel in a predictable way
    [...] Examples include [...] and names deduced from hardware
    properties (including being given explicitly by the firmware).

Expose that information properly for the benefit of userspace tools
that make decisions based on the name_assign_type attribute,
e.g. a systemd-udev rule with "kernel" in NamePolicy.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.faineli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 19:40:08 -08:00
Rasmus Villemoes
0171a1d22b net: dsa: refactor name assignment for user ports
The following two patches each have a (small) chance of causing
regressions for userspace and will in that case of course need to be
reverted.

In order to prepare for that and make those two patches independent
and individually revertable, refactor the code which sets the names
for user ports by moving the "fall back to eth%d if no label is given
in device tree" to dsa_slave_create().

No functional change (at least none intended).

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.faineli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 19:40:07 -08:00
Vincent Mailhol
f20a0a0519 ethtool: doc: clarify what drivers can implement in their get_drvinfo()
Many of the drivers which implement ethtool_ops::get_drvinfo() will
prints the .driver, .version or .bus_info of struct ethtool_drvinfo.
To have a glance of current state, do:

  $ git grep -W "get_drvinfo(struct"

Printing in those three fields is useless because:

  - since [1], the driver version should be the kernel version (at
    least for upstream drivers). Arguably, out of tree drivers might
    still want to set a custom version, but out of tree is not our
    focus.

  - since [2], the core is able to provide default values for .driver
    and .bus_info.

In summary, drivers may provide .fw_version and .erom_version, the
rest is expected to be done by the core.

In struct ethtool_ops doc from linux/ethtool: rephrase field
get_drvinfo() doc to discourage developers from implementing this
callback.

In struct ethtool_drvinfo doc from uapi/linux/ethtool.h: remove the
paragraph mentioning what drivers should do. Rationale: no need to
repeat what is already written in struct ethtool_ops doc. But add a
note that .fw_version and .erom_version are driver defined.

Also update the dummy driver and simply remove the callback in order
not to confuse the newcomers: most of the drivers will not need this
callback function any more.

[1] commit 6a7e25c7fb ("net/core: Replace driver version to be
    kernel version")
Link: https://git.kernel.org/torvalds/linux/c/6a7e25c7fb48

[2] commit edaf5df22c ("ethtool: ethtool_get_drvinfo: populate
    drvinfo fields even if callback exits")
Link: https://git.kernel.org/netdev/net-next/c/edaf5df22cb8

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20221116171828.4093-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 19:26:02 -08:00
Jakub Kicinski
224b744abf Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/linux/bpf.h
  1f6e04a1c7 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value")
  aa3496accc ("bpf: Refactor kptr_off_tab into btf_record")
  f71b2f6417 ("bpf: Refactor map->off_arr handling")
https://lore.kernel.org/all/20221114095000.67a73239@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17 18:30:39 -08:00
Linus Torvalds
847ccab8fd Networking fixes for 6.1-rc6, including fixes from bpf
Current release - regressions:
 
   - tls: fix memory leak in tls_enc_skb() and tls_sw_fallback_init()
 
 Previous releases - regressions:
 
   - bridge: fix memory leaks when changing VLAN protocol
 
   - dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims
 
   - dsa: don't leak tagger-owned storage on switch driver unbind
 
   - eth: mlxsw: avoid warnings when not offloaded FDB entry with IPv6 is removed
 
   - eth: stmmac: ensure tx function is not running in stmmac_xdp_release()
 
   - eth: hns3: fix return value check bug of rx copybreak
 
 Previous releases - always broken:
 
   - kcm: close race conditions on sk_receive_queue
 
   - bpf: fix alignment problem in bpf_prog_test_run_skb()
 
   - bpf: fix writing offset in case of fault in strncpy_from_kernel_nofault
 
   - eth: macvlan: use built-in RCU list checking
 
   - eth: marvell: add sleep time after enabling the loopback bit
 
   - eth: octeon_ep: fix potential memory leak in octep_device_setup()
 
 Misc:
 
   - tcp: configurable source port perturb table size
 
   - bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmN2FlMSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkWAwQAJcV7XEB7bEssgabFkEmC4uvS/sFlyHC
 uSwFRn5ojaB2c56T1CnNYmitg9Wr4arC6Vca28iai6BgqB6t4qLRI/WWTsZiEPhi
 mt/pjNN2u9JMyaafHFHYfXnbSDWRF7kPMpNw4l3uL0vkGyjSI7LGAOP4Qh8C1h/d
 tNVSDZnj4Laj/3JtDf7Rk6ydCqPYnNdWxFfoZ/SQkjYZKD3Ze9tml7WJykAzCTLp
 yUiPC6TvHOnWIZYbB04sVVOQD4V+95TSOgEhB6wzs/CXB7iBEY+N+oCedjP9Xrfw
 n3ea4anBoTleDnJXJI57LhdJBkyoXncfbpbYLwXljyIgosr7XVTALvOG8XUhg/DW
 FzN5DWQ54jzTsx2eXFJzjQQcDIgyxazk9EdoHdqF8byCasP+fofq1JvzyqtvNSyh
 h8Ps6jdMZrWpXuFDVApXUhP32A/+9q+dFSYHJO681m6mf4CIaUXdm4aB1dkxDAvg
 PSlk797U94RQCzJgqxhrgsq1PGQPBb+qadZrAiD3aQi26g0NWCTg7uFpCeCEK2ZF
 fLwc2XxrwLQm1q7xQVoEg4UxPIIf0mUesvOD9sLDYop0rFIw8x0v7jdYM4kyhN3o
 6FWAXKxBe3LJ9jTTsVTbZbfHYpTnS8Q2KSclBN+/dZNHwwsUPHjy17Z2Ct3o3Jlm
 lNbiiD30BgsD
 =vVJk
 -----END PGP SIGNATURE-----

Merge tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf.

  Current release - regressions:

   - tls: fix memory leak in tls_enc_skb() and tls_sw_fallback_init()

  Previous releases - regressions:

   - bridge: fix memory leaks when changing VLAN protocol

   - dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims

   - dsa: don't leak tagger-owned storage on switch driver unbind

   - eth: mlxsw: avoid warnings when not offloaded FDB entry with IPv6
     is removed

   - eth: stmmac: ensure tx function is not running in
     stmmac_xdp_release()

   - eth: hns3: fix return value check bug of rx copybreak

  Previous releases - always broken:

   - kcm: close race conditions on sk_receive_queue

   - bpf: fix alignment problem in bpf_prog_test_run_skb()

   - bpf: fix writing offset in case of fault in
     strncpy_from_kernel_nofault

   - eth: macvlan: use built-in RCU list checking

   - eth: marvell: add sleep time after enabling the loopback bit

   - eth: octeon_ep: fix potential memory leak in octep_device_setup()

  Misc:

   - tcp: configurable source port perturb table size

   - bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)"

* tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
  net: use struct_group to copy ip/ipv6 header addresses
  net: usb: smsc95xx: fix external PHY reset
  net: usb: qmi_wwan: add Telit 0x103a composition
  netdevsim: Fix memory leak of nsim_dev->fa_cookie
  tcp: configurable source port perturb table size
  l2tp: Serialize access to sk_user_data with sk_callback_lock
  net: thunderbolt: Fix error handling in tbnet_init()
  net: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() and sparx5_start()
  net: lan966x: Fix potential null-ptr-deref in lan966x_stats_init()
  net: dsa: don't leak tagger-owned storage on switch driver unbind
  net/x25: Fix skb leak in x25_lapb_receive_frame()
  net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open()
  bridge: switchdev: Fix memory leaks when changing VLAN protocol
  net: hns3: fix setting incorrect phy link ksettings for firmware in resetting process
  net: hns3: fix return value check bug of rx copybreak
  net: hns3: fix incorrect hw rss hash type of rx packet
  net: phy: marvell: add sleep time after enabling the loopback bit
  net: ena: Fix error handling in ena_init()
  kcm: close race conditions on sk_receive_queue
  net: ionic: Fix error handling in ionic_init_module()
  ...
2022-11-17 08:58:36 -08:00
Dan Carpenter
b4b221bd79 net: ethernet: renesas: Fix return type in rswitch_etha_wait_link_verification()
The rswitch_etha_wait_link_verification() is supposed to return zero
on success or negative error codes.  Unfortunately it is declared as a
bool so the caller treats everything as success.

Fixes: 3590918b5d ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/Y3OPo6AOL6PTvXFU@kili
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-17 14:29:52 +01:00
Kalle Valo
e7e40cc655 iwlwifi patches intended for v6.2
* iwlmei fixes
 * Debug mechanism update for new devices (BZ)
 * Checksum offload fix for the new devices (BZ)
 * A few rate scale fixes and cleanups
 * A fix for iwlwifi debug mechanism
 * Start of MLO preparations - supporting new key API
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEE9cg2NujikJ5EMZusCDCCYA5zdzwFAmNs5fsbHGdyZWdvcnku
 Z3JlZW5tYW5AaW50ZWwuY29tAAoJEAgwgmAOc3c8HycP/29xLeQi5PDCuU4Y1oDq
 vp6ddZUwvfgNaeGn+hv4nbob/gpXzaWgXxum3QrE2AwB95U6UNjNLjCAKh+2HS3V
 tzClp5kNU6TWIwgUnfWfuGWQjqC5X9YOG+LLjKpDXZgm4StfMn5vg7U3temxIpye
 OA1Mdk992eJ+vp38zJmYIofh5GiZ/TXgxDoGMm5B2CeHyRt6WlUZh+9lFLXZ5maw
 JLfLqoAhQHs0WGGDSNjCCImmspCDiueiX07O3W9EO9A9Hmv6iSb1PDsxOcvcuxml
 YoeqgALtz3i6LIjeEtMi50dinlSr3THbxdBWQZF5QTpWZqvXvH5Zq9zq6ymU7f0Q
 tN35uIDgbY5W0ByXf021rTmifhHMwhiSp0WHdEgo/tKt9g1Zkr4FtoRL+JUbmRd9
 CCGErUb/7nCAtSi/FKqvF1xUouFws38i4xbvBI1Vg9lfAwqx6MEfkxwrREW78jur
 byl5HYtBfASAPt0oWaSUMBCNQwPGdj0rmgQNAkR/bMgS+9YZdicG4uuSVJQp77/P
 unE0eOyBCIcZBQji9l3V8kuZuvm7mZrAJp5POlRyvhtd4szCboq/OF7ke7LKYJqo
 tsuO/ouvkFjIJXtYUtNOxU51lBWoACPYJBTXe/4qc3O2cSl3KDXMQ7HKU8eGK42V
 sE3yKbbwuIZAm+VYTgnY0Uy4
 =ggBO
 -----END PGP SIGNATURE-----

Merge tag 'iwlwifi-next-for-kalle-2022-11-06-v2' of http://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next

iwlwifi patches intended for v6.2

* iwlmei fixes
* Debug mechanism update for new devices (BZ)
* Checksum offload fix for the new devices (BZ)
* A few rate scale fixes and cleanups
* A fix for iwlwifi debug mechanism
* Start of MLO preparations - supporting new key API
2022-11-17 14:53:45 +02:00
Dmitry Vyukov
b2e44aac91 NFC: nci: Allow to create multiple virtual nci devices
The current virtual nci driver is great for testing and fuzzing.
But it allows to create at most one "global" device which does not allow
to run parallel tests and harms fuzzing isolation and reproducibility.
Restructure the driver to allow creation of multiple independent devices.
This should be backwards compatible for existing tests.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Cc: Bongsu Jeon <bongsu.jeon@samsung.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20221115100017.787929-1-dvyukov@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-17 13:48:01 +01:00
Colin Ian King
710cfc6ab4 sundance: remove unused variable cnt
Variable cnt is just being incremented and it's never used
anywhere else. The variable and the increment are redundant so
remove it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221115093137.144002-1-colin.i.king@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-17 13:18:30 +01:00
Li zeming
b0798310f8 sctp: sm_statefuns: Remove pointer casts of the same type
The subh.addip_hdr pointer is also of type (struct sctp_addiphdr *), so
it does not require a cast.

Signed-off-by: Li zeming <zeming@nfschina.com>
Link: https://lore.kernel.org/r/20221115020705.3220-1-zeming@nfschina.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-17 13:04:37 +01:00
Hangbin Liu
58e0be1ef6 net: use struct_group to copy ip/ipv6 header addresses
kernel test robot reported warnings when build bonding module with
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/bonding/:

                 from ../drivers/net/bonding/bond_main.c:35:
In function ‘fortify_memcpy_chk’,
    inlined from ‘iph_to_flow_copy_v4addrs’ at ../include/net/ip.h:566:2,
    inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3984:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
  413 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘fortify_memcpy_chk’,
    inlined from ‘iph_to_flow_copy_v6addrs’ at ../include/net/ipv6.h:900:2,
    inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3994:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
  413 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is because we try to copy the whole ip/ip6 address to the flow_key,
while we only point the to ip/ip6 saddr. Note that since these are UAPI
headers, __struct_group() is used to avoid the compiler warnings.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: c3f8324188 ("net: Add full IPv6 addresses to flow_keys")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20221115142400.1204786-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-17 10:42:45 +01:00
Alexandru Tachici
809ff97a67 net: usb: smsc95xx: fix external PHY reset
An external PHY needs settling time after power up or reset.
In the bind() function an mdio bus is registered. If at this point
the external PHY is still initialising, no valid PHY ID will be
read and on phy_find_first() the bind() function will fail.

If an external PHY is present, wait the maximum time specified
in 802.3 45.2.7.1.1.

Fixes: 05b35e7eb9 ("smsc95xx: add phylib support")
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221115114434.9991-2-alexandru.tachici@analog.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-17 10:38:37 +01:00
Enrico Sau
e103ba3399 net: usb: qmi_wwan: add Telit 0x103a composition
Add the following Telit LE910C4-WWX composition:

0x103a: rmnet

Signed-off-by: Enrico Sau <enrico.sau@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20221115105859.14324-1-enrico.sau@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-17 10:13:40 +01:00
Wang Yufen
064bc7312b netdevsim: Fix memory leak of nsim_dev->fa_cookie
kmemleak reports this issue:

unreferenced object 0xffff8881bac872d0 (size 8):
  comm "sh", pid 58603, jiffies 4481524462 (age 68.065s)
  hex dump (first 8 bytes):
    04 00 00 00 de ad be ef                          ........
  backtrace:
    [<00000000c80b8577>] __kmalloc+0x49/0x150
    [<000000005292b8c6>] nsim_dev_trap_fa_cookie_write+0xc1/0x210 [netdevsim]
    [<0000000093d78e77>] full_proxy_write+0xf3/0x180
    [<000000005a662c16>] vfs_write+0x1c5/0xaf0
    [<000000007aabf84a>] ksys_write+0xed/0x1c0
    [<000000005f1d2e47>] do_syscall_64+0x3b/0x90
    [<000000006001c6ec>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

nsim_dev_trap_fa_cookie_write()
  kmalloc() fa_cookie
  nsim_dev->fa_cookie = fa_cookie
..
nsim_drv_remove()

The fa_cookie allocked in nsim_dev_trap_fa_cookie_write() is not freed. To
fix, add kfree(nsim_dev->fa_cookie) to nsim_drv_remove().

Fixes: d3cbb907ae ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Link: https://lore.kernel.org/r/1668504625-14698-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-16 12:20:57 -08:00
Jacob Keller
d82303df06 mlxsw: update adjfine to use adjust_by_scaled_ppm
The mlxsw adjfine implementation in the spectrum_ptp.c file converts
scaled_ppm into ppb before updating a cyclecounter multiplier using the
standard "base * ppb / 1billion" calculation.

This can be re-written to use adjust_by_scaled_ppm, directly using the
scaled parts per million and reducing the amount of code required to
express this calculation.

We still calculate the parts per billion for passing into
mlxsw_sp_ptp_phc_adjfreq because this function requires the input to be in
parts per billion.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Cc: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20221114213701.815132-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-16 12:17:44 -08:00
Linus Torvalds
cc675d22e4 xen: branch for v6.1-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCY3TN0wAKCRCAXGG7T9hj
 voAOAP4i3FjRj/ilXohox3F7iyPsRbFrGnayYcHRPeFF8UPz8QEAzyLP/FBGbmho
 sSuhcmb6r9foGKri7zyTKHIA4bkz4Qo=
 =/KaG
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Two trivial cleanups, and three simple fixes"

* tag 'for-linus-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/platform-pci: use define instead of literal number
  xen/platform-pci: add missing free_irq() in error path
  xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too
  xen/pcpu: fix possible memory leak in register_pcpu()
  x86/xen: Use kstrtobool() instead of strtobool()
2022-11-16 10:49:06 -08:00
Linus Torvalds
31c9c4c54e Pin control fixes for the v6.1 kernel:
- Fix a potential NULL dereference in the core!
 
 - Fix all pin mux routes in the Rockchop PX30 driver.
 
 - Fix the UFS pins in the Qualcomm SC8280XP driver.
 
 - Fix bias disabling in the Mediatek driver.
 
 - Fix debounce time settings in the Mediatek driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmN093oACgkQQRCzN7AZ
 XXOheA/+PDWZ3iC1AllzYU67T6i5dgNKWSEy26m5rvi9/1MG5i7HFf2RMu3H1Ake
 CCYqy5UKzVCdX95dID/PKrJaC6bcRBjE5LTf5IqLGTSLQUeCbvmAaNb2BdB9DNFu
 9TBxXpsrV3W1uZQU/31pZxuFVQWwsX2K9WMumzTz5y9mJZMffi0/2Dt2w8KGIT3J
 fIcZCLdEsBW7jbx5K6jqo8btnWWyJYr4DQrR1bf2DosHGKR2oTtm1wLm2lXdWrBr
 Ahg6fsFup4tH1L6cNAF2e+StlCu1C0Iq7/Im4uDkJBTrdpboxmXzDLcn4qEAOUVF
 sjH/mFMHCJDeRbpqH9spEpDJd5Q1Ho2ueWxiMNT5Gb17I8ZBpBfX7R9OckpnehVe
 lYJlHMXByB7mvWodZ9NrhfV+SeW+qqCePnpeB9IrkGaQ/3ac/nx46wa6NuYKlBlg
 sWBa+9wFjFTCu2ErB8zpXXs+cX9SzZ1AGIeUwIzM0S15eJy6Gw6k8iLhMyaUr9w4
 o+GAfLPe/Aenc80SVvrCUF+PgDsG9UHbcWx7a5spkN7SiEH8fnIlVqZEV1oqpFeP
 P5uzNUFRj4XiSAelY41YLNH2uE2jyQBbEZUEZjnHuDWFiY3fHipOP+xBzkNqG02u
 Wvtz17IWRqmE7BSZ4okJqxHByABAx5UGl1twBiIUweiYO6qdK7Q=
 =6q+9
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Aere is a hopefully final round of pin control fixes. Nothing special,
  driver fixes and we caught a potential NULL pointer exception.

   - Fix a potential NULL dereference in the core!

   - Fix all pin mux routes in the Rockchop PX30 driver

   - Fix the UFS pins in the Qualcomm SC8280XP driver

   - Fix bias disabling in the Mediatek driver

   - Fix debounce time settings in the Mediatek driver"

* tag 'pinctrl-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: mediatek: Export debounce time tables
  pinctrl: mediatek: Fix EINT pins input debounce time configuration
  pinctrl: devicetree: fix null pointer dereferencing in pinctrl_dt_to_map
  pinctrl: mediatek: common-v2: Fix bias-disable for PULL_PU_PD_RSEL_TYPE
  pinctrl: qcom: sc8280xp: Rectify UFS reset pins
  pinctrl: rockchip: list all pins in a possible mux route for PX30
2022-11-16 10:40:00 -08:00
Linus Torvalds
941209ef89 platform-drivers-x86 for v6.1-4
Highlights:
  -  Surface Pro 9 and Surface Laptop 5 kbd, battery, etc. support
     (this is just a few hw-id additions)
  -  A couple of other hw-id / DMI-quirk additions
  -  A few small bug fixes + 1 build fix
 
 The following is an automated git shortlog grouped by driver:
 
 acer-wmi:
  -  Enable SW_TABLET_MODE on Switch V 10 (SW5-017)
 
 asus-wmi:
  -  add missing pci_dev_put() in asus_wmi_set_xusb2pr()
 
 hp-wmi:
  -  Ignore Smart Experience App event
 
 ideapad-laptop:
  -  Add module parameters to match DMI quirk tables
  -  Fix interrupt storm on fn-lock toggle on some Yoga laptops
 
 platform/surface:
  -  aggregator_registry: Add support for Surface Laptop 5
  -  aggregator_registry: Add support for Surface Pro 9
  -  aggregator: Do not check for repeated unsequenced packets
 
 platform/x86/amd:
  -  pmc: Add new ACPI ID AMDI0009
  -  pmc: Remove more CONFIG_DEBUG_FS checks
 
 platform/x86/intel:
  -  pmc: Don't unconditionally attach Intel PMC when virtualized
 
 thinkpad_acpi:
  -  Enable s2idle quirk for 21A1 machine type
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmN0vq4UHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yRQQgAmget34TuVhzBTUmXLxwSGbgcjbSb
 GN6ir81p6R0XJ4/lHT3xKmfU3KXOd3CxGcIXAnyhmFKQxcwUAlWmzvQQja9Gz5oR
 7eg45Hd10Hi4iswlIvSDejYToQRPcE5POW4FbHsYvqh8jWaYuDSlw0KekLwDZWnQ
 leRtM+YzYCt3jDaOeFYfb4NtAU9lDzALeCA2myqXLfS5X1X+fKwsbsM0vZS5T/C5
 YF/fdSqpHXssVRsggPTrNeHhOb3v9M5UjVt8apqR5D+4cmQxnMizpF/rYmI7P3fZ
 OJCwv/3XvN6RecSMS5LK4/4fOvCM57/W8oO3YEmc6xNR4/em34Sm5dTgRg==
 =iPTF
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

 - Surface Pro 9 and Surface Laptop 5 kbd, battery, etc support (this
   is just a few hw-id additions)

 - A couple of other hw-id / DMI-quirk additions

 - A few small bug fixes + 1 build fix

* tag 'platform-drivers-x86-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: ideapad-laptop: Add module parameters to match DMI quirk tables
  platform/x86: ideapad-laptop: Fix interrupt storm on fn-lock toggle on some Yoga laptops
  platform/x86: hp-wmi: Ignore Smart Experience App event
  platform/surface: aggregator_registry: Add support for Surface Laptop 5
  platform/surface: aggregator_registry: Add support for Surface Pro 9
  platform/surface: aggregator: Do not check for repeated unsequenced packets
  platform/x86: acer-wmi: Enable SW_TABLET_MODE on Switch V 10 (SW5-017)
  platform/x86: asus-wmi: add missing pci_dev_put() in asus_wmi_set_xusb2pr()
  platform/x86/intel: pmc: Don't unconditionally attach Intel PMC when virtualized
  platform/x86: thinkpad_acpi: Enable s2idle quirk for 21A1 machine type
  platform/x86/amd: pmc: Add new ACPI ID AMDI0009
  platform/x86/amd: pmc: Remove more CONFIG_DEBUG_FS checks
2022-11-16 10:36:13 -08:00
Eric Dumazet
bf36267e3a tcp: annotate data-race around queue->synflood_warned
Annotate the lockless read of queue->synflood_warned.

Following xchg() has the needed data-race resolution.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 13:32:53 +00:00
Li zeming
1d7322f28f ax25: af_ax25: Remove unnecessary (void*) conversions
The valptr pointer is of (void *) type, so other pointers need not be
forced to assign values to it.

Signed-off-by: Li zeming <zeming@nfschina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 13:31:03 +00:00
Gleb Mazovetskiy
aeac4ec8f4 tcp: configurable source port perturb table size
On embedded systems with little memory and no relevant
security concerns, it is beneficial to reduce the size
of the table.

Reducing the size from 2^16 to 2^8 saves 255 KiB
of kernel RAM.

Makes the table size configurable as an expert option.

The size was previously increased from 2^8 to 2^16
in commit 4c2c8f03a5 ("tcp: increase source port perturb table to
2^16").

Signed-off-by: Gleb Mazovetskiy <glex.spb@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 13:02:04 +00:00
Jakub Sitnicki
b68777d54f l2tp: Serialize access to sk_user_data with sk_callback_lock
sk->sk_user_data has multiple users, which are not compatible with each
other. Writers must synchronize by grabbing the sk->sk_callback_lock.

l2tp currently fails to grab the lock when modifying the underlying tunnel
socket fields. Fix it by adding appropriate locking.

We err on the side of safety and grab the sk_callback_lock also inside the
sk_destruct callback overridden by l2tp, even though there should be no
refs allowing access to the sock at the time when sk_destruct gets called.

v4:
- serialize write to sk_user_data in l2tp sk_destruct

v3:
- switch from sock lock to sk_callback_lock
- document write-protection for sk_user_data

v2:
- update Fixes to point to origin of the bug
- use real names in Reported/Tested-by tags

Cc: Tom Parkin <tparkin@katalix.com>
Fixes: 3557baabf2 ("[L2TP]: PPP over L2TP driver core")
Reported-by: Haowei Yan <g1042620637@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:52:19 +00:00
David S. Miller
ca5ebbfec3 Merge branch 'net-atomic-dev-stats'
Eric Dumazet says:

====================
net: add atomic dev->stats infra

Long standing KCSAN issues are caused by data-race around
some dev->stats changes.

Most performance critical paths already use per-cpu
variables, or per-queue ones.

It is reasonable (and more correct) to use atomic operations
for the slow paths.

First patch adds the infrastructure, then three patches address
the most common paths that syzbot is playing with.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:48:44 +00:00
Eric Dumazet
c4794d2225 ipv4: tunnels: use DEV_STATS_INC()
Most of code paths in tunnels are lockless (eg NETIF_F_LLTX in tx).

Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:48:44 +00:00
Eric Dumazet
2fad1ba354 ipv6: tunnels: use DEV_STATS_INC()
Most of code paths in tunnels are lockless (eg NETIF_F_LLTX in tx).

Adopt SMP safe DEV_STATS_{INC|ADD}() to update dev->stats fields.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:48:44 +00:00
Eric Dumazet
cb34b7cf17 ipv6/sit: use DEV_STATS_INC() to avoid data-races
syzbot/KCSAN reported that multiple cpus are updating dev->stats.tx_error
concurrently.

This is because sit tunnels are NETIF_F_LLTX, meaning their ndo_start_xmit()
is not protected by a spinlock.

While original KCSAN report was about tx path, rx path has the same issue.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:48:44 +00:00
Eric Dumazet
6c1c509778 net: add atomic_long_t to net_device_stats fields
Long standing KCSAN issues are caused by data-race around
some dev->stats changes.

Most performance critical paths already use per-cpu
variables, or per-queue ones.

It is reasonable (and more correct) to use atomic operations
for the slow paths.

This patch adds an union for each field of net_device_stats,
so that we can convert paths that are not yet protected
by a spinlock or a mutex.

netdev_stats_to_stats64() no longer has an #if BITS_PER_LONG==64

Note that the memcpy() we were using on 64bit arches
had no provision to avoid load-tearing,
while atomic_long_read() is providing the needed protection
at no cost.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:48:44 +00:00
David S. Miller
68d268d089 Merge branch 'net-try_cmpxchg-conversions'
Eric Dumazet says:

====================
net: more try_cmpxchg() conversions

Adopt try_cmpxchg() and friends in more places, as this
is preferred nowadays.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:42:01 +00:00
Eric Dumazet
4ebf802cf1 net: __sock_gen_cookie() cleanup
Adopt atomic64_try_cmpxchg() and remove the loop,
to make the intent more obvious.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:42:01 +00:00
Eric Dumazet
4ffa1d1c68 net: adopt try_cmpxchg() in napi_{enable|disable}()
This makes code a bit cleaner.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:42:01 +00:00
Eric Dumazet
1462160c74 net: adopt try_cmpxchg() in napi_schedule_prep() and napi_complete_done()
This makes the code slightly more efficient.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:42:01 +00:00
Eric Dumazet
6af645a5b2 net: net_{enable|disable}_timestamp() optimizations
Adopting atomic_try_cmpxchg() makes the code cleaner.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:42:00 +00:00
Eric Dumazet
30189806fb ipv6: fib6_new_sernum() optimization
Adopt atomic_try_cmpxchg() which is slightly more efficient.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 12:42:00 +00:00