Commit graph

904727 commits

Author SHA1 Message Date
David S. Miller
6e2345c197 Merge branch 'net-sched-expose-HW-stats-types-per-action-used-by-drivers'
Jiri Pirko says:

====================
net: sched: expose HW stats types per action used by drivers

The first patch is just adding a helper used by the second patch too.
The second patch is exposing HW stats types that are used by drivers.

Example:

$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 10 sec used 10 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:06:49 -07:00
Jiri Pirko
93a129eb8c net: sched: expose HW stats types per action used by drivers
It may be up to the driver (in case ANY HW stats is passed) to select
which type of HW stats he is going to use. Add an infrastructure to
expose this information to user.

$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 10 sec used 10 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:06:49 -07:00
Jiri Pirko
8953b0770f net: introduce nla_put_bitfield32() helper and use it
Introduce a helper to pass value and selector to. The helper packs them
into struct and puts them into netlink message.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:06:49 -07:00
David S. Miller
acc086bfb9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2020-03-28

1) Use kmem_cache_zalloc() instead of kmem_cache_alloc()
   in xfrm_state_alloc(). From Huang Zijiang.

2) esp_output_fill_trailer() is the same in IPv4 and IPv6,
   so share this function to avoide code duplcation.
   From Raed Salem.

3) Add offload support for esp beet mode.
   From Xin Long.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:59:20 -07:00
Christophe JAILLET
ee91a83e08 net: dsa: Simplify 'dsa_tag_protocol_to_str()'
There is no point in preparing the module name in a buffer. The format
string can be passed diectly to 'request_module()'.

This axes a few lines of code and cleans a few things:
   - max len for a driver name is MODULE_NAME_LEN wich is ~ 60 chars,
     not 128. It would be down-sized in 'request_module()'
   - we should pass the total size of the buffer to 'snprintf()', not the
     size minus 1

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:54:57 -07:00
YueHaibing
32109c7065 net: ena: Make some functions static
Fix sparse warnings:

drivers/net/ethernet/amazon/ena/ena_netdev.c:460:6: warning: symbol 'ena_xdp_exchange_program_rx_in_range' was not declared. Should it be static?
drivers/net/ethernet/amazon/ena/ena_netdev.c:481:6: warning: symbol 'ena_xdp_exchange_program' was not declared. Should it be static?
drivers/net/ethernet/amazon/ena/ena_netdev.c:1555:5: warning: symbol 'ena_xdp_handle_buff' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:53:40 -07:00
YueHaibing
cd1ff94830 dpaa_eth: Make dpaa_a050385_wa static
Fix sparse warning:

drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:2065:5:
 warning: symbol 'dpaa_a050385_wa' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:53:09 -07:00
Rohit Maheshwari
3a0a978389 crypto/chtls: Fix chtls crash in connection cleanup
There is a possibility that cdev is removed before CPL_ABORT_REQ_RSS
is fully processed, so it's better to save it in skb.

Added checks in handling the flow correctly, which suggests connection reset
request is sent to HW, wait for HW to respond.

Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:44:08 -07:00
Rohit Maheshwari
e14394e656 crypto/chcr: fix incorrect ipv6 packet length
IPv6 header's payload length field shouldn't include IPv6 header length.

Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:40:12 -07:00
Wong Vee Khee
ed64639bc1 net: stmmac: Add support for VLAN Rx filtering
Add support for VLAN ID-based filtering by the MAC controller for MAC
drivers that support it. Only the 12-bit VID field is used.

Signed-off-by: Chuah Kim Tatt <kim.tatt.chuah@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:36:44 -07:00
David S. Miller
07c9f74a89 Merge branch 'crypto-chelsio-Fixes-issues-during-chcr-driver-registration'
Ayush Sawal says:

====================
crypto: chelsio: Fixes issues during chcr driver registration

Patch 1: Avoid the accessing of wrong u_ctx pointer.
Patch 2: Fixes a deadlock between rtnl_lock and uld_mutex.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:33:23 -07:00
Ayush Sawal
876aa9f527 Crypto: chelsio - Fixes a deadlock between rtnl_lock and uld_mutex
The locks are taken in this order during driver registration
(uld_mutex), at: cxgb4_register_uld.part.14+0x49/0xd60 [cxgb4]
(rtnl_mutex), at: rtnetlink_rcv_msg+0x2db/0x400
(uld_mutex), at: cxgb_up+0x3a/0x7b0 [cxgb4]
(rtnl_mutex), at: chcr_add_xfrmops+0x83/0xa0 [chcr](stucked here)

To avoid this now the netdev features are updated after the
cxgb4_register_uld function is completed.

Fixes: 6dad4e8ab3 ("chcr: Add support for Inline IPSec").

Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:33:23 -07:00
Ayush Sawal
ad59ddd02d Crypto: chelsio - Fixes a hang issue during driver registration
This issue occurs only when multiadapters are present. Hang
happens because assign_chcr_device returns u_ctx pointer of
adapter which is not yet initialized as for this adapter cxgb_up
is not been called yet.

The last_dev pointer is used to determine u_ctx pointer and it
is initialized two times in chcr_uld_add in chcr_dev_add respectively.

The fix here is don't initialize the last_dev pointer during
chcr_uld_add. Only assign to value to it when the adapter's
initialization is completed i.e in chcr_dev_add.

Fixes: fef4912b66 ("crypto: chelsio - Handle PCI shutdown event").

Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:33:23 -07:00
Matthieu Baerts
3aeaaa59fd selftests:mptcp: fix failure due to whitespace damage
'pm_nl_ctl' was adding a trailing whitespace after having printed the
IP. But at the end, the IP element is currently always the last one.

The bash script launching 'pm_nl_ctl' had trailing whitespaces in the
expected result on purpose. But these whitespaces have been removed when
the patch has been applied upstream. To avoid trailing whitespaces in
the bash code, 'pm_nl_ctl' and expected results have now been adapted.

The MPTCP PM selftest can now pass again.

Fixes: eedbc68532 (selftests: add PM netlink functional tests)
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:25:34 -07:00
Colin Ian King
76dcbd2370 net: ethernet: ti: fix spelling mistake "rundom" -> "random"
There is a spelling mistake in a dev_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:18:52 -07:00
David S. Miller
aba6d497c8 mlx5-updates-2020-03-29
1) mlx5 core level updates for mkey APIs + migrate some code to mlx5_ib
 
 2) Use a separate work queue for fib event handling
 
 3) Move new eswitch chains files to a new directory, to prepare for
 upcoming E-Switch and offloads features.
 
 4) Support indr block setup (TC_SETUP_FT) in Flow Table mode.
 
 --
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl6BlNUACgkQSD+KveBX
 +j4LMAf/QszF52vJ0CmJQCuuaor2W1SGhluT183uU8Eac9GIq0XXHwD81O3FC01+
 qk1NaVcULqt9eGwZq5WOQIF+pw1gcnlZlg/UCaQBWWkCmdTbrJIwRTou9VwYR+hi
 i9QB3/WKhpm9c/u/7MI6YZkQmjTLNDxYyF6/dqT3TyrRxwDRejeRtsUDdQTwswBV
 p2Jw3nyLbviAr2g7MAF/CvVtjsl7rDUuJkdWvgvCIgsJzLMtIuduuTf7Q3dsvKUe
 GZNWhp+9lKJajdHbQaKjOK5XODoQAbRdPeMV6umV3lWmgRU7Bc4s8/jxChX5NnX9
 yieLnsqwqw1aJbh35yMtuVRtVsp+HA==
 =ZhTN
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-03-29

1) mlx5 core level updates for mkey APIs + migrate some code to mlx5_ib

2) Use a separate work queue for fib event handling

3) Move new eswitch chains files to a new directory, to prepare for
upcoming E-Switch and offloads features.

4) Support indr block setup (TC_SETUP_FT) in Flow Table mode.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 10:14:36 -07:00
wenxu
07c264ab8e net/mlx5e: add mlx5e_rep_indr_setup_ft_cb support
Add mlx5e_rep_indr_setup_ft_cb to support indr block setup
in FT mode.
Both tc rules and flow table rules are of the same format,
It can re-use tc parsing for that, and move the flow table rules
to their steering domain(the specific chain_index), the indr
block offload in FT also follow this scenario.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-29 23:42:27 -07:00
wenxu
5a37a8df80 net/mlx5e: refactor indr setup block
Refactor indr setup block for support ft indr setup in the
next patch. The function mlx5e_rep_indr_offload exposes
'flags' in order set additional flag for FT in next patch.
Rename mlx5e_rep_indr_setup_tc_block to mlx5e_rep_indr_setup_block
and add flow_setup_cb_t callback parameters in order set the
specific callback for FT in next patch.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-29 23:42:24 -07:00
Saeed Mahameed
49964352ca net/mlx5: E-Switch: Move eswitch chains to a new directory
eswitch_offloads_chains.{c,h} were just introduced this kernel release
cycle, eswitch is in high development demand right now and many
features are planned to be added to it. eswitch deserves its own
directory and here we move these new files to there, in preparation for
upcoming eswitch features and new files.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2020-03-29 23:42:22 -07:00
Mark Zhang
6838a35a45 net/mlx5: Use a separate work queue for fib event handling
In VF lag mode when remove the bonding module without bring down the
bond device first, we could potentially have circular dependency when we
unload IB devices and also handle fib events:
1. The bond work starts first;
2. The "modprobe -rv bonding" process tries to release the bond device,
   with the "pernet_ops_rwsem" lock hold;
3. The bond work blocks in unregister_netdevice_notifier() and waits for
the lock because fib event came right before;
4. The kernel fib module tries to free all the fib entries by broadcasting
   the "FIB_EVENT_NH_DEL" event;
5. Upon the fib event this lag_mp module holds the fib lock and queue a
   fib work.
So:
   bond work -> modprobe task -> kernel fib module -> lag_mp -> bond work

Today we either reload IB devices in roce lag in nic mode or either handle
fib events in switchdev mode, but a new feature could change that we'll
need to reload IB devices also in switchdev mode so this is a future proof
fix as one may not notice this later.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-29 23:42:20 -07:00
Saeed Mahameed
e999a7343d Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  mlx5: Remove uninitialized use of key in mlx5_core_create_mkey
  {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
  {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
  {IB,net}/mlx5: Setup mkey variant before mr create command invocation

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-29 23:42:11 -07:00
David S. Miller
c13b5adb06 Merge branch 'ethtool-netlink-interface-part-4'
Michal Kubecek says:

====================
ethtool netlink interface, part 4

Implementation of more netlink request types:

  - coalescing (ethtool -c/-C, patches 2-4)
  - pause parameters (ethtool -a/-A, patches 5-7)
  - EEE settings (--show-eee / --set-eee, patches 8-10)
  - timestamping info (-T, patches 11-12)

Patch 1 is a fix for netdev reference leak similar to commit 2f599ec422
("ethtool: fix reference leak in some *_SET handlers") but fixing a code

Changes in v3
  - change "one-step-*" Tx type names to "onestep-*", (patch 11, suggested
    by Richard Cochran
  - use "TSINFO" rather than "TIMESTAMP" for timestamping information
    constants and adjust symbol names (patch 12, suggested by Richard
    Cochran)

Changes in v2:
  - fix compiler warning in net_hwtstamp_validate() (patch 11)
  - fix follow-up lines alignment (whitespace only, patches 3 and 8)
which is only in net-next tree at the moment.
====================

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:33:03 -07:00
Michal Kubecek
5b071c59ed ethtool: provide timestamping information with TSINFO_GET request
Implement TSINFO_GET request to get timestamping information for a network
device. This is traditionally available via ETHTOOL_GET_TS_INFO ioctl
request.

Move part of ethtool_get_ts_info() into common.c so that ioctl and netlink
code use the same logic to get timestamping information from the device.

v3: use "TSINFO" rather than "TIMESTAMP", suggested by Richard Cochran

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:37 -07:00
Michal Kubecek
f76510b458 ethtool: add timestamping related string sets
Add three string sets related to timestamping information:

  ETH_SS_SOF_TIMESTAMPING: SOF_TIMESTAMPING_* flags
  ETH_SS_TS_TX_TYPES:      timestamping Tx types
  ETH_SS_TS_RX_FILTERS:    timestamping Rx filters

These will be used for TIMESTAMP_GET request.

v2: avoid compiler warning ("enumeration value not handled in switch")
    in net_hwtstamp_validate()

v3: omit dash in Tx type names ("one-step-*" -> "onestep-*"), suggested by
    Richard Cochran

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
6c5bc8fe4e ethtool: add EEE_NTF notification
Send ETHTOOL_MSG_EEE_NTF notification whenever EEE settings of a network
device are modified using ETHTOOL_MSG_EEE_SET netlink message or
ETHTOOL_SEEE ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
fd77be7bd4 ethtool: set EEE settings with EEE_SET request
Implement EEE_SET netlink request to set EEE settings of a network device.
These are traditionally set with ETHTOOL_SEEE ioctl request.

The netlink interface allows setting the EEE status for all link modes
supported by kernel but only first 32 link modes can be set at the moment
as only those are supported by the ethtool_ops callback.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
b7eeefe72e ethtool: provide EEE settings with EEE_GET request
Implement EEE_GET request to get EEE settings of a network device. These
are traditionally available via ETHTOOL_GEEE ioctl request.

The netlink interface allows reporting EEE status for all link modes
supported by kernel but only first 32 link modes are provided at the moment
as only those are reported by the ethtool_ops callback and drivers.

v2: fix alignment (whitespace only)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
bf37faa386 ethtool: add PAUSE_NTF notification
Send ETHTOOL_MSG_PAUSE_NTF notification whenever pause parameters of
a network device are modified using ETHTOOL_MSG_PAUSE_SET netlink message
or ETHTOOL_SPAUSEPARAM ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
3ab879933d ethtool: set pause parameters with PAUSE_SET request
Implement PAUSE_SET netlink request to set pause parameters of a network
device. Thease are traditionally set with ETHTOOL_SPAUSEPARAM ioctl
request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
7f59fb32b0 ethtool: provide pause parameters with PAUSE_GET request
Implement PAUSE_GET request to get pause parameters of a network device.
These are traditionally available via ETHTOOL_GPAUSEPARAM ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
0cf3eac8c9 ethtool: add COALESCE_NTF notification
Send ETHTOOL_MSG_COALESCE_NTF notification whenever coalescing parameters
of a network device are modified using ETHTOOL_MSG_COALESCE_SET netlink
message or ETHTOOL_SCOALESCE ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
9881418c75 ethtool: set coalescing parameters with COALESCE_SET request
Implement COALESCE_SET netlink request to set coalescing parameters of
a network device. These are traditionally set with ETHTOOL_SCOALESCE ioctl
request. This commit adds only support for device coalescing parameters,
not per queue coalescing parameters.

Like the ioctl implementation, the generic ethtool code checks if only
supported parameters are modified; if not, first offending attribute is
reported using extack.

v2: fix alignment (whitespace only)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
217275453b ethtool: provide coalescing parameters with COALESCE_GET request
Implement COALESCE_GET request to get coalescing parameters of a network
device. These are traditionally available via ETHTOOL_GCOALESCE ioctl
request. This commit adds only support for device coalescing parameters,
not per queue coalescing parameters.

Omit attributes with zero values unless they are declared as supported
(i.e. the corresponding bit in ethtool_ops::supported_coalesce_params is
set).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
b51fb7711a ethtool: fix reference leak in ethnl_set_privflags()
Andrew noticed that some handlers for *_SET commands leak a netdev
reference if required ethtool_ops callbacks do not exist. One of them is
ethnl_set_privflags(), a simple reproducer would be e.g.

  ip link add veth1 type veth peer name veth2
  ethtool --set-priv-flags veth1 foo on
  ip link del veth1

Make sure dev_put() is called when ethtool_ops check fails.

Fixes: f265d79959 ("ethtool: set device private flags with PRIVFLAGS_SET request")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
David S. Miller
96376cad35 Merge branch 'ipv6-add-rpl-source-routing'
Alexander Aring says:

====================
net: ipv6: add rpl source routing

This patch series will add handling for RPL source routing handling
and insertion (implement as lwtunnel)! I did an example prototype
implementation in rpld for using this implementation in non-storing mode:

https://github.com/linux-wpan/rpld/tree/nonstoring_mode

I will also present a talk at netdev about it:

https://netdevconf.info/0x14/session.html?talk-extend-segment-routing-for-RPL

In receive handling I add handling for IPIP encapsulation as RFC6554
describes it as possible. For reasons I didn't implemented it yet for
generating such packets because I am not really sure how/when this
should happen. So far I understand there exists a draft yet which
describes the cases (inclusive a Hop-by-Hop option which we also not
support yet).

https://tools.ietf.org/html/draft-ietf-roll-useofrplinfo-35

This is just the beginning to start implementation everything for yet,
step by step. It works for my use cases yet to have it running on a
6LOWPAN _only_ network.

I have some patches for iproute2 as well.

A sidenote: I check on local addresses if they are part of segment
routes, this is just to avoid stupid settings. A use can add addresses
afterwards what I cannot control anymore but then it's users fault to
make such thing. The receive handling checks for this as well which is
required by RFC6554, so the next hops or when it comes back should drop
it anyway.

To make this possible I added functionality to pass the net structure to
the build_state of lwtunnel (I hope I caught all lwtunnels).

Another sidenote: I set the headroom value to 0 as I figured out it will
break on interfaces with IPv6 min mtu if set to non zero for tunnels on
L3.

- Alex

changes since v3:
 - use parse_nested which isn't deprecated - Thanks David Ahern
 - change to return -1 instead errno in exthdr handling to unify
   error code
 - change function name from ipv6_rpl_srh_decompress_size to
   ipv6_rpl_srh_size

changes since v2:
 - add additional segdata length in lwtunnel build_state
 - fix build_state patch by not catching one inline noop function
   if LWTUNNEL is disabled

Alexander Aring (5):
  include: uapi: linux: add rpl sr header definition
  addrconf: add functionality to check on rpl requirements
  net: ipv6: add support for rpl sr exthdr
  net: add net available in build_state
  net: ipv6: add rpl sr tunnel
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
a7a29f9c36 net: ipv6: add rpl sr tunnel
This patch adds functionality to configure routes for RPL source routing
functionality. There is no IPIP functionality yet implemented which can
be added later when the cases when to use IPv6 encapuslation comes more
clear.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
faee676944 net: add net available in build_state
The build_state callback of lwtunnel doesn't contain the net namespace
structure yet. This patch will add it so we can check on specific
address configuration at creation time of rpl source routes.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
8610c7c6e3 net: ipv6: add support for rpl sr exthdr
This patch adds rpl source routing receive handling. Everything works
only if sysconf "rpl_seg_enabled" and source routing is enabled. Mostly
the same behaviour as IPv6 segmentation routing. To handle compression
and uncompression a rpl.c file is created which contains the necessary
functionality. The receive handling will also care about IPv6
encapsulated so far it's specified as possible nexthdr in RFC 6554.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
f37c605936 addrconf: add functionality to check on rpl requirements
This patch adds a functionality to addrconf to check on a specific RPL
address configuration. According to RFC 6554:

To detect loops in the SRH, a router MUST determine if the SRH
includes multiple addresses assigned to any interface on that
router. If such addresses appear more than once and are separated by
at least one address not assigned to that router.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
cfa933d938 include: uapi: linux: add rpl sr header definition
This patch adds a uapi header for rpl struct definition. The segments
data can be accessed over rpl_segaddr or rpl_segdata macros. In case of
compri and compre is zero the segment data is not compressed and can be
accessed by rpl_segaddr. In the other case the compressed data can be
accessed by rpl_segdata and interpreted as byte array.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
David S. Miller
c189b5483c Merge branch 'mptcp-multiple-subflows-path-management'
Mat Martineau says:

====================
Multipath TCP part 3: Multiple subflows and path management

v2 -> v3: Remove 'inline' in .c files, fix uapi bit macros, and rebase.

v1 -> v2: Rebase on current net-next, fix for netlink limit setting,
and update .gitignore for selftest.

This patch set allows more than one TCP subflow to be established and
used for a multipath TCP connection. Subflows are added to an existing
connection using the MP_JOIN option during the 3-way handshake. With
multiple TCP subflows available, sent data is now stored in the MPTCP
socket so it may be retransmitted on any TCP subflow if there is no
DATA_ACK before a timeout. If an MPTCP-level timeout occurs, data is
retransmitted using an available subflow. Storing this sent data
requires the addition of memory accounting at the MPTCP level, which was
previously delegated to the single subflow. Incoming DATA_ACKs now free
data from the MPTCP-level retransmit buffer.

IP addresses available for new subflow connections can now be advertised
and received with the ADD_ADDR option, and the corresponding REMOVE_ADDR
option likewise advertises that an address is no longer available.

The MPTCP path manager netlink interface has commands to set in-kernel
limits for the number of concurrent subflows and control the
advertisement of IP addresses between peers.

To track and debug MPTCP connections there are new MPTCP MIB counters,
and subflow context can be requested using inet_diag. The MPTCP
self-tests now validate multiple-subflow operation and the netlink path
manager interface.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:15:17 -07:00
Paolo Abeni
b08fbf2410 selftests: add test-cases for MPTCP MP_JOIN
Use the pm netlink to configure the creation of several
subflows, and verify that via MIB counters.

Update the mptcp_connect program to allow reliable MP_JOIN
handshake even on small data file

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:49 -07:00
Paolo Abeni
eedbc68532 selftests: add PM netlink functional tests
This introduces basic self-tests for the PM netlink,
checking the basic APIs and possible exceptional
values.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:49 -07:00
Paolo Abeni
01cacb00b3 mptcp: add netlink-based PM
Expose a new netlink family to userspace to control the PM, setting:

 - list of local addresses to be signalled.
 - list of local addresses used to created subflows.
 - maximum number of add_addr option to react

When the msk is fully established, the PM netlink attempts to
announce the 'signal' list via the ADD_ADDR option. Since we
currently lack the ADD_ADDR echo (and related event) only the
first addr is sent.

After exhausting the 'announce' list, the PM tries to create
subflow for each addr in 'local' list, waiting for each
connection to be completed before attempting the next one.

Idea is to add an additional PM hook for ADD_ADDR echo, to allow
the PM netlink announcing multiple addresses, in sequence.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:49 -07:00
Florian Westphal
fc518953bc mptcp: add and use MIB counter infrastructure
Exported via same /proc file as the Linux TCP MIB counters, so "netstat -s"
or "nstat" will show them automatically.

The MPTCP MIB counters are allocated in a distinct pcpu area in order to
avoid bloating/wasting TCP pcpu memory.

Counters are allocated once the first MPTCP socket is created in a
network namespace and free'd on exit.

If no sockets have been allocated, all-zero mptcp counters are shown.

The MIB counter list is taken from the multipath-tcp.org kernel, but
only a few counters have been picked up so far.  The counter list can
be increased at any time later on.

v2 -> v3:
 - remove 'inline' in foo.c files (David S. Miller)

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:49 -07:00
Davide Caratti
5147dfb508 mptcp: allow dumping subflow context to userspace
add ulp-specific diagnostic functions, so that subflow information can be
dumped to userspace programs like 'ss'.

v2 -> v3:
- uapi: use bit macros appropriate for userspace

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Paolo Abeni
3b1d6210a9 mptcp: implement and use MPTCP-level retransmission
On timeout event, schedule a work queue to do the retransmission.
Retransmission code closely resembles the sendmsg() implementation and
re-uses mptcp_sendmsg_frag, providing a dummy msghdr - for flags'
sake - and peeking the relevant dfrag from the rtx head.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Paolo Abeni
3f8e0aae17 mptcp: rework mptcp_sendmsg_frag to accept optional dfrag
This will simplify mptcp-level retransmission implementation
in the next patch. If dfrag is provided by the caller, skip
kernel space memory allocation and use data and metadata
provided by the dfrag itself.

Because a peer could ack data at TCP level but refrain from
sending mptcp-level ACKs, we could grow the mptcp socket
backlog indefinitely.

We should thus block mptcp_sendmsg until the peer has acked some of the
sent data.

In order to be able to do so, increment the mptcp socket wmem_queued
counter on memory allocation and decrement it when releasing the memory
on mptcp-level ack reception.

Because TCP performns sndbuf auto-tuning up to tcp_wmem_max[2], make
this the mptcp sk_sndbuf limit.

In the future we could add experiment with autotuning as TCP does in
tcp_sndbuf_expand().

v2 -> v3:
 - remove 'inline' in foo.c files (David S. Miller)

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Florian Westphal
7948f6cc99 mptcp: allow partial cleaning of rtx head dfrag
After adding wmem accounting for the mptcp socket we could get
into a situation where the mptcp socket can't transmit more data,
and mptcp_clean_una doesn't reduce wmem even if snd_una has advanced
because it currently will only remove entire dfrags.

Allow advancing the dfrag head sequence and reduce wmem,
even though this isn't correct (as we can't release the page).

Because we will soon block on mptcp sk in case wmem is too large,
call sk_stream_write_space() in case we reduced the backlog so
userspace task blocked in sendmsg or poll will be woken up.

This isn't an issue if the send buffer is large, but it is when
SO_SNDBUF is used to reduce it to a lower value.

Note we can still get a deadlock for low SO_SNDBUF values in
case both sides of the connection write to the socket: both could
be blocked due to wmem being too small -- and current mptcp stack
will only increment mptcp ack_seq on recv.

This doesn't happen with the selftest as it uses poll() and
will always call recv if there is data to read.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Paolo Abeni
d027236c41 mptcp: implement memory accounting for mptcp rtx queue
Charge the data on the rtx queue to the master MPTCP socket, too.
Such memory in uncharged when the data is acked/dequeued.

Also account mptcp sockets inuse via a protocol specific pcpu
counter.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00