linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-03 07:38:10 +00:00

Author	SHA1	Message	Date
Heiner Kallweit	4f447d2969	r8169: drop member pll_power_ops from struct rtl8169_private After merging r810x_pll_power_down/up and r8168_pll_power_down/up we don't need member pll_power_ops any longer and can drop it, thus simplifying the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	73570bf19f	r8169: merge r810x_pll_power_down/up into r8168_pll_power_down/up r810x_pll_power_down/up and r8168_pll_power_down/up have a lot in common, so we can simplify the code by merging the former into the latter. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	40242e232e	r8169: remove 810x_phy_power_up/down The functionality of 810x_phy_power_up/down is covered by the default clause in 8168_phy_power_up/down. Therefore we don't need these functions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	6851d025e5	r8169: remove unneeded check in r8168_pll_power_down RTL_GIGA_MAC_VER_23/24 are configured by rtl_hw_start_8168cp_2() and rtl_hw_start_8168cp_3() respectively which both apply CPCMD_QUIRK_MASK, thus clearing bit ASF. Bit ASF isn't set at any other place in the driver, therefore this check can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
David S. Miller	77ec3a0e2d	Merge branch 'net-smc-small-features' Ursula Braun says: ==================== net/smc: small features 2018/04/30 here are 4 smc patches for net-next covering small new features in different areas: * link health check * diagnostics for IPv6 smc sockets * ioctl * improvement for vlan determination v2 changes: * better title * patch 2 - remove compile problem for disabled CONFIG_IPV6 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:29:13 -04:00
Ursula Braun	cb9d43f677	net/smc: determine vlan_id of stacked net_device An SMC link group is bound to a specific vlan_id. Its link uses the RoCE-GIDs established for the specific vlan_id. This patch makes sure the appropriate vlan_id is determined for stacked scenarios like for instance a master bonding device with vlan devices enslaved. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:29:12 -04:00
Ursula Braun	9b67e26f93	net/smc: handle ioctls SIOCINQ, SIOCOUTQ, and SIOCOUTQNSD SIOCINQ returns the amount of unread data in the RMB. SIOCOUTQ returns the amount of unsent or unacked sent data in the send buffer. SIOCOUTQNSD returns the amount of data prepared for sending, but not yet sent. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:29:12 -04:00
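A minimal user-space sketch of how the three ioctls described in this commit are queried. The wrapper names are hypothetical and not part of the patch; the request codes come from `<linux/sockios.h>`. The calls look the same for any socket type that implements them, so this is not specific to SMC.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>   /* SIOCINQ, SIOCOUTQ, SIOCOUTQNSD */

/* Bytes received but not yet read by the application; -1 on error. */
int unread_bytes(int fd)
{
    int n = 0;
    return ioctl(fd, SIOCINQ, &n) == 0 ? n : -1;
}

/* Bytes in the send buffer that are unsent or sent but unacked; -1 on error. */
int unsent_or_unacked_bytes(int fd)
{
    int n = 0;
    return ioctl(fd, SIOCOUTQ, &n) == 0 ? n : -1;
}

/* Bytes prepared for sending but not yet sent; -1 on error. */
int not_yet_sent_bytes(int fd)
{
    int n = 0;
    return ioctl(fd, SIOCOUTQNSD, &n) == 0 ? n : -1;
}
```

Each wrapper returns -1 when the socket type does not implement the request, so a caller can probe for support before relying on the value.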
Karsten Graul	ed75986f4a	net/smc: ipv6 support for smc_diag.c Update smc_diag.c to support ipv6 addresses on the diagnosis interface. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:29:12 -04:00
Karsten Graul	877ae5be42	net/smc: periodic testlink support Add periodic LLC testlink support to ensure the link is still active. The interval time is initialized using the value of sysctl_tcp_keepalive_time. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:29:12 -04:00
David S. Miller	e90c1a1090	Merge branch 'mlxsw-Reject-unsupported-FIB-configurations' Ido Schimmel says: ==================== mlxsw: Reject unsupported FIB configurations Recently it became possible for listeners of the FIB notification chain to veto operations such as addition of routes and rules. Adjust the mlxsw driver to take advantage of it and return an error for unsupported FIB rules and for routes configured after the abort mechanism was triggered (due to exceeded resources for example). v2: * Change error code in first patch to -EOPNOTSUPP (David Ahern). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:15:18 -04:00
Ido Schimmel	50d10711cf	mlxsw: spectrum_router: Return an error for routes added after abort We currently do not perform accounting in the driver and thus can't reject routes before resources are exceeded. However, in order to make users aware of the fact that routes are no longer offloaded we can return an error for routes configured after the abort mechanism was triggered. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:15:17 -04:00
Ido Schimmel	6290182b2b	mlxsw: spectrum_router: Return an error for non-default FIB rules Since commit `9776d32537` ("net: Move call_fib_rule_notifiers up in fib_nl_newrule") it is possible to forbid the installation of unsupported FIB rules. Have mlxsw return an error for non-default FIB rules in addition to the existing extack message. Example: # ip rule add from 198.51.100.1 table 10 Error: mlxsw_spectrum: FIB rules not supported. Note that offload is only aborted when non-default FIB rules are already installed and merely replayed during module initialization. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:15:17 -04:00
Ganesh Goudar	794451c1b5	cxgb4: add new T5 device id's Add device id's 0x5019, 0x501a and 0x501b for T5 cards. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:03:37 -04:00
Kees Cook	8ac60ffb9a	net: stmmac: Avoid VLA usage In the quest to remove all stack VLAs from the kernel[1], this switches the "status" stack buffer to use the existing small (8) upper bound on how many queues can be checked for DMA, and adds a sanity-check just to make sure it doesn't operate under pathological conditions. [1] http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 11:11:30 -04:00
Raghu Vatsavayi	795d8098d3	liquidio VF: indicate that disabling rx vlan offload is not allowed NIC firmware does not support disabling rx vlan offload, but the VF driver incorrectly indicates that it is supported. The PF driver already does the correct indication by clearing the NETIF_F_HW_VLAN_CTAG_RX bit in its netdev->hw_features. So just do the same thing in the VF. Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Acked-by: Prasad Kanneganti <prasad.kanneganti@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 11:07:22 -04:00
Sean Tranchetti	6c035ba7e7	udp: Complement partial checksum for GSO packet Using the udp_v4_check() function to calculate the pseudo header for the newly segmented UDP packets results in assigning the complement of the value to the UDP header checksum field. Always undo the complement of the partial checksum value in order to match the case where GSO is not used on the UDP transmit path. Fixes: `ee80d1ebe5` ("udp: add udp gso") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 10:59:32 -04:00
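The arithmetic behind this fix can be illustrated in plain C. This is a sketch under assumptions, not the kernel code: the function names are hypothetical, values are in host byte order, and only the IPv4 pseudo header is summed. It shows that a helper returning the complemented sum must be complemented once more to recover the partial checksum.

```c
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
uint16_t csum_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Uncomplemented sum over the IPv4 pseudo header (host byte order). */
uint16_t pseudo_hdr_sum(uint32_t saddr, uint32_t daddr,
                        uint8_t proto, uint16_t udp_len)
{
    uint32_t sum = 0;

    sum += saddr >> 16;
    sum += saddr & 0xffffu;
    sum += daddr >> 16;
    sum += daddr & 0xffffu;
    sum += proto;
    sum += udp_len;
    return csum_fold16(sum);
}

/* What a "check"-style helper yields: the complement of the sum. */
uint16_t complemented_check(uint32_t saddr, uint32_t daddr,
                            uint8_t proto, uint16_t udp_len)
{
    return (uint16_t)~pseudo_hdr_sum(saddr, daddr, proto, udp_len);
}

/* Undo the complement to get the partial checksum seed back. */
uint16_t partial_csum_seed(uint32_t saddr, uint32_t daddr,
                           uint8_t proto, uint16_t udp_len)
{
    return (uint16_t)~complemented_check(saddr, daddr, proto, udp_len);
}
```

For 192.0.2.1 to 198.51.100.1, protocol 17 and a UDP length of 20, the pseudo-header sum is 0xEC5B and its complement is 0x13A4. Complementing again restores 0xEC5B.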
Soheil Hassas Yeganeh	702353b538	selftest: add test for TCP_INQ Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 18:56:29 -04:00
Soheil Hassas Yeganeh	b75eba76d3	tcp: send in-queue bytes in cmsg upon read Applications with many concurrent connections, high variance in receive queue length and tight memory bounds cannot allocate worst-case buffer size to drain sockets. Knowing the size of receive queue length, applications can optimize how they allocate buffers to read from the socket. The number of bytes pending on the socket is directly available through ioctl(FIONREAD/SIOCINQ) and can be approximated using getsockopt(MEMINFO) (rmem_alloc includes skb overheads in addition to application data). But, both of these options add an extra syscall per recvmsg. Moreover, ioctl(FIONREAD/SIOCINQ) takes the socket lock. Add the TCP_INQ socket option to TCP. When this socket option is set, recvmsg() relays the number of bytes available on the socket for reading to the application via the TCP_CM_INQ control message. Calculate the number of bytes after releasing the socket lock to include the processed backlog, if any. To avoid an extra branch in the hot path of recvmsg() for this new control message, move all cmsg processing inside an existing branch for processing receive timestamps. Since the socket lock is not held when calculating the size of receive queue, TCP_INQ is a hint. For example, it can overestimate the queue size by one byte, if FIN is received. With this method, applications can start reading from the socket using a small buffer, and then use larger buffers based on the remaining data when needed. V3 change-log: As suggested by David Miller, added loads with barrier to check whether we have multiple threads calling recvmsg in parallel. When that happens we lock the socket to calculate inq. V4 change-log: Removed inline from a static function. 
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 18:56:29 -04:00
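A hedged user-space sketch of how an application might consume the TCP_INQ hint described in this commit. The helper names are hypothetical. The fallback `#define` values are the ones in the kernel uapi header and are only used when the libc headers predate the option. The control message is matched on `IPPROTO_TCP`, which has the same numeric value as `SOL_TCP`.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#ifndef TCP_INQ
#define TCP_INQ 36              /* value from the kernel uapi header */
#endif
#ifndef TCP_CM_INQ
#define TCP_CM_INQ TCP_INQ
#endif

/* Ask the kernel to attach the in-queue byte count to each recvmsg(). */
int enable_tcp_inq(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_INQ, &one, sizeof(one));
}

/* Walk the control messages; return 1 and fill *inq if the hint is present. */
int inq_from_cmsg(struct msghdr *msg, int *inq)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_TCP && c->cmsg_type == TCP_CM_INQ &&
            c->cmsg_len >= CMSG_LEN(sizeof(int))) {
            memcpy(inq, CMSG_DATA(c), sizeof(int));
            return 1;
        }
    }
    return 0;
}

/* Read into buf and report the remaining queue size (-1 if no hint came back). */
ssize_t recv_with_inq(int fd, void *buf, size_t len, int *inq)
{
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* forces correct alignment of the buffer */
    } ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    n = recvmsg(fd, &msg, 0);
    if (n >= 0 && !inq_from_cmsg(&msg, inq))
        *inq = -1;
    return n;
}
```

A caller could start with a small buffer, call `recv_with_inq()`, and size the next buffer from the returned hint. As the commit notes, the value is only a hint and can be off by one when a FIN is received.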
David S. Miller	ab85539eb3	Merge branch 'hns3-fixes' Salil Mehta says: ==================== Misc bug fixes for HNS3 Ethernet driver This patch-set presents some miscellaneous bug fixes and cleanups for HNS3 Ethernet Driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 15:09:02 -04:00
Xi Wang	dbecc7796c	net: hns3: Remove packet statistics in the range of 8192~12287 Because the current statistics for size 8192~12287 are only valid for GE, the ranges of 8192~9216 and 9217~12287 are valid only for LGE/CGE, and are always 0 for GE interfaces. It is easy to cause confusion when viewing the packet statistics using the command ethtool -S. This patch removes the 8192~12287 range of packet statistics and uses the 8192~9216 and 9217~12287 ranges for statistics. This change depends on the firmware upgrade. Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 15:08:38 -04:00
Yunsheng Lin	dc8131d846	net: hns3: Fix for packet loss due wrong filter config in VLAN tbls There are two levels of vlan tables in hardware: one is the port vlan table, which is shared by all functions, and the other is the function vlan table, of which each function has its own. Currently, PF sets the port vlan table, and vf sets the function vlan table, which causes a packet loss problem. This patch fixes this problem by setting both vlan tables, and uses hdev->vlan_table to manage the port vlan table. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 15:08:38 -04:00
Huazhong Tan	3ff504908f	net: hns3: fix a dead loop in hclge_cmd_csq_clean If head has an invalid value then a dead loop can be triggered in hclge_cmd_csq_clean. This patch adds a sanity check for this case. Fixes: `68c0a5c706` ("net: hns3: Add HNS3 IMP(Integrated Mgmt Proc) Cmd Interface Support") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 15:08:37 -04:00
Fuyun Liang	0c963e8c20	net: hns3: Fix to support autoneg only for port attached with phy This patch adds a check to support autoneg(ethtool -A) only when PHY is attached with the port. Fixes: `e2cb1dec97` ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 15:08:37 -04:00
Huazhong Tan	c5ef83cbb1	net: hns3: fix for phy_addr error in hclge_mac_mdio_config When phy exists, phy_addr must be less than PHY_MAX_ADDR. If not, hclge_mac_mdio_config should return an error. And for fiber(phy_addr=0xff), it does not need hclge_mac_mdio_config. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 15:08:37 -04:00
Huazhong Tan	ffd5656e18	net: hns3: Fixes the error legs in hclge_init_ae_dev function This patch fixes some of the missed error legs in the initialization function of the ae device. This might cause leaks in case of failure. Fixes: `46a3df9f97` ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 15:08:37 -04:00
Huazhong Tan	38e62046d4	net: hns3: Fixes the out of bounds access in hclge_map_tqp This patch fixes the handling of the check when number of vports are detected to be more than available TPQs. Current handling causes an out of bounds access in hclge_map_tqp(). Fixes: `7df7dad633` ("net: hns3: Refactor the mapping of tqp to vport") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 15:08:37 -04:00
Huazhong Tan	35f58fd792	net: hns3: fix to correctly fetch l4 protocol outer header This patch fixes the function being used to fetch L4 protocol outer header. Mistakenly skb_inner_transport_header API was being used earlier. Fixes: `76ad4f0ee7` ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 15:08:37 -04:00
Yunsheng Lin	206703289a	net: hns3: Remove error log when getting pfc stats fails When mac supports DCB, but is in GE mode, it does not support querying pfc stats, and firmware returns an error when trying to query the pfc stats. This creates a lot of noise in the kernel log when it prints the error log. This patch fixes it by removing the error log, because the error is already returned to user space, so the user should be aware of the error. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 15:08:36 -04:00
Stefan Strogin	b086ff8725	connector: add parent pid and tgid to coredump and exit events The intention is to get notified of process failures as soon as possible, before a possible core dumping (which could be very long) (e.g. in some process-manager). Coredump and exit process events are perfect for such use cases (see `2b5faa4c55` "connector: Added coredumping event to the process connector"). The problem is that for now the process-manager cannot know the parent of a dying process using connectors. This could be useful if the process-manager should monitor for failures only children of certain parents, so we could filter the coredump and exit events by parent process and/or thread ID. Add parent pid and tgid to coredump and exit process connectors event data. Signed-off-by: Stefan Strogin <sstrogin@cisco.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:25:37 -04:00
Florian Fainelli	e283de3a4f	net: core: Inline netdev_features_size_check() We do not require this inline function to be used in multiple different locations, just inline it where it gets used in register_netdevice(). Suggested-by: David Miller <davem@davemloft.net> Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:24:19 -04:00
Willem de Bruijn	a8c744a8b4	udp: disable gso with no_check_tx Syzbot managed to send a udp gso packet without checksum offload into the gso stack by disabling tx checksum (UDP_NO_CHECK6_TX). This triggered the skb_warn_bad_offload. RIP: 0010:skb_warn_bad_offload+0x2bc/0x600 net/core/dev.c:2658 skb_gso_segment include/linux/netdevice.h:4038 [inline] validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3120 __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3577 dev_queue_xmit+0x17/0x20 net/core/dev.c:3618 UDP_NO_CHECK6_TX sets skb->ip_summed to CHECKSUM_NONE just after the udp gso integrity checks in udp_(v6_)send_skb. Extend those checks to catch and fail in this case. After the integrity checks jump directly to the CHECKSUM_PARTIAL case to avoid reading the no_check_tx flags again (a TOCTTOU race). Fixes: `bec1f6f697` ("udp: generate gso with UDP_SEGMENT") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:20:14 -04:00
Paul Blakey	05cd271fd6	cls_flower: Support multiple masks per priority Currently flower doesn't support inserting filters with different masks on a single priority, even if the actual flows (key + mask) inserted aren't overlapping, as with the use case of offloading openvswitch datapath flows. Instead one must go up one level, and assign different priorities for each mask, which will create a different flower instances. This patch opens flower to support more than one mask per priority, and a single flower instance. It does so by adding another hash table on top of the existing one which will store the different masks, and the filters that share it. The user is left with the responsibility of ensuring non overlapping flows, otherwise precedence is not guaranteed. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:14:15 -04:00
David S. Miller	9908b3630f	Merge branch 'sctp-unify-sctp_make_op_error_fixed-and-sctp_make_op_error_space' Marcelo Ricardo Leitner says: ==================== sctp: unify sctp_make_op_error_fixed and sctp_make_op_error_space These two variants are very close to each other and can be merged to avoid code duplication. That's what this patchset does. First, we allow sctp_init_cause to return errors, which then allow us to add sctp_make_op_error_limited that handles both situations. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 12:09:36 -04:00
Marcelo Ricardo Leitner	8914f4bace	sctp: add sctp_make_op_error_limited and reuse inner functions The idea is quite similar to the old functions, but note that the _fixed function wasn't "fixed" as in that it would generate a packet with a fixed size, but rather limited/bounded to PMTU. Also, now with sctp_mtu_payload(), we have a more accurate limit. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 12:09:35 -04:00
Marcelo Ricardo Leitner	6d3e8aa876	sctp: allow sctp_init_cause to return errors And do so if the skb doesn't have enough space for the payload. This is a preparation for the next patch. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 12:09:35 -04:00
David S. Miller	065662d941	Merge branch 'net-stmmac-dwmac-meson-100M-phy-mode-support-for-AXG-SoC' Yixun Lan says: ==================== net: stmmac: dwmac-meson: 100M phy mode support for AXG SoC Due to the dwmac glue layer register changed, we need to introduce a new compatible name for the Meson-AXG SoC to support for the RMII 100M ethernet PHY. Change since v1 at [1]: - implement set_phy_mode() for each SoC [1] https://lkml.kernel.org/r/20180426160508.29380-1-yixun.lan@amlogic.com ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 11:30:00 -04:00
Yixun Lan	efacb568c9	net: stmmac: dwmac-meson: extend phy mode setting In the Meson-AXG SoC, the phy mode setting of PRG_ETH0 in the glue layer is extended from bit[0] to bit[2:0]. There is no problem if we configure it to the RGMII 1000M PHY mode, since the register setting is coincidentally compatible with the previous one, but for the RMII 100M PHY mode, the configuration needs to be changed to the value b100. This patch was verified with a RTL8201F 100M ethernet PHY. Signed-off-by: Yixun Lan <yixun.lan@amlogic.com> Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 11:29:59 -04:00
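The bit-field change described in this commit can be sketched as a pure register-value helper. All identifiers here are hypothetical and not taken from the driver. The b100 value for RMII comes from the commit text. The b001 value for RGMII is an assumption, inferred from the statement that the RGMII setting stays compatible with the old single-bit layout.

```c
#include <stdint.h>

#define EXT_PHY_MODE_MASK  0x7u   /* PRG_ETH0 bits [2:0] on Meson-AXG */
#define EXT_RGMII_MODE     0x1u   /* assumed b001, same as the old bit[0] */
#define EXT_RMII_MODE      0x4u   /* b100, per the commit message */

enum phy_mode { MODE_RGMII, MODE_RMII };

/* Return prg_eth0 with the 3-bit PHY mode field replaced. */
uint32_t axg_apply_phy_mode(uint32_t prg_eth0, enum phy_mode mode)
{
    uint32_t field = (mode == MODE_RMII) ? EXT_RMII_MODE : EXT_RGMII_MODE;

    /* Clear the whole field first so stale bits cannot leak through. */
    return (prg_eth0 & ~EXT_PHY_MODE_MASK) | field;
}
```

Clearing all three bits before setting the mode matters for RMII: writing only bit[0], as the older layout would, could leave the field in an unintended state on this SoC.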
Yixun Lan	7e5d05e18b	dt-bindings: net: meson-dwmac: new compatible name for AXG SoC We need to introduce a new compatible name for the Meson-AXG SoC in order to support the RMII 100M ethernet PHY, since the PRG_ETH0 register of the dwmac glue layer is changed from previous old SoC. Signed-off-by: Yixun Lan <yixun.lan@amlogic.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 11:29:59 -04:00
David S. Miller	90d52d4fd8	Merge branch 'netns-uevent-filtering' Christian Brauner says: ==================== netns: uevent filtering This is the new approach to uevent filtering as discussed (see the threads in [1], [2], and [3]). It only contains non-functional changes. This series deals with fixing up uevent filtering logic: - uevent filtering logic is simplified - locking time on uevent_sock_list is minimized - tagged and untagged kobjects are handled in separate codepaths - permissions for userspace are fixed for network device uevents in network namespaces owned by non-initial user namespaces Udev is now able to see those events correctly which it wasn't before. For example, moving a physical device into a network namespace not owned by the initial user namespaces before gave: root@xen1:~# udevadm --debug monitor -k calling: monitor monitor will print the received events for: KERNEL - the kernel uevent sender uid=65534, message ignored sender uid=65534, message ignored sender uid=65534, message ignored sender uid=65534, message ignored sender uid=65534, message ignored and now after the discussion and solution in [3] correctly gives: root@xen1:~# udevadm --debug monitor -k calling: monitor monitor will print the received events for: KERNEL - the kernel uevent KERNEL[625.301042] add /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net) KERNEL[625.301109] move /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net) KERNEL[625.301138] move /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net) KERNEL[655.333272] remove /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net) Thanks! Christian [1]: https://lkml.org/lkml/2018/4/4/739 [2]: https://lkml.org/lkml/2018/4/26/767 [3]: https://lkml.org/lkml/2018/4/26/738 ==================== Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 10:22:41 -04:00
Christian Brauner	a3498436b3	netns: restrict uevents commit `07e98962fa` ("kobject: Send hotplug events in all network namespaces") enabled sending hotplug events into all network namespaces back in 2010. Over time the set of uevents that get sent into all network namespaces has shrunk. We have now reached the point where hotplug events for all devices that carry a namespace tag are filtered according to that namespace. Specifically, they are filtered whenever the namespace tag of the kobject does not match the namespace tag of the netlink socket. Currently, only network devices carry namespace tags (i.e. network namespace tags). Hence, uevents for network devices only show up in the network namespace such devices are created in or moved to. However, any uevent for a kobject that does not have a namespace tag associated with it will not be filtered and we will broadcast it into all network namespaces. This behavior stopped making sense when user namespaces were introduced. This patch simplifies and fixes a couple of things: - Split codepath for sending uevents by kobject namespace tags: 1. Untagged kobjects - uevent_net_broadcast_untagged(): Untagged kobjects will be broadcast into all uevent sockets recorded in uevent_sock_list, i.e. into all network namespaces owned by the initial user namespace. 2. Tagged kobjects - uevent_net_broadcast_tagged(): Tagged kobjects will only be broadcast into the network namespace they were tagged with. Handling of tagged kobjects in 2. does not cause any semantic changes. This is just splitting out the filtering logic that was handled by kobj_bcast_filter() before. Handling of untagged kobjects in 1. will cause a semantic change. The reasons why this is needed and ok have been discussed in [1]. Here is a short summary: - Userspace ignores uevents from network namespaces that are not owned by the initial user namespace: Uevents are filtered by userspace in a user namespace because the received uid != 0. 
Instead the uid associated with the event will be 65534 == "nobody" because the global root uid is not mapped. This means we can safely and without introducing regressions modify the kernel to not send uevents into all network namespaces whose owning user namespace is not the initial user namespace because we know that userspace will ignore the message because of the uid anyway. I have a) verified that this is true for every udev implementation out there b) that this behavior has been present in all udev implementations from the very beginning. - Thundering herd: Broadcasting uevents into all network namespaces introduces significant overhead. All processes that listen to uevents running in non-initial user namespaces will end up responding to uevents that will be meaningless to them. Mainly, because non-initial user namespaces cannot easily manage devices unless they have a privileged host-process helping them out. This means that there will be a thundering herd of activity when there shouldn't be any. - Removing needless overhead/Increasing performance: Currently, the uevent socket for each network namespace is added to the global variable uevent_sock_list. The list itself needs to be protected by a mutex. So every time a uevent is generated the mutex is taken on the list. The mutex is held from the creation of the uevent (memory allocation, string creation etc.) until all uevent sockets have been handled. This is aggravated by the fact that for each uevent socket that has listeners the mc_list must be walked as well which means we're talking O(n^2) here. Given that a standard Linux workload usually has quite a lot of network namespaces and - in the face of containers - a lot of user namespaces this quickly becomes a performance problem (see "Thundering herd" above). By just recording uevent sockets of network namespaces that are owned by the initial user namespace we significantly increase performance in this codepath. 
- Injecting uevents: There's a valid argument that containers might be interested in receiving device events especially if they are delegated to them by a privileged userspace process. One prime example are SR-IOV enabled devices that are explicitly designed to be handed off to other users such as VMs or containers. This use-case can now be correctly handled since commit `692ec06d7c` ("netns: send uevent messages"). This commit introduced the ability to send uevents from userspace. As such we can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user namespace of the network namespace of the netlink socket) userspace process make a decision what uevents should be sent. This removes the need to blindly broadcast uevents into all user namespaces and provides a performant and safe solution to this problem. - Filtering logic: This patch filters by owning user namespace of the network namespace a given task resides in and not by user namespace of the task per se. This means if the user namespace of a given task is unshared but the network namespace is kept and is owned by the initial user namespace a listener that is opening the uevent socket in that network namespace can still listen to uevents. - Fix permission for tagged kobjects: Network devices that are created or moved into a network namespace that is owned by a non-initial user namespace currently are sent with INVALID_{G,U}ID in their credentials. This means that all current udev implementations in userspace will ignore the uevent they receive for them. This has led to weird bugs whereby new devices showing up in such network namespaces were not recognized and did not get IPs assigned etc. This patch adjusts the permission to the appropriate {g,u}id in the respective user namespace. This way udevd is able to correctly handle such devices. 
- Simplify filtering logic: do_one_broadcast() already ensures that only listeners in mc_list receive uevents that have the same network namespace as the uevent socket itself. So the filtering logic in kobj_bcast_filter is not needed (see [3]). This patch therefore removes kobj_bcast_filter() and replaces netlink_broadcast_filtered() with the simpler netlink_broadcast() everywhere. [1]: https://lkml.org/lkml/2018/4/4/739 [2]: https://lkml.org/lkml/2018/4/26/767 [3]: https://lkml.org/lkml/2018/4/26/738 Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 10:22:41 -04:00
Christian Brauner	26045a7b14	uevent: add alloc_uevent_skb() helper This patch adds alloc_uevent_skb() in preparation for follow up patches. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 10:22:40 -04:00
David S. Miller	e33200bc01	Merge branch 'tls-offload-netdev-and-mlx5-support' Boris Pismenny says: ==================== TLS offload, netdev & MLX5 support The following series provides TLS TX inline crypto offload. v1->v2: - Added IS_ENABLED(CONFIG_TLS_DEVICE) and a STATIC_KEY for icsk_clean_acked - File license fix - Fix spelling, comment by DaveW - Move memory allocations out of tls_set_device_offload and other misc fixes, comments by Kiril. v2->v3: - Reversed xmas tree where needed and style fixes - Removed the need for skb_page_frag_refill, per Eric's comment - IPv6 dependency fixes v3->v4: - Remove "inline" from functions in C files - Make clean_acked_data_enabled a static variable and add enable/disable functions to control it. - Remove unnecessary variable initialization mentioned by ShannonN - Rebase over TLS RX - Refactor the tls_software_fallback to reduce the number of variables mentioned by KirilT v4->v5: - Add missing CONFIG_TLS_DEVICE v5->v6: - Move changes to the software implementation into a separate patch - Fix some checkpatch warnings - GPL export the enable/disable clean_acked_data functions v6->v7: - Use the dst_entry to obtain the netdev in dev_get_by_index - Remove the IPv6 patch since it is redundant now v7->v8: - Fix a merge conflict in mlx5 header v8->v9: - Fix false -Wmaybe-uninitialized warning - Fix empty space in the end of new files v9->v10: - Remove default "n" in net/Kconfig This series adds a generic infrastructure to offload TLS crypto to network devices. It enables the kernel TLS socket to skip encryption and authentication operations on the transmit side of the data path, leaving those computationally expensive operations to the NIC. The NIC offload infrastructure builds TLS records and pushes them to the TCP layer just like the SW KTLS implementation and using the same API. TCP segmentation is mostly unaffected. Currently the only exception is that we prevent mixed SKBs where only part of the payload requires offload. 
In the future we are likely to add a similar restriction following a change cipher spec record. The notable differences between SW KTLS and NIC offloaded TLS implementations are as follows: 1. The offloaded implementation builds "plaintext TLS records"; those records contain plaintext instead of ciphertext and placeholder bytes instead of authentication tags. 2. The offloaded implementation maintains a mapping from TCP sequence number to TLS records. Thus given a TCP SKB sent from a NIC offloaded TLS socket, we can use the tls NIC offload infrastructure to obtain enough context to encrypt the payload of the SKB. A TLS record is released when the last byte of the record is ack'ed; this is done through the new icsk_clean_acked callback. The infrastructure should be extendable to support various NIC offload implementations. However it is currently written with the implementation below in mind: The NIC assumes that packets from each offloaded stream are sent as plaintext and in-order. It keeps track of the TLS records in the TCP stream. When a packet marked for offload is transmitted, the NIC encrypts the payload in-place and puts authentication tags in the relevant placeholders. The responsibility for handling out-of-order packets (e.g. TCP retransmission, qdisc drops) falls on the netdev driver. The netdev driver keeps track of the expected TCP SN from the NIC's perspective. If the next packet to transmit matches the expected TCP SN, the driver advances the expected TCP SN, and transmits the packet with TLS offload indication. If the next packet to transmit does not match the expected TCP SN, the driver calls the TLS layer to obtain the TLS record that includes the TCP sequence number of the packet for transmission. Using this TLS record, the driver posts a work entry on the transmit queue to reconstruct the NIC TLS state required for the offload of the out-of-order packet. It updates the expected TCP SN accordingly and transmits the now in-order packet. 
The same queue is used for packet transmission and TLS context reconstruction to avoid the need for flushing the transmit queue before issuing the context reconstruction request. Expected TCP SN is accessed without a lock, under the assumption that TCP doesn't transmit SKBs from different TX queues concurrently. If packets are rerouted to a different netdevice, then a software fallback routine handles encryption. Paper: https://www.netdevconf.org/1.2/papers/netdevconf-TLS.pdf ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:48 -04:00
Boris Pismenny	f9c8141fc1	MAINTAINERS: Update TLS maintainers Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:48 -04:00
Boris Pismenny	a051505c7e	MAINTAINERS: Update mlx5 innova driver maintainers Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:48 -04:00
Ilya Lesokhin	43585a41bd	net/mlx5e: TLS, Add error statistics Add statistics for rare TLS related errors. Since the errors are rare we have a counter per netdev rather than per SQ. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:48 -04:00
Ilya Lesokhin	bf23974104	net/mlx5e: TLS, Add Innova TLS TX offload data path Implement the TLS tx offload data path according to the requirements of the TLS generic NIC offload infrastructure. Special metadata ethertype is used to pass information to the hardware. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Ilya Lesokhin	c83294b9ef	net/mlx5e: TLS, Add Innova TLS TX support Add NETIF_F_HW_TLS_TX capability and expose tlsdev_ops to work with the TLS generic NIC offload infrastructure. The NETIF_F_HW_TLS_TX capability will be added in the next patch. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Ilya Lesokhin	1ae1732284	net/mlx5: Accel, Add TLS tx offload interface Add routines for manipulating TLS TX offload contexts. In Innova TLS, TLS contexts are added or deleted via a command message over the SBU connection. The HW then sends a response message over the same connection. Add implementation for Innova TLS (FPGA-based) hardware. These routines will be used by the TLS offload support in a later patch. mlx5/accel is a middle acceleration layer that allows mlx5e and other ULPs to work directly with mlx5_core rather than Innova FPGA or other mlx5 acceleration providers. In the future, when IPSec/TLS or any other acceleration gets integrated into the ConnectX chip, the mlx5/accel layer will provide the integrated acceleration, rather than the Innova one. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Ilya Lesokhin	bb9094161b	net/mlx5e: Move defines out of ipsec code The defines are not IPSEC specific. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Ilya Lesokhin	e8f6979981	net/tls: Add generic NIC offload infrastructure This patch adds a generic infrastructure to offload TLS crypto to a network device. It enables the kernel TLS socket to skip encryption and authentication operations on the transmit side of the data path, leaving those computationally expensive operations to the NIC. The NIC offload infrastructure builds TLS records and pushes them to the TCP layer just like the SW KTLS implementation and using the same API. TCP segmentation is mostly unaffected. Currently the only exception is that we prevent mixed SKBs where only part of the payload requires offload. In the future we are likely to add a similar restriction following a change cipher spec record. The notable differences between SW KTLS and NIC offloaded TLS implementations are as follows: 1. The offloaded implementation builds "plaintext TLS records"; those records contain plaintext instead of ciphertext and placeholder bytes instead of authentication tags. 2. The offloaded implementation maintains a mapping from TCP sequence number to TLS records. Thus given a TCP SKB sent from a NIC offloaded TLS socket, we can use the tls NIC offload infrastructure to obtain enough context to encrypt the payload of the SKB. A TLS record is released when the last byte of the record is ack'ed; this is done through the new icsk_clean_acked callback. The infrastructure should be extendable to support various NIC offload implementations. However it is currently written with the implementation below in mind: The NIC assumes that packets from each offloaded stream are sent as plaintext and in-order. It keeps track of the TLS records in the TCP stream. When a packet marked for offload is transmitted, the NIC encrypts the payload in-place and puts authentication tags in the relevant placeholders. The responsibility for handling out-of-order packets (e.g. TCP retransmission, qdisc drops) falls on the netdev driver. 
The netdev driver keeps track of the expected TCP SN from the NIC's perspective. If the next packet to transmit matches the expected TCP SN, the driver advances the expected TCP SN, and transmits the packet with TLS offload indication. If the next packet to transmit does not match the expected TCP SN, the driver calls the TLS layer to obtain the TLS record that includes the TCP sequence number of the packet for transmission. Using this TLS record, the driver posts a work entry on the transmit queue to reconstruct the NIC TLS state required for the offload of the out-of-order packet. It updates the expected TCP SN accordingly and transmits the now in-order packet. The same queue is used for packet transmission and TLS context reconstruction to avoid the need for flushing the transmit queue before issuing the context reconstruction request. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
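
Note on the net/tls offload commit above: the expected-TCP-SN handling it describes can be modelled in a few lines of user-space Python. This is only a rough sketch of the described behaviour, not the kernel or mlx5 code. The names `TlsRecord`, `OffloadContext`, `transmit`, `find_record` and `clean_acked`, and the `"resync"` / `"tx_offload"` queue entries, are hypothetical. Sequence-number wraparound is ignored, and raising `LookupError` when no record covers a packet is an assumption rather than something the commit message specifies.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TlsRecord:
    start_seq: int  # TCP sequence number of the record's first byte
    length: int     # record length in bytes

    @property
    def end_seq(self) -> int:
        # Sequence number just past the record's last byte.
        return self.start_seq + self.length


@dataclass
class OffloadContext:
    expected_seq: int  # next TCP SN the NIC expects to see
    records: List[TlsRecord] = field(default_factory=list)

    def find_record(self, seq: int) -> Optional[TlsRecord]:
        # Map a TCP sequence number to the TLS record that contains it.
        for rec in self.records:
            if rec.start_seq <= seq < rec.end_seq:
                return rec
        return None

    def clean_acked(self, acked_seq: int) -> None:
        # Release every record whose last byte has been acknowledged.
        self.records = [r for r in self.records if r.end_seq > acked_seq]

    def transmit(self, seq: int, length: int) -> List[Tuple[str, int]]:
        # Return the entries posted to the (single) transmit queue.
        queue: List[Tuple[str, int]] = []
        if seq != self.expected_seq:
            rec = self.find_record(seq)
            if rec is None:
                # Assumption: the sketch simply gives up here.
                raise LookupError(f"no TLS record covers seq {seq}")
            # Out-of-order packet: rebuild NIC state from the record first.
            queue.append(("resync", rec.start_seq))
        queue.append(("tx_offload", seq))
        self.expected_seq = seq + length
        return queue
```

In this model an in-order packet yields a single `tx_offload` entry, while a retransmission yields a `resync` entry followed by the packet on the same queue, mirroring the reason the commit gives for sharing one queue.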


752929 commits