linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-11 19:19:42 +00:00

Author	SHA1	Message	Date
Alexei Starovoitov	b5b6ff7302	Merge branch 'bpf-sockmap-fixes' John Fastabend says: ==================== When I added the test_sockmap to selftests I mistakenly changed the test logic a bit. The result of this was on redirect cases we ended up choosing the wrong sock from the BPF program and ended up sending to a socket that had no receive handler. The result was the actual receive handler, running on a different socket, is timing out and closing the socket. This results in errors (-EPIPE to be specific) on the sending side. Typically happening if the sender does not complete the send before the receive side times out. So depending on timing and the size of the send we may get errors. This exposed some bugs in the sockmap error path handling. This series fixes the errors. The primary issue is we did not do proper memory accounting in these cases which resulted in missing a sk_mem_uncharge(). This happened in the redirect path and in one case on the normal send path. See the three patches for the details. The other take-away from this is we need to fix the test_sockmap and also add more negative test cases. That will happen in bpf-next. Finally, I tested this using the existing test_sockmap program, the older sockmap sample test script, and a few real use cases with Cilium. All of these seem to be in working correctly. v2: fix compiler warning, drop iterator variable 'i' that is no longer used in patch 3. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-02 15:30:46 -07:00
John Fastabend	abaeb096ca	bpf: sockmap, fix error handling in redirect failures When a redirect failure happens we release the buffers in-flight without calling a sk_mem_uncharge(), the uncharge is called before dropping the sock lock for the redirecte, however we missed updating the ring start index. When no apply actions are in progress this is OK because we uncharge the entire buffer before the redirect. But, when we have apply logic running its possible that only a portion of the buffer is being redirected. In this case we only do memory accounting for the buffer slice being redirected and expect to be able to loop over the BPF program again and/or if a sock is closed uncharge the memory at sock destruct time. With an invalid start index however the program logic looks at the start pointer index, checks the length, and when seeing the length is zero (from the initial release and failure to update the pointer) aborts without uncharging/releasing the remaining memory. The fix for this is simply to update the start index. To avoid fixing this error in two locations we do a small refactor and remove one case where it is open-coded. Then fix it in the single function. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-02 15:30:45 -07:00
John Fastabend	fec51d40ea	bpf: sockmap, zero sg_size on error when buffer is released When an error occurs during a redirect we have two cases that need to be handled (i) we have a cork'ed buffer (ii) we have a normal sendmsg buffer. In the cork'ed buffer case we don't currently support recovering from errors in a redirect action. So the buffer is released and the error should _not_ be pushed back to the caller of sendmsg/sendpage. The rationale here is the user will get an error that relates to old data that may have been sent by some arbitrary thread on that sock. Instead we simple consume the data and tell the user that the data has been consumed. We may add proper error recovery in the future. However, this patch fixes a bug where the bytes outstanding counter sg_size was not zeroed. This could result in a case where if the user has both a cork'ed action and apply action in progress we may incorrectly call into the BPF program when the user expected an old verdict to be applied via the apply action. I don't have a use case where using apply and cork at the same time is valid but we never explicitly reject it because it should work fine. This patch ensures the sg_size is zeroed so we don't have this case. In the normal sendmsg buffer case (no cork data) we also do not zero sg_size. Again this can confuse the apply logic when the logic calls into the BPF program when the BPF programmer expected the old verdict to remain. So ensure we set sg_size to zero here as well. And additionally to keep the psock state in-sync with the sk_msg_buff release all the memory as well. Previously we did this before returning to the user but this left a gap where psock and sk_msg_buff states were out of sync which seems fragile. No additional overhead is taken here except for a call to check the length and realize its already been freed. This is in the error path as well so in my opinion lets have robust code over optimized error paths. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-02 15:30:45 -07:00
John Fastabend	3cc9a472d6	bpf: sockmap, fix scatterlist update on error path in send with apply When the call to do_tcp_sendpage() fails to send the complete block requested we either retry if only a partial send was completed or abort if we receive a error less than or equal to zero. Before returning though we must update the scatterlist length/offset to account for any partial send completed. Before this patch we did this at the end of the retry loop, but this was buggy when used while applying a verdict to fewer bytes than in the scatterlist. When the scatterlist length was being set we forgot to account for the apply logic reducing the size variable. So the result was we chopped off some bytes in the scatterlist without doing proper cleanup on them. This results in a WARNING when the sock is tore down because the bytes have previously been charged to the socket but are never uncharged. The simple fix is to simply do the accounting inside the retry loop subtracting from the absolute scatterlist values rather than trying to accumulate the totals and subtract at the end. Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-02 15:30:44 -07:00
Eric Dumazet	7df40c2673	net_sched: fq: take care of throttled flows before reuse Normally, a socket can not be freed/reused unless all its TX packets left qdisc and were TX-completed. However connect(AF_UNSPEC) allows this to happen. With commit `fc59d5bdf1` ("pkt_sched: fq: clear time_next_packet for reused flows") we cleared f->time_next_packet but took no special action if the flow was still in the throttled rb-tree. Since f->time_next_packet is the key used in the rb-tree searches, blindly clearing it might break rb-tree integrity. We need to make sure the flow is no longer in the rb-tree to avoid this problem. Fixes: `fc59d5bdf1` ("pkt_sched: fq: clear time_next_packet for reused flows") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:37:38 -04:00
Ido Schimmel	30ca22e4a5	ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6" This reverts commit `edd7ceb782` ("ipv6: Allow non-gateway ECMP for IPv6"). Eric reported a division by zero in rt6_multipath_rebalance() which is caused by above commit that considers identical local routes to be siblings. The division by zero happens because a nexthop weight is not set for local routes. Revert the commit as it does not fix a bug and has side effects. To reproduce: # ip -6 address add 2001:db8::1/64 dev dummy0 # ip -6 address add 2001:db8::1/64 dev dummy1 Fixes: `edd7ceb782` ("ipv6: Allow non-gateway ECMP for IPv6") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:33:02 -04:00
David S. Miller	5693ee4ba3	Merge branch 'r8169-series-with-further-improvements' Heiner Kallweit says: ==================== r8169: series with further improvements I thought I'm more or less done with the basic refactoring. But again I stumbled across things that can be improved / simplified. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:50 -04:00
Heiner Kallweit	4ff3646628	r8169: replace get_protocol with vlan_get_protocol This patch is basically the same as `6e74d1749a` ("r8152: replace get_protocol with vlan_get_protocol"). Use vlan_get_protocol instead of duplicating the functionality. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	353af85ed8	r8169: avoid potentially misaligned access when getting mac address Interpreting a member of an u16 array as u32 may result in a misaligned access. Also it's not really intuitive to define a mac address variable as array of three u16 words. Therefore use an array of six bytes that is properly aligned for 32 bit access. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	ff1d733155	r8169: improve PCI config space access Some chips have a non-zero function id, however instead of hardcoding the id's (CSIAR_FUNC_NIC and CSIAR_FUNC_NIC2) we can get them dynamically via PCI_FUNC(pci_dev->devfn). This way we can get rid of the csi_ops. In general csi is just a fallback mechanism for PCI config space access in case no native access is supported. Therefore let's try native access first. I checked with Realtek regarding the functionality of config space byte 0x070f and according to them it controls the L0s/L1 entrance latency. Currently ASPM is disabled in general and therefore this value isn't used. However we may introduce a whitelist for chips where ASPM is known to work, therefore let's keep this code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	eda40b8cbe	r8169: drop rtl_generic_op Only two places are left where rtl_generic_op() is used, so we can inline it and simplify the code a little. This change also avoids the overhead of unlocking/locking in case the respective operation isn't set. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	b2d43e6ecf	r8169: replace longer if statements with switch statements Some longer if statements can be simplified by using switch statements instead. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	2a71883c4f	r8169: simplify code by using ranges in switch clauses Several switch statements can be significantly simplified by using case ranges. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	4f447d2969	r8169: drop member pll_power_ops from struct rtl8169_private After merging r810x_pll_power_down/up and r8168_pll_power_down/up we don't need member pll_power_ops any longer and can drop it, thus simplifying the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	73570bf19f	r8169: merge r810x_pll_power_down/up into r8168_pll_power_down/up r810x_pll_power_down/up and r8168_pll_power_down/up have a lot in common, so we can simplify the code by merging the former into the latter. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	40242e232e	r8169: remove 810x_phy_power_up/down The functionality of 810x_phy_power_up/down is covered by the default clause in 8168_phy_power_up/down. Therefore we don't need these functions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Heiner Kallweit	6851d025e5	r8169: remove unneeded check in r8168_pll_power_down RTL_GIGA_MAC_VER_23/24 are configured by rtl_hw_start_8168cp_2() and rtl_hw_start_8168cp_3() respectively which both apply CPCMD_QUIRK_MASK, thus clearing bit ASF. Bit ASF isn't set at any other place in the driver, therefore this check can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 16:23:49 -04:00
Helge Deller	8d73b18079	parisc: Fix section mismatches Fix three section mismatches: 1) Section mismatch in reference from the function ioread8() to the function .init.text:pcibios_init_bridge() 2) Section mismatch in reference from the function free_initmem() to the function .init.text:map_pages() 3) Section mismatch in reference from the function ccio_ioc_init() to the function .init.text:count_parisc_driver() Signed-off-by: Helge Deller <deller@gmx.de>	2018-05-02 21:47:35 +02:00
Helge Deller	b819439fea	parisc: drivers.c: Fix section mismatches Fix two section mismatches in drivers.c: 1) Section mismatch in reference from the function alloc_tree_node() to the function .init.text:create_tree_node(). 2) Section mismatch in reference from the function walk_native_bus() to the function .init.text:alloc_pa_dev(). Signed-off-by: Helge Deller <deller@gmx.de>	2018-05-02 21:47:27 +02:00
Alexei Starovoitov	0f58e58e28	Merge branch 'x86-bpf-jit-fixes' Daniel Borkmann says: ==================== Fix two memory leaks in x86 JIT. For details, please see individual patches in this series. Thanks! ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-02 12:35:48 -07:00
Daniel Borkmann	39f56ca945	bpf, x64: fix memleak when not converging on calls The JIT logic in jit_subprogs() is as follows: for all subprogs we allocate a bpf_prog_alloc(), populate it (prog->is_func = 1 here), and pass it to bpf_int_jit_compile(). If a failure occurred during JIT and prog->jited is not set, then we bail out from attempting to JIT the whole program, and punt to the interpreter instead. In case JITing went successful, we fixup BPF call offsets and do another pass to bpf_int_jit_compile() (extra_pass is true at that point) to complete JITing calls. Given that requires to pass JIT context around addrs and jit_data from x86 JIT are freed in the extra_pass in bpf_int_jit_compile() when calls are involved (if not, they can be freed immediately). However, if in the original pass, the JIT image didn't converge then we leak addrs and jit_data since image itself is NULL, the prog->is_func is set and extra_pass is false in that case, meaning both will become unreachable and are never cleaned up, therefore we need to free as well on !image. Only x64 JIT is affected. Fixes: `1c2a088a66` ("bpf: x64: add JIT support for multi-function programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-02 12:35:47 -07:00
Daniel Borkmann	3aab8884c9	bpf, x64: fix memleak when not converging after image While reviewing x64 JIT code, I noticed that we leak the prior allocated JIT image in the case where proglen != oldproglen during the JIT passes. Prior to the commit `e0ee9c1215` ("x86: bpf_jit: fix two bugs in eBPF JIT compiler") we would just break out of the loop, and using the image as the JITed prog since it could only shrink in size anyway. After `e0ee9c1215`, we would bail out to out_addrs label where we free addrs and jit_data but not the image coming from bpf_jit_binary_alloc(). Fixes: `e0ee9c1215` ("x86: bpf_jit: fix two bugs in eBPF JIT compiler") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-02 12:35:47 -07:00
David S. Miller	77ec3a0e2d	Merge branch 'net-smc-small-features' Ursula Braun says: ==================== net/smc: small features 2018/04/30 here are 4 smc patches for net-next covering small new features in different areas: * link health check * diagnostics for IPv6 smc sockets * ioctl * improvement for vlan determination v2 changes: * better title * patch 2 - remove compile problem for disabled CONFIG_IPV6 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:29:13 -04:00
Ursula Braun	cb9d43f677	net/smc: determine vlan_id of stacked net_device An SMC link group is bound to a specific vlan_id. Its link uses the RoCE-GIDs established for the specific vlan_id. This patch makes sure the appropriate vlan_id is determined for stacked scenarios like for instance a master bonding device with vlan devices enslaved. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:29:12 -04:00
Ursula Braun	9b67e26f93	net/smc: handle ioctls SIOCINQ, SIOCOUTQ, and SIOCOUTQNSD SIOCINQ returns the amount of unread data in the RMB. SIOCOUTQ returns the amount of unsent or unacked sent data in the send buffer. SIOCOUTQNSD returns the amount of data prepared for sending, but not yet sent. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:29:12 -04:00
Karsten Graul	ed75986f4a	net/smc: ipv6 support for smc_diag.c Update smc_diag.c to support ipv6 addresses on the diagnosis interface. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:29:12 -04:00
Karsten Graul	877ae5be42	net/smc: periodic testlink support Add periodic LLC testlink support to ensure the link is still active. The interval time is initialized using the value of sysctl_tcp_keepalive_time. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:29:12 -04:00
Ursula Braun	784813aed6	net/smc: restrict non-blocking connect finish The smc_poll code tries to finish connect() if the socket is in state SMC_INIT and polling of the internal CLC-socket returns with EPOLLOUT. This makes sense for a select/poll call following a connect call, but not without preceding connect(). With this patch smc_poll starts connect logic only, if the CLC-socket is no longer in its initial state TCP_CLOSE. In addition, a poll error on the internal CLC-socket is always propagated to the SMC socket. With this patch the code path mentioned by syzbot https://syzkaller.appspot.com/bug?extid=03faa2dc16b8b64be396 is no longer possible. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reported-by: syzbot+03faa2dc16b8b64be396@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:27:19 -04:00
Ingo Molnar	af3e0fcf78	8139too: Use disable_irq_nosync() in rtl8139_poll_controller() Use disable_irq_nosync() instead of disable_irq() as this might be called in atomic context with netpoll. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:22:06 -04:00
David S. Miller	e90c1a1090	Merge branch 'mlxsw-Reject-unsupported-FIB-configurations' Ido Schimmel says: ==================== mlxsw: Reject unsupported FIB configurations Recently it became possible for listeners of the FIB notification chain to veto operations such as addition of routes and rules. Adjust the mlxsw driver to take advantage of it and return an error for unsupported FIB rules and for routes configured after the abort mechanism was triggered (due to exceeded resources for example). v2: * Change error code in first patch to -EOPNOTSUPP (David Ahern). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:15:18 -04:00
Ido Schimmel	50d10711cf	mlxsw: spectrum_router: Return an error for routes added after abort We currently do not perform accounting in the driver and thus can't reject routes before resources are exceeded. However, in order to make users aware of the fact that routes are no longer offloaded we can return an error for routes configured after the abort mechanism was triggered. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:15:17 -04:00
Ido Schimmel	6290182b2b	mlxsw: spectrum_router: Return an error for non-default FIB rules Since commit `9776d32537` ("net: Move call_fib_rule_notifiers up in fib_nl_newrule") it is possible to forbid the installation of unsupported FIB rules. Have mlxsw return an error for non-default FIB rules in addition to the existing extack message. Example: # ip rule add from 198.51.100.1 table 10 Error: mlxsw_spectrum: FIB rules not supported. Note that offload is only aborted when non-default FIB rules are already installed and merely replayed during module initialization. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:15:17 -04:00
Ganesh Goudar	794451c1b5	cxgb4: add new T5 device id's Add device id's 0x5019, 0x501a and 0x501b for T5 cards. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 13:03:37 -04:00
Quentin Monnet	6f96674dbd	bpf: relax constraints on formatting for eBPF helper documentation The Python script used to parse and extract eBPF helpers documentation from include/uapi/linux/bpf.h expects a very specific formatting for the descriptions (single dot represents a space, '>' stands for a tab): /* ... .int bpf_helper(list of arguments) .> Description .> > Start of description .> > Another line of description .> > And yet another line of description .> Return .> > 0 on success, or a negative error in case of failure ... / This is too strict, and painful for developers who wants to add documentation for new helpers. Worse, it is extremely difficult to check that the formatting is correct during reviews. Change the format expected by the script and make it more flexible. The script now works whether or not the initial space (right after the star) is present, and accepts both tabs and white spaces (or a combination of both) for indenting description sections and contents. Concretely, something like the following would now be supported: /* ... int bpf_helper(list of arguments) ......Description .> > Start of description... > > Another line of description ..............And yet another line of description > Return .> ........0 on success, or a negative error in case of failure ... / While at it, remove unnecessary carets from each regex used with match() in the script. They are redundant, as match() tries to match from the beginning of the string by default. v2: Remove unnecessary caret when a regex is used with match(). Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-05-02 17:46:50 +02:00
Xin Long	ce402f044e	sctp: fix the issue that the cookie-ack with auth can't get processed When auth is enabled for cookie-ack chunk, in sctp_inq_pop, sctp processes auth chunk first, then continues to the next chunk in this packet if chunk_end + chunk_hdr size < skb_tail_pointer(). Otherwise, it will go to the next packet or discard this chunk. However, it missed the fact that cookie-ack chunk's size is equal to chunk_hdr size, which couldn't match that check, and thus this chunk would not get processed. This patch fixes it by changing the check to chunk_end + chunk_hdr size <= skb_tail_pointer(). Fixes: `26b87c7881` ("net: sctp: fix remote memory pressure from excessive queueing") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 11:15:33 -04:00
Xin Long	46e16d4b95	sctp: use the old asoc when making the cookie-ack chunk in dupcook_d When processing a duplicate cookie-echo chunk, for case 'D', sctp will not process the param from this chunk. It means old asoc has nothing to be updated, and the new temp asoc doesn't have the complete info. So there's no reason to use the new asoc when creating the cookie-ack chunk. Otherwise, like when auth is enabled for cookie-ack, the chunk can not be set with auth, and it will definitely be dropped by peer. This issue is there since very beginning, and we fix it by using the old asoc instead. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 11:15:33 -04:00
Xin Long	4842a08fb8	sctp: init active key for the new asoc in dupcook_a and dupcook_b When processing a duplicate cookie-echo chunk, for case 'A' and 'B', after sctp_process_init for the new asoc, if auth is enabled for the cookie-ack chunk, the active key should also be initialized. Otherwise, the cookie-ack chunk made later can not be set with auth shkey properly, and a crash can even be caused by this, as after Commit `1b1e0bc994` ("sctp: add refcnt support for sh_key"), sctp needs to hold the shkey when making control chunks. Fixes: `1b1e0bc994` ("sctp: add refcnt support for sh_key") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 11:15:32 -04:00
Neal Cardwell	e6e6a278b1	tcp_bbr: fix to zero idle_restart only upon S/ACKed data Previously the bbr->idle_restart tracking was zeroing out the bbr->idle_restart bit upon ACKs that did not SACK or ACK anything, e.g. receiving incoming data or receiver window updates. In such situations BBR would forget that this was a restart-from-idle situation, and if the min_rtt had expired it would unnecessarily enter PROBE_RTT (even though we were actually restarting from idle but had merely forgotten that fact). The fix is simple: we need to remember we are restarting from idle until we receive a S/ACK for some data (a S/ACK for the first flight of data we send as we are restarting). This commit is a stable candidate for kernels back as far as 4.9. Fixes: `0f8782ea14` ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 11:12:32 -04:00
Kees Cook	8ac60ffb9a	net: stmmac: Avoid VLA usage In the quest to remove all stack VLAs from the kernel[1], this switches the "status" stack buffer to use the existing small (8) upper bound on how many queues can be checked for DMA, and adds a sanity-check just to make sure it doesn't operate under pathological conditions. [1] http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 11:11:30 -04:00
Grygorii Strashko	5e5add172e	net: ethernet: ti: cpsw: fix packet leaking in dual_mac mode In dual_mac mode packets arrived on one port should not be forwarded by switch hw to another port. Only Linux Host can forward packets between ports. The below test case (reported in [1]) shows that packet arrived on one port can be leaked to anoter (reproducible with dual port evms): - connect port 1 (eth0) to linux Host 0 and run tcpdump or Wireshark - connect port 2 (eth1) to linux Host 1 with vlan 1 configured - ping <IPx> from Host 1 through vlan 1 interface. ARP packets will be seen on Host 0. Issue happens because dual_mac mode is implemnted using two vlans: 1 (Port 1+Port 0) and 2 (Port 2+Port 0), so there are vlan records created for for each vlan. By default, the ALE will find valid vlan record in its table when vlan 1 tagged packet arrived on Port 2 and so forwards packet to all ports which are vlan 1 members (like Port. To avoid such behaviorr the ALE VLAN ID Ingress Check need to be enabled for each external CPSW port (ALE_PORTCTLn.VID_INGRESS_CHECK) so ALE will drop ingress packets if Rx port is not VLAN member. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 11:08:23 -04:00
Raghu Vatsavayi	795d8098d3	liquidio VF: indicate that disabling rx vlan offload is not allowed NIC firmware does not support disabling rx vlan offload, but the VF driver incorrectly indicates that it is supported. The PF driver already does the correct indication by clearing the NETIF_F_HW_VLAN_CTAG_RX bit in its netdev->hw_features. So just do the same thing in the VF. Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Acked-by: Prasad Kanneganti <prasad.kanneganti@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 11:07:22 -04:00
Sean Tranchetti	6c035ba7e7	udp: Complement partial checksum for GSO packet Using the udp_v4_check() function to calculate the pseudo header for the newly segmented UDP packets results in assigning the complement of the value to the UDP header checksum field. Always undo the complement the partial checksum value in order to match the case where GSO is not used on the UDP transmit path. Fixes: `ee80d1ebe5` ("udp: add udp gso") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 10:59:32 -04:00
Michael S. Tsirkin	c818aa88d2	Revert "vhost: make msg padding explicit" This reverts commit 93c0d549c4c5a7382ad70de6b86610b7aae57406. Unfortunately the padding will break 32 bit userspace. Ouch. Need to add some compat code, revert for now. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-02 10:27:18 -04:00
Ingo Molnar	a2c7a98301	x86/bpf: Clean up non-standard comments, to make the code more readable So by chance I looked into x86 assembly in arch/x86/net/bpf_jit_comp.c and noticed the weird and inconsistent comment style it mistakenly learned from the networking code: /* Multi-line comment ... * ... looks like this. / Fix this to use the standard comment style specified in Documentation/CodingStyle and used in arch/x86/ as well: / * Multi-line comment ... * ... looks like this. / Also, to quote Linus's ... more explicit views about this: http://article.gmane.org/gmane.linux.kernel.cryptoapi/21066 > But no, the networking code picked none* of the above sane formats. > Instead, it picked these two models that are just half-arsed > shit-for-brains: > > (no) > /* This is disgusting drug-induced > * crap, and should die > / > > (no-no-no) > / This is also very nasty > * and visually unbalanced / > > Please. The networking code actually has the worst* possible comment > style. You can literally find that (no-no-no) style, which is just > really horribly disgusting and worse than the otherwise fairly similar > (d) in pretty much every way. Also improve the comments and some other details while at it: - Don't mix same-line and previous-line comment style on otherwise identical code patterns within the same function, - capitalize 'BPF' and x86 register names consistently, - capitalize sentences consistently, - instead of 'x64' use 'x86-64': x64 is a Microsoft specific term, - use more consistent punctuation, - use standard coding style in macros as well, - fix typos and a few other minor details. Consistent coding style is not optional, at least in arch/x86/. No change in functionality. ( In case this commit causes conflicts with pending development code I'll be glad to help resolve any conflicts! ) Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@fb.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-05-02 16:12:02 +02:00
Michel Dänzer	892a0be43e	swiotlb: fix inversed DMA_ATTR_NO_WARN test The result was printing the warning only when we were explicitly asked not to. Cc: stable@vger.kernel.org Fixes: `0176adb004` "swiotlb: refactor coherent buffer allocation" Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-02 14:48:55 +02:00
Linus Torvalds	2d618bdf71	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel Pull hexagon fixes from Richard Kuo: "Some small fixes for module compilation" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel: hexagon: export csum_partial_copy_nocheck hexagon: add memset_io() helper	2018-05-01 19:54:22 -07:00
John Hurley	50a5852a65	nfp: flower: set tunnel ttl value to net default Firmware requires that the ttl value for an encapsulating ipv4 tunnel header be included as an action field. Prior to the support of Geneve tunnel encap (when ttl set was removed completely), ttl value was extracted from the tunnel key. However, tests have shown that this can still produce a ttl of 0. Fix the issue by setting the namespace default value for each new tunnel. Follow up patch for net-next will do a full route lookup. Fixes: `3ca3059dc3` ("nfp: flower: compile Geneve encap actions") Fixes: `b27d6a95a7` ("nfp: compile flower vxlan tunnel set actions") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 18:59:57 -04:00
Dave Watson	c212d2c7fc	net/tls: Don't recursively call push_record during tls_write_space callbacks It is reported that in some cases, write_space may be called in do_tcp_sendpages, such that we recursively invoke do_tcp_sendpages again: [ 660.468802] ? do_tcp_sendpages+0x8d/0x580 [ 660.468826] ? tls_push_sg+0x74/0x130 [tls] [ 660.468852] ? tls_push_record+0x24a/0x390 [tls] [ 660.468880] ? tls_write_space+0x6a/0x80 [tls] ... tls_push_sg already does a loop over all sending sg's, so ignore any tls_write_space notifications until we are done sending. We then have to call the previous write_space to wake up poll() waiters after we are done with the send loop. Reported-by: Andre Tomt <andre@tomt.net> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 18:57:52 -04:00
Soheil Hassas Yeganeh	702353b538	selftest: add test for TCP_INQ Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 18:56:29 -04:00
Soheil Hassas Yeganeh	b75eba76d3	tcp: send in-queue bytes in cmsg upon read Applications with many concurrent connections, high variance in receive queue length and tight memory bounds cannot allocate worst-case buffer size to drain sockets. Knowing the size of receive queue length, applications can optimize how they allocate buffers to read from the socket. The number of bytes pending on the socket is directly available through ioctl(FIONREAD/SIOCINQ) and can be approximated using getsockopt(MEMINFO) (rmem_alloc includes skb overheads in addition to application data). But, both of these options add an extra syscall per recvmsg. Moreover, ioctl(FIONREAD/SIOCINQ) takes the socket lock. Add the TCP_INQ socket option to TCP. When this socket option is set, recvmsg() relays the number of bytes available on the socket for reading to the application via the TCP_CM_INQ control message. Calculate the number of bytes after releasing the socket lock to include the processed backlog, if any. To avoid an extra branch in the hot path of recvmsg() for this new control message, move all cmsg processing inside an existing branch for processing receive timestamps. Since the socket lock is not held when calculating the size of receive queue, TCP_INQ is a hint. For example, it can overestimate the queue size by one byte, if FIN is received. With this method, applications can start reading from the socket using a small buffer, and then use larger buffers based on the remaining data when needed. V3 change-log: As suggested by David Miller, added loads with barrier to check whether we have multiple threads calling recvmsg in parallel. When that happens we lock the socket to calculate inq. V4 change-log: Removed inline from a static function. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 18:56:29 -04:00

... 2 3 4 5 6 ...

753636 commits