Commit graph

692254 commits

Author SHA1 Message Date
Eric Dumazet
2dda640040 net: fix keepalive code vs TCP_FASTOPEN_CONNECT
syzkaller was able to trigger a divide by 0 in TCP stack [1]

Issue here is that keepalive timer needs to be updated to not attempt
to send a probe if the connection setup was deferred using
TCP_FASTOPEN_CONNECT socket option added in linux-4.11

[1]
 divide error: 0000 [#1] SMP
 CPU: 18 PID: 0 Comm: swapper/18 Not tainted
 task: ffff986f62f4b040 ti: ffff986f62fa2000 task.ti: ffff986f62fa2000
 RIP: 0010:[<ffffffff8409cc0d>]  [<ffffffff8409cc0d>] __tcp_select_window+0x8d/0x160
 Call Trace:
  <IRQ>
  [<ffffffff8409d951>] tcp_transmit_skb+0x11/0x20
  [<ffffffff8409da21>] tcp_xmit_probe_skb+0xc1/0xe0
  [<ffffffff840a0ee8>] tcp_write_wakeup+0x68/0x160
  [<ffffffff840a151b>] tcp_keepalive_timer+0x17b/0x230
  [<ffffffff83b3f799>] call_timer_fn+0x39/0xf0
  [<ffffffff83b40797>] run_timer_softirq+0x1d7/0x280
  [<ffffffff83a04ddb>] __do_softirq+0xcb/0x257
  [<ffffffff83ae03ac>] irq_exit+0x9c/0xb0
  [<ffffffff83a04c1a>] smp_apic_timer_interrupt+0x6a/0x80
  [<ffffffff83a03eaf>] apic_timer_interrupt+0x7f/0x90
  <EOI>
  [<ffffffff83fed2ea>] ? cpuidle_enter_state+0x13a/0x3b0
  [<ffffffff83fed2cd>] ? cpuidle_enter_state+0x11d/0x3b0

Tested:

Following packetdrill no longer crashes the kernel

`echo 0 >/proc/sys/net/ipv4/tcp_timestamps`

// Cache warmup: send a Fast Open cookie request
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
   +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
   +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress)
   +0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop>
 +.01 < S. 123:123(0) ack 1 win 14600 <mss 1460,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop>
   +0 > . 1:1(0) ack 1
   +0 close(3) = 0
   +0 > F. 1:1(0) ack 1
   +0 < F. 1:1(0) ack 2 win 92
   +0 > .  2:2(0) ack 2

   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
   +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
   +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
   +0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
 +.01 connect(4, ..., ...) = 0
   +0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0
   +10 close(4) = 0

`echo 1 >/proc/sys/net/ipv4/tcp_timestamps`

Fixes: 19f6d3f3c8 ("net/tcp-fastopen: Add new API support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:34:51 -07:00
David S. Miller
4d2bbb0ef8 Here is a batman-adv bugfix:
- fix TT sync flag inconsistency problems, which can lead to excess packets,
    by Linus Luessing
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlmB3ikWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoUCUEADIvPfBxzon12xBV4JbBpTMnIwl
 09heUwE1FxCj2UoTV2QJ+O9ZQb1yc7ARJksymP30T96q9bNlDiNFVquXcuQLA+ba
 aoSowejLfTHLvOlP3Wrik5PVLBdKqeBpg7zYNtAuNI09DSryey1ipaC4Xemq9ZmC
 8Hjam9RAQh5FV3RCeviTzpusXt31/xjkTx3vrf1WGx2uY/MSKNwwfPcRAyfG+Ye5
 9kxpR8sBwRxJ+6+MEWRBjqi2Lk/BuD4urViAwGj88uDkKKByXT/pfULD3foTgx0d
 smZWWMun9jQl4XDk321oOp5HOqg29IFcF/mwuWQIvBDO8AJBRndYHw07lx0cB5tD
 I8PzTG5+kE3QXBeTBEUAuszaavDI9DKVdnr6h/8VbeEXIr73WHA0VixnTIv3zgI0
 AiYHIrvUstDB7Cl5457blKbljgzXMN1ssBtD/UMNFhrEUKet9qIZskwlLyhPt2s3
 qti70ZoURPz9ve57pNgZDZUR5lvvUhuSNUD4ZfGt5BC+YErX59jAswrRXHU5Zx/i
 V5pSsRQ2BfxZUZrnmKONg3RW+hmsJJYOSzNXiO5/0zeVuorju7C+0L2goONX2bpr
 fEzICfDJi+IXmYke0AAjiJq7KwobU703Nd3rT8eovjVlSSJj+mdyYSVYQF9dD2uE
 DdtwpJh37PwbHOeQqg==
 =a4HS
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-for-davem-20170802' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here is a batman-adv bugfix:

 - fix TT sync flag inconsistency problems, which can lead to excess packets,
   by Linus Luessing
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:23:11 -07:00
Yuchung Cheng
ed254971ed tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states
If the sender switches the congestion control during ECN-triggered
cwnd-reduction state (CA_CWR), upon exiting recovery cwnd is set to
the ssthresh value calculated by the previous congestion control. If
the previous congestion control is BBR that always keep ssthresh
to TCP_INIFINITE_SSTHRESH, cwnd ends up being infinite. The safe
step is to avoid assigning invalid ssthresh value when recovery ends.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:51:07 -07:00
Thomas Falcon
4d96f12a07 ibmvnic: Initialize SCRQ's during login renegotiation
SCRQ resources are freed during renegotiation, but they are not
re-allocated afterwards due to some changes in the initialization
process. Fix that by re-allocating the memory after renegotation.

SCRQ's can also be freed if a server capabilities request fails.
If this were encountered during a device reset for example,
SCRQ's may not be re-allocated. This operation is not necessary
anymore so remove it.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:47:45 -07:00
Hector Martin
bed9ff1659 usb: qmi_wwan: add D-Link DWM-222 device ID
Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:46:47 -07:00
David S. Miller
db803a46ae Merge branch 'mlx4-misc-fixes'
Tariq Toukan says:

====================
mlx4 misc fixes

This patchset contains misc bug fixes from the team
to the mlx4 Core and Eth drivers.

Patch 1 by Inbar fixes a wrong ethtool indication for Wake-on-LAN.
The other 3 patches by Jack add a missing capability description,
and fixes the off-by-1 misalignment for the following capabilities
descriptions.

Series generated against net commit:
cc75f8514d samples/bpf: fix bpf tunnel cleanup
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:44:10 -07:00
Jack Morgenstein
bff0c6840c net/mlx4_core: Fixes missing capability bit in flags2 capability dump
The cited commit introduced the following new enum value in file
include/linux/mlx4/device.h:

    QUERY_DEV_CAP_DIAG_RPRT_PER_PORT

However, it failed to introduce a corresponding entry in function
dump_dev_cap_flags2() for outputting a line in the message log
when this capability bit is set.

The change here fixes that omission.

Fixes: c7c122ed67 ("net/mlx4: Add diagnostic counters capability bit")
Reported-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:44:09 -07:00
Jack Morgenstein
f9fb9d0b85 net/mlx4_core: Fix namespace misalignment in QinQ VST support commit
The cited commit introduced the following new enum value in file
include/linux/mlx4/device.h:

    MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP

However the value of MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP needs to stay
consistent with the value used in another namespace in
function dump_dev_cap_flags2(), which is manually kept in sync.
The change here restores that consistency.

Fixes: 7c3d21c815 ("net/mlx4_core: Preparation for VF vlan protocol 802.1ad")
Reported-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:44:09 -07:00
Jack Morgenstein
5886259c12 net/mlx4_core: Fix sl_to_vl_change bit offset in flags2 dump
The index value in function dump_dev_cap_flags2() for outputting
"sl to vl mapping table change event support" needs to be
consistent with the value of the enumerated constant
MLX4_DEV_CAP_FLAG2_SL_TO_VL_CHANGE_EVENT defined in file
include/linux/mlx4_device.h

The change here restores that consistency.

Fixes: fd10ed8e6f ("IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets")
Reported-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:44:09 -07:00
Inbar Karmy
c994f778bb net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) support
Currently when WoL is supported but disabled, ethtool reports:
"Supports Wake-on: d".
Fix the indication of Wol support, so that the indication
remains "g" all the time if the NIC supports WoL.

Tested:
As accepted, when NIC supports WoL- ethtool reports:
	Supports Wake-on: g
	Wake-on: d
when NIC doesn't support WoL- ethtool reports:
        Supports Wake-on: d
        Wake-on: d

Fixes: 14c07b1358 ("mlx4: Wake on LAN support")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:44:09 -07:00
David S. Miller
9075bd206c Merge branch 'lan78xx-Fixes'
Nisar Sayed says:

====================
lan78xx: Fixes to lan78xx driver

This series of patches are for lan78xx driver.

These patches fixes potential issues associated with lan78xx driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:39:58 -07:00
Nisar Sayed
0573f94b1a lan78xx: Fix to handle hard_header_len update
Fix to handle hard_header_len update

When ifconfig up/down sequence is initiated hard_header_len
get updated incrementally for each ifconfig up /down sequence,
this leads invalid hard_header_len, moving to lan78xx_bind
to have one time update of hard_header_len addresses the issue.

Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:39:58 -07:00
Nisar Sayed
fb52c3b597 lan78xx: USB fast connect/disconnect crash fix
USB fast connect/disconnect crash fix

When USB plugged/unplugged at fast rate,
lan78xx_mdio_init() in lan78xx_bind() failing case is not handled.
Whenever  lan78xx_mdio_init() failed, dev->mdiobus will be freed, however
since lan78xx_bind() not consider as error and try to proceed for
further initialization in lan78xx_probe() which leads system hung/crash.
Also when register_netdev() failed, netdev is freed without calling lan78xx_unbind().
Hence halting the failed cases right manner fixes the system crash/hung issue.

Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:39:58 -07:00
David S. Miller
12c5d0c048 Merge branch 'net-Fix-64-bit-statistics-seqcount-init'
Florian Fainelli says:

====================
drivers: net: Fix 64-bit statistics seqcount init

This patch series fixes a bunch of drivers to have their 64-bit statistics
seqcount cookie be initialized correctly. Most of these drivers (except b44,
gtp) are probably used on 64-bit only hosts and so the lockdep splat might have
never been seen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 20:06:07 -07:00
Florian Fainelli
87173cd6cf ipvlan: Fix 64-bit statistics seqcount initialization
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that by using the proper helper function: netdev_alloc_pcpu_stats().

Fixes: 2ad7bf3638 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 20:06:07 -07:00
Florian Fainelli
4a0dee1ffe netvsc: Initialize 64-bit stats seqcount
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that. In commit 6c80f3fc23 ("netvsc: report per-channel stats in
ethtool statistics") netdev_alloc_pcpu_stats() was removed in favor of
open-coding the 64-bits statistics, except that u64_stats_init() was
missed.

Fixes: 6c80f3fc23 ("netvsc: report per-channel stats in ethtool statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 20:06:07 -07:00
Florian Fainelli
790cb2ebb3 gtp: Initialize 64-bit per-cpu stats correctly
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that by using netdev_alloc_pcpu_stats() instead of an open coded
allocation.

Fixes: 459aa660eb ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 20:06:07 -07:00
Florian Fainelli
7ceb781a1a nfp: Initialize RX and TX ring 64-bit stats seqcounts
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.

Fixes: 4c3523623d ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 20:06:07 -07:00
Florian Fainelli
7c3a4626eb ixgbe: Initialize 64-bit stats seqcounts
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.

Fixes: 4197aa7bb8 ("ixgbevf: provide 64 bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 20:06:07 -07:00
Florian Fainelli
7d6d067790 i40e: Initialize 64-bit statistics TX ring seqcount
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.

Fixes: 980e9b1186 ("i40e: Add support for 64 bit netstats")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 20:06:06 -07:00
Florian Fainelli
e43c9f23ef b44: Initialize 64-bit stats seqcount
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.

Fixes: eeda858552 ("b44: add 64 bit stats")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 20:06:06 -07:00
K. Den
1bff8a0c1f gue: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP
In the case that GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point, which for instance could occur if the packet
comes from another Linux guest on the same Linux host, we have to do
either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset considering RCO.

However, since b7fe10e5eb ("gro: Fix remcsum offload to deal with frags
in GRO") that barrier in such case could be skipped if GRO turned on,
hence we pass over it and the inner L4 validation mistakenly reckons
it as a bad csum.

This patch makes remcsum_offload being reset at the same time of GRO
remcsum cleanup, so as to make it work in such case as before.

Fixes: b7fe10e5eb ("gro: Fix remcsum offload to deal with frags in GRO")
Signed-off-by: Koichiro Den <den@klaipeden.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 16:09:14 -07:00
K. Den
be73b3043b vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP
In the case that GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point, which for instance could occur if the packet
comes from another Linux guest on the same Linux host, we have to do
either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset considering RCO.

However, since b7fe10e5ebac("gro: Fix remcsum offload to deal with frags
in GRO") that barrier in such case could be skipped if GRO turned on,
hence we pass over it and the inner L4 validation mistakenly reckons
it as a bad csum.

This patch makes remcsum_offload being reset at the same time of GRO
remcsum cleanup, so as to make it work in such case as before.

Fixes: b7fe10e5eb ("gro: Fix remcsum offload to deal with frags in GRO")
Signed-off-by: Koichiro Den <den@klaipeden.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 16:09:14 -07:00
yujuan.qi
40413955ee Cipso: cipso_v4_optptr enter infinite loop
in for(),if((optlen > 0) && (optptr[1] == 0)), enter infinite loop.

Test: receive a packet which the ip length > 20 and the first byte of ip option is 0, produce this issue

Signed-off-by: yujuan.qi <yujuan.qi@mediatek.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:31:23 -07:00
David S. Miller
fdaa419bb2 Merge branch 'ethernet-ti-cpts-fix-tx-timestamping-timeout'
Grygorii Strashko says:

====================
net: ethernet: ti: cpts: fix tx timestamping timeout

With the low Ethernet connection speed cpdma notification about packet
processing can be received before CPTS TX timestamp event, which is set
when packet actually left CPSW while cpdma notification is sent when packet
pushed in CPSW fifo. As result, when connection is slow and CPU is fast
enough TX timestamping is not working properly.
Issue was discovered using timestamping tool on am57x boards with Ethernet link
speed forced to 100M and on am335x-evm with Ethernet link speed forced to 10M.

Patch3 - This series fixes it by introducing TX SKB queue to store PTP SKBs for
which Ethernet Transmit Event hasn't been received yet and then re-check this
queue with new Ethernet Transmit Events by scheduling CPTS overflow
work more often until TX SKB queue is not empty.

Patch 1,2 - As CPTS overflow work is time critical task it important to ensure
that its scheduling is not delayed. Unfortunately, There could be significant
delay in CPTS work schedule under high system load and on -RT which could cause
CPTS misbehavior due to internal counter overflow and there is no way to tune
CPTS overflow work execution policy and priority manually. The kthread_worker
can be used instead of workqueues, as it creates separate named kthread for
each worker and its its execution policy and priority can be configured
using chrt tool. Instead of modifying CPTS driver itself it was proposed to
it was proposed to add PTP auxiliary worker to the PHC subsystem [1], so
other drivers can benefit from this feature also.

[1] https://www.spinics.net/lists/netdev/msg445392.html

changes in v4:
- fixed memleak in ptp_clock_register()
- undocumented change in cpts_find_ts() moved to separate patch (minor fix)

changes in v3:
- patch 1: added proper error handling in ptp_clock_register.
  minor comments applied.

changes in v2:
- added PTP auxiliary worker to the PHC subsystem

Links
v3: https://www.spinics.net/lists/netdev/msg446058.html
v2: https://www.spinics.net/lists/netdev/msg445859.html
v1: https://www.spinics.net/lists/netdev/msg445387.html
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:22:56 -07:00
Grygorii Strashko
a93439cce2 net: ethernet: ti: cpts: fix fifo read in cpts_find_ts
Now the call chain
 cpts_find_ts()
  |- cpts_fifo_read(cpts, CPTS_EV_PUSH)

will stop reading CPTS FIFO if PUSH event is found. But this is not
expected and CPTS FIFI should be completely drained here. This is most
probably copy-paste error and it has no negative impact as CPTS_EV_PUSH
should not be present in FIFO without TS_PUSH request and
cpts_systim_read() and cpts_find_ts() synchronized by spin_lock.

Correct above by calling cpts_fifo_read() with -1 parameter, so it will
read all CPTS event from FIFO.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:22:55 -07:00
Grygorii Strashko
0d5f54fec0 net: ethernet: ti: cpts: fix tx timestamping timeout
With the low speed Ethernet connection CPDMA notification about packet
processing can be received before CPTS TX timestamp event, which is set
when packet actually left CPSW while cpdma notification is sent when packet
pushed in CPSW fifo.  As result, when connection is slow and CPU is fast
enough TX timestamping is not working properly.

Fix it, by introducing TX SKB queue to store PTP SKBs for which Ethernet
Transmit Event hasn't been received yet and then re-check this queue
with new Ethernet Transmit Events by scheduling CPTS overflow
work more often (every 1 jiffies) until TX SKB queue is not empty.

Side effect of this change is:
 - User space tools require to take into account possible delay in TX
timestamp processing (for example ptp4l works with tx_timestamp_timeout=400
under net traffic and tx_timestamp_timeout=25 in idle).

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:22:55 -07:00
Grygorii Strashko
999f129289 net: ethernet: ti: cpts: convert to use ptp auxiliary worker
There could be significant delay in CPTS work schedule under high system
load and on -RT which could cause CPTS misbehavior due to internal counter
overflow. Usage of own kthread_worker allows to avoid such kind of issues
and makes it possible to tune priority of CPTS kthread_worker thread on -RT
(thread name "cpts").

Hence, the CPTS driver is converted to use PTP auxiliary worker as PHC
subsystem implements such functionality in a generic way now.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:22:55 -07:00
Grygorii Strashko
d9535cb7b7 ptp: introduce ptp auxiliary worker
Many PTP drivers required to perform some asynchronous or periodic work,
like periodically handling PHC counter overflow or handle delayed timestamp
for RX/TX network packets. In most of the cases, such work is implemented
using workqueues. Unfortunately, Kernel workqueues might introduce
significant delay in work scheduling under high system load and on -RT,
which could cause misbehavior of PTP drivers due to internal counter
overflow, for example, and there is no way to tune its execution policy and
priority manuallly.

Hence, The kthread_worker can be used insted of workqueues, as it create
separte named kthread for each worker and its its execution policy and
priority can be configured using chrt tool.

This prblem was reported for two drivers TI CPSW CPTS and dp83640, so
instead of modifying each of these driver it was proposed to add PTP
auxiliary worker to the PHC subsystem.

The patch adds PTP auxiliary worker in PHC subsystem using kthread_worker
and kthread_delayed_work and introduces two new PHC subsystem APIs:

- long (*do_aux_work)(struct ptp_clock_info *ptp) callback in
ptp_clock_info structure, which driver should assign if it require to
perform asynchronous or periodic work. Driver should return the delay of
the PTP next auxiliary work scheduling time (>=0) or negative value in case
further scheduling is not required.

- int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay) which
allows schedule PTP auxiliary work.

The name of kthread_worker thread corresponds PTP PHC device name "ptp%d".

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:22:55 -07:00
Linus Torvalds
bc78d646e7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Handle notifier registry failures properly in tun/tap driver, from
    Tonghao Zhang.

 2) Fix bpf verifier handling of subtraction bounds and add a testcase
    for this, from Edward Cree.

 3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt.

 4) Fix use after free in prd_retire_rx_blk_timer_exired() in AF_PACKET,
    from Cong Wang.

 5) Fix SElinux regression due to recent UDP optimizations, from Paolo
    Abeni.

 6) We accidently increment IPSTATS_MIB_FRAGFAILS in the ipv6 code
    paths, fix from Stefano Brivio.

 7) Fix some mem leaks in dccp, from Xin Long.

 8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd
    Bergmann.

 9) Mac address length check and buffer size fixes from Cong Wang.

10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni.

11) Fix return value when copy_from_user() fails in
    bpf_prog_get_info_by_fd(), from Daniel Borkmann.

12) Handle PHY_HALTED properly in phy library state machine, from
    Florian Fainelli.

13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel.

14) Fix truesize calculation in virtio_net which led to performance
    regressions, from Michael S Tsirkin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  samples/bpf: fix bpf tunnel cleanup
  udp6: fix jumbogram reception
  ppp: Fix a scheduling-while-atomic bug in del_chan
  Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
  virtio_net: fix truesize for mergeable buffers
  mv643xx_eth: fix of_irq_to_resource() error check
  MAINTAINERS: Add more files to the PHY LIBRARY section
  ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
  net: phy: Correctly process PHY_HALTED in phy_stop_machine()
  sunhme: fix up GREG_STAT and GREG_IMASK register offsets
  bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
  tcp: avoid bogus gcc-7 array-bounds warning
  net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
  bpf: don't indicate success when copy_from_user fails
  udp6: fix socket leak on early demux
  net: thunderx: Fix BGX transmit stall due to underflow
  Revert "vhost: cache used event for better performance"
  team: use a larger struct for mac address
  net: check dev->addr_len for dev_set_mac_address()
  phy: bcm-ns-usb3: fix MDIO_BUS dependency
  ...
2017-07-31 22:36:42 -07:00
William Tu
cc75f8514d samples/bpf: fix bpf tunnel cleanup
test_tunnel_bpf.sh fails to remove the vxlan11 tunnel device, causing the
next geneve tunnelling test case fails.  In addition, the geneve reserved bit
in tcbpf2_kern.c should be zero, according to the RFC.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 22:02:47 -07:00
Paolo Abeni
cb891fa6a1 udp6: fix jumbogram reception
Since commit 67a51780ae ("ipv6: udp: leverage scratch area
helpers") udp6_recvmsg() read the skb len from the scratch area,
to avoid a cache miss.
But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their
length exceeds the 16 bits available in the scratch area. As a side
effect the length returned by recvmsg() is:
<ingress datagram len> % (1<<16)

This commit addresses the issue allocating one more bit in the
IP6CB flags field and setting it for incoming jumbograms.
Such field is still in the first cacheline, so at recvmsg()
time we can check it and fallback to access skb->len if
required, without a measurable overhead.

Fixes: 67a51780ae ("ipv6: udp: leverage scratch area helpers")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 22:01:21 -07:00
Gao Feng
ddab82821f ppp: Fix a scheduling-while-atomic bug in del_chan
The PPTP set the pptp_sock_destruct as the sock's sk_destruct, it would
trigger this bug when __sk_free is invoked in atomic context, because of
the call path pptp_sock_destruct->del_chan->synchronize_rcu.

Now move the synchronize_rcu to pptp_release from del_chan. This is the
only one case which would free the sock and need the synchronize_rcu.

The following is the panic I met with kernel 3.3.8, but this issue should
exist in current kernel too according to the codes.

BUG: scheduling while atomic
__schedule_bug+0x5e/0x64
__schedule+0x55/0x580
? ppp_unregister_channel+0x1cd5/0x1de0 [ppp_generic]
? dev_hard_start_xmit+0x423/0x530
? sch_direct_xmit+0x73/0x170
__cond_resched+0x16/0x30
_cond_resched+0x22/0x30
wait_for_common+0x18/0x110
? call_rcu_bh+0x10/0x10
wait_for_completion+0x12/0x20
wait_rcu_gp+0x34/0x40
? wait_rcu_gp+0x40/0x40
synchronize_sched+0x1e/0x20
0xf8417298
0xf8417484
? sock_queue_rcv_skb+0x109/0x130
__sk_free+0x16/0x110
? udp_queue_rcv_skb+0x1f2/0x290
sk_free+0x16/0x20
__udp4_lib_rcv+0x3b8/0x650

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 21:59:01 -07:00
Florian Fainelli
00d51094b8 Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
This reverts commit 28b45910cc ("net: bcmgenet: Remove init parameter
from bcmgenet_mii_config") because in the process of moving from
dev_info() to dev_info_once() we essentially lost the helpful printed
messages once the second instance of the driver is loaded.
dev_info_once() does not actually print the message once per device
instance, but once period.

Fixes: 28b45910cc ("net: bcmgenet: Remove init parameter from bcmgenet_mii_config")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 18:03:36 -07:00
Michael S. Tsirkin
1daa8790d0 virtio_net: fix truesize for mergeable buffers
Seth Forshee noticed a performance degradation with some workloads.
This turns out to be due to packet drops.  Euan Kemp noticed that this
is because we drop all packets where length exceeds the truesize, but
for some packets we add in extra memory without updating the truesize.
This in turn was kept around unchanged from ab7db91705 ("virtio-net:
auto-tune mergeable rx buffer size for improved performance").  That
commit had an internal reason not to account for the extra space: not
enough bits to do it.  No longer true so let's account for the allocated
length exactly.

Many thanks to Seth Forshee for the report and bisecting and Euan Kemp
for debugging the issue.

Fixes: 680557cf79 ("virtio_net: rework mergeable buffer handling")
Reported-by: Euan Kemp <euan.kemp@coreos.com>
Tested-by: Euan Kemp <euan.kemp@coreos.com>
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 18:02:46 -07:00
Sergei Shtylyov
cfbcb61f6c mv643xx_eth: fix of_irq_to_resource() error check
of_irq_to_resource() has recently been  fixed to return negative error #'s
along with 0 in case of failure,  however the Marvell MV643xx Ethernet
driver still only regards 0  as invalid IRQ -- fix it up.

Fixes: 7a4228bbff ("of: irq: use of_irq_get() in of_irq_to_resource()")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 17:56:47 -07:00
Florian Fainelli
13332db54f MAINTAINERS: Add more files to the PHY LIBRARY section
Include missing files that are provided by, used, or directly maintained
within the PHY LIBRARY, this include uapi header, header files used by
Device Tree code etc.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 17:52:47 -07:00
Ido Schimmel
71ed7ee35a ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
Michał reported a NULL pointer deref during fib_sync_down_dev() when
unregistering a netdevice. The problem is that we don't check for
'in_dev' being NULL, which can happen in very specific cases.

Usually routes are flushed upon NETDEV_DOWN sent in either the netdev or
the inetaddr notification chains. However, if an interface isn't
configured with any IP address, then it's possible for host routes to be
flushed following NETDEV_UNREGISTER, after NULLing dev->ip_ptr in
inetdev_destroy().

To reproduce:
$ ip link add type dummy
$ ip route add local 1.1.1.0/24 dev dummy0
$ ip link del dev dummy0

Fix this by checking for the presence of 'in_dev' before referencing it.

Fixes: 982acb9756 ("ipv4: fib: Notify about nexthop status changes")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 17:51:11 -07:00
Florian Fainelli
7ad813f208 net: phy: Correctly process PHY_HALTED in phy_stop_machine()
Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Fixes: a390d1f379 ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 17:27:10 -07:00
Mark Cave-Ayland
96a734bae0 sunhme: fix up GREG_STAT and GREG_IMASK register offsets
Update the values to match those from the STP2002QFP documentation.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 16:23:05 -07:00
Linus Torvalds
2e7ca2064c Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Several cgroup bug fixes.

   - cgroup core was calling a migration callback on empty migrations,
     which could make cpuset crash.

   - There was a very subtle bug where the controller interface files
     aren't created directly when cgroup2 is mounted. Because later
     operations create them, this bug didn't get noticed earlier.

   - Failed writes to cgroup.subtree_control were incorrectly returning
     zero"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix error return value from cgroup_subtree_control()
  cgroup: create dfl_root files on subsys registration
  cgroup: don't call migration methods if there are no tasks to migrate
2017-07-31 14:03:05 -07:00
Linus Torvalds
ff2620f778 Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
 "Two notable fixes.

   - While adding NUMA affinity support to unbound workqueues, the
     assumption that an unbound workqueue with max_active == 1 is
     ordered was broken.

     The plan was to use explicit alloc_ordered_workqueue() for those
     cases. Unfortunately, I forgot to update the documentation properly
     and we grew a handful of use cases which depend on that assumption.

     While we want to convert them to alloc_ordered_workqueue(), we
     don't really lose anything by enforcing ordered execution on
     unbound max_active == 1 workqueues and it doesn't make sense to
     risk subtle bugs. Restore the assumption.

   - Workqueue assumes that CPU <-> NUMA node mapping remains static.

     This is a general assumption - we don't have any synchronization
     mechanism around CPU <-> node mapping. Unfortunately, powerpc may
     change the mapping dynamically leading to crashes. Michael added a
     workaround so that we at least don't crash while powerpc hotplug
     code gets updated"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Work around edge cases for calc of pool's cpumask
  workqueue: implicit ordered attribute should be overridable
  workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
2017-07-31 13:37:28 -07:00
Linus Torvalds
3dcc4c7d42 Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
 "Dan found a really old bug where libata hotplug code wasn't sanitizing
  index value from userland and may end up indexing with a negative
  number. It is scary but fortunately can only be triggered by root.

  Other than that, minor fixes"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: fix a couple of doc build warnings
  libata: array underflow in ata_find_dev()
  ata: sata_rcar: add gen[23] fallback compatibility strings
  libata: remove unused rc in ata_eh_handle_port_resume
  libata: Cleanup ata_read_log_page()
  ata: fix gemini Kconfig dependencies
2017-07-31 13:33:21 -07:00
Jonathan Corbet
2f60e1ab2f libata: fix a couple of doc build warnings
The kerneldoc comments for a couple of functions in drivers/ata/libata-eh.c
had fallen behind the current implementation, resulting in these doc build
warnings:

  ./drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
  ./drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
  ./drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
  ./drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'

Update the comments and make the warnings go away.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-07-31 08:03:06 -07:00
Linus Lüssing
54e22f265e batman-adv: fix TT sync flag inconsistencies
This patch fixes an issue in the translation table code potentially
leading to a TT Request + Response storm. The issue may occur for nodes
involving BLA and an inconsistent configuration of the batman-adv AP
isolation feature. However, since the new multicast optimizations, a
single, malformed packet may lead to a mesh-wide, persistent
Denial-of-Service, too.

The issue occurs because nodes are currently OR-ing the TT sync flags of
all originators announcing a specific MAC address via the
translation table. When an intermediate node now receives a TT Request
and wants to answer this on behalf of the destination node, then this
intermediate node now responds with an altered flag field and broken
CRC. The next OGM of the real destination will lead to a CRC mismatch
and triggering a TT Request and Response again.

Furthermore, the OR-ing is currently never undone as long as at least
one originator announcing the according MAC address remains, leading to
the potential persistency of this issue.

This patch fixes this issue by storing the flags used in the CRC
calculation on a a per TT orig entry basis to be able to respond with
the correct, original flags in an intermediate TT Response for one
thing. And to be able to correctly unset sync flags once all nodes
announcing a sync flag vanish for another.

Fixes: e9c00136a4 ("batman-adv: fix tt_global_entries flags update")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Antonio Quartulli <a@unstable.cc>
[sw: typo in commit message]
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-07-31 11:17:38 +02:00
Linus Torvalds
16f73eb02d Linux 4.13-rc3 2017-07-30 12:40:36 -07:00
Linus Torvalds
f137e0b0c5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A small set of x86 fixes:

   - prevent the kernel from using the EFI reboot method when EFI is
     disabled.

   - two patches addressing clang issues"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Disable the address-of-packed-member compiler warning
  x86/efi: Fix reboot_mode when EFI runtime services are disabled
  x86/boot: #undef memcpy() et al in string.c
2017-07-30 12:19:35 -07:00
Linus Torvalds
e4776b8ccb Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "Two patches addressing build warnings caused by inconsistent kernel
  doc comments"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/wait: Clean up some documentation warnings
  sched/core: Fix some documentation build warnings
2017-07-30 11:54:08 -07:00
Linus Torvalds
dbc52a8030 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A couple of fixes for performance counters and kprobes:

   - a series of small patches which make the uncore performance
     counters on Skylake server systems work correctly

   - add a missing instruction slot release to the failure path of
     kprobes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes/x86: Release insn_slot in failure path
  perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs
  perf/x86/intel/uncore: Fix SKX CHA event extra regs
  perf/x86/intel/uncore: Remove invalid Skylake server CHA filter field
  perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask
  perf/x86/intel/uncore: Fix Skylake server PCU PMU event format
  perf/x86/intel/uncore: Fix Skylake UPI PMU event masks
2017-07-30 11:52:15 -07:00
Linus Torvalds
06efc7df37 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
 "Fix for a regression caused by the conversion of x86 to the generic
  hotplug code.

  Instead of doing a plain single line revert, this adds a pile of
  comments so the semantics of the force argument are clear"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"
2017-07-30 11:27:33 -07:00