Commit graph

710920 commits

Author SHA1 Message Date
Jiri Pirko
73cb791fe4 mlxsw: spectrum_router: Fix error path in mlxsw_sp_vr_create
[ Upstream commit 0f2d2b2736 ]

Since mlxsw_sp_fib_create() and mlxsw_sp_mr_table_create()
use ERR_PTR macro to propagate int err through return of a pointer,
the return value is not NULL in case of failure. So if one
of the calls fails, one of vr->fib4, vr->fib6 or vr->mr4_table
is not NULL and mlxsw_sp_vr_is_used wrongly assumes
that vr is in use which leads to crash like following one:

[ 1293.949291] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c9
[ 1293.952729] IP: mlxsw_sp_mr_table_flush+0x15/0x70 [mlxsw_spectrum]

Fix this by using local variables to hold the pointers and set vr->*
only in case everything went fine.

Fixes: 76610ebbde ("mlxsw: spectrum_router: Refactor virtual router handling")
Fixes: a3d9bc506d ("mlxsw: spectrum_router: Extend virtual routers with IPv6 support")
Fixes: d42b0965b1 ("mlxsw: spectrum_router: Add multicast routes notification handling functionality")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:15 -08:00
Yuchung Cheng
0ab87ec99e tcp: revert F-RTO extension to detect more spurious timeouts
[ Upstream commit fc68e171d3 ]

This reverts commit 89fe18e44f.

While the patch could detect more spurious timeouts, it could cause
poor TCP performance on broken middle-boxes that modifies TCP packets
(e.g. receive window, SACK options). Since the performance gain is
much smaller compared to the potential loss. The best solution is
to fully revert the change.

Fixes: 89fe18e44f ("tcp: extend F-RTO to catch more spurious timeouts")
Reported-by: Teodor Milkov <tm@del.bg>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:14 -08:00
Yuchung Cheng
cc8dadb8c0 tcp: revert F-RTO middle-box workaround
[ Upstream commit d4131f0977 ]

This reverts commit cc663f4d4c. While fixing
some broken middle-boxes that modifies receive window fields, it does not
address middle-boxes that strip off SACK options. The best solution is
to fully revert this patch and the root F-RTO enhancement.

Fixes: cc663f4d4c ("tcp: restrict F-RTO to work-around broken middle-boxes")
Reported-by: Teodor Milkov <tm@del.bg>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:14 -08:00
Xin Long
36728a6b39 sctp: do not pr_err for the duplicated node in transport rhlist
[ Upstream commit 27af86bb03 ]

The pr_err in sctp_hash_transport was supposed to report a sctp bug
for using rhashtable/rhlist.

The err '-EEXIST' introduced in Commit cd2b708750 ("sctp: check
duplicate node before inserting a new transport") doesn't belong
to that case.

So just return -EEXIST back without pr_err any kmsg.

Fixes: cd2b708750 ("sctp: check duplicate node before inserting a new transport")
Reported-by: Wei Chen <weichen@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:14 -08:00
Ivan Vecera
54d6bc97b4 net/sched: cls_u32: fix cls_u32 on filter replace
[ Upstream commit eb53f7af6f ]

The following sequence is currently broken:

 # tc qdisc add dev foo ingress
 # tc filter replace dev foo protocol all ingress \
   u32 match u8 0 0 action mirred egress mirror dev bar1
 # tc filter replace dev foo protocol all ingress \
   handle 800::800 pref 49152 \
   u32 match u8 0 0 action mirred egress mirror dev bar2
 Error: cls_u32: Key node flags do not match passed flags.
 We have an error talking to the kernel, -1

The error comes from u32_change() when comparing new and
existing flags. The existing ones always contains one of
TCA_CLS_FLAGS_{,NOT}_IN_HW flag depending on offloading state.
These flags cannot be passed from userspace so the condition
(n->flags != flags) in u32_change() always fails.

Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and
TCA_CLS_FLAGS_IN_HW are not taken into account.

Fixes: 24d3dc6d27 ("net/sched: cls_u32: Reflect HW offload status")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:14 -08:00
Eric Dumazet
a01550d778 net_sched: gen_estimator: fix broken estimators based on percpu stats
[ Upstream commit a5f7add332 ]

pfifo_fast got percpu stats lately, uncovering a bug I introduced last
year in linux-4.10.

I missed the fact that we have to clear our temporary storage
before calling __gnet_stats_copy_basic() in the case of percpu stats.

Without this fix, rate estimators (tc qd replace dev xxx root est 1sec
4sec pfifo_fast) are utterly broken.

Fixes: 1c0d32fde5 ("net_sched: gen_estimator: complete rewrite of rate estimators")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:13 -08:00
Inbar Karmy
5b5be45ed1 net/mlx5e: Fix loopback self test when GRO is off
[ Upstream commit ef7a3518f7 ]

When GRO is off, the transport header pointer in sk_buff is
initialized to network's header.

To find the udp header, instead of using udp_hdr() which assumes
skb_network_header was set, manually calculate the udp header offset.

Fixes: 0952da791c ("net/mlx5e: Add support for loopback selftest")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:13 -08:00
Tonghao Zhang
ff01f118d1 doc: Change the min default value of tcp_wmem/tcp_rmem.
[ Upstream commit a61a86f8db ]

The SK_MEM_QUANTUM was changed from PAGE_SIZE to 4096. And the
tcp_wmem/tcp_rmem min default values are 4096.

Fixes: bd68a2a854 ("net: set SK_MEM_QUANTUM to 4096")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:13 -08:00
Eric Dumazet
d6a76199e8 tcp_bbr: better deal with suboptimal GSO
[ Upstream commit 350c9f484b ]

BBR uses tcp_tso_autosize() in an attempt to probe what would be the
burst sizes and to adjust cwnd in bbr_target_cwnd() with following
gold formula :

/* Allow enough full-sized skbs in flight to utilize end systems. */
cwnd += 3 * bbr->tso_segs_goal;

But GSO can be lacking or be constrained to very small
units (ip link set dev ... gso_max_segs 2)

What we really want is to have enough packets in flight so that both
GSO and GRO are efficient.

So in the case GSO is off or downgraded, we still want to have the same
number of packets in flight as if GSO/TSO was fully operational, so
that GRO can hopefully be working efficiently.

To fix this issue, we make tcp_tso_autosize() unaware of
sk->sk_gso_max_segs

Only tcp_tso_segs() has to enforce the gso_max_segs limit.

Tested:

ethtool -K eth0 tso off gso off
tc qd replace dev eth0 root pfifo_fast

Before patch:
for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
    691  (ss -temoi shows cwnd is stuck around 6 )
    667
    651
    631
    517

After patch :
# for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
   1733 (ss -temoi shows cwnd is around 386 )
   1778
   1746
   1781
   1718

Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:13 -08:00
David Howells
f0a04a0e1a rxrpc: Fix send in rxrpc_send_data_packet()
[ Upstream commit 93c62c45ed ]

All the kernel_sendmsg() calls in rxrpc_send_data_packet() need to send
both parts of the iov[] buffer, but one of them does not.  Fix it so that
it does.

Without this, short IPv6 rxrpc DATA packets may be seen that have the rxrpc
header included, but no payload.

Fixes: 5a924b8951 ("rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:12 -08:00
Ilya Lesokhin
17634603d4 tcp: Honor the eor bit in tcp_mtu_probe
[ Upstream commit 808cf9e38c ]

Avoid SKB coalescing if eor bit is set in one of the relevant
SKBs.

Fixes: c134ecb878 ("tcp: Make use of MSG_EOR in tcp_sendmsg")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:12 -08:00
Heiner Kallweit
dcb5da20ee net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT
[ Upstream commit 08f5138512 ]

This condition wasn't adjusted when PHY_IGNORE_INTERRUPT (-2) was added
long ago. In case of PHY_IGNORE_INTERRUPT the MAC interrupt indicates
also PHY state changes and we should do what the symbol says.

Fixes: 84a527a41f ("net: phylib: fix interrupts re-enablement in phy_start")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:12 -08:00
Gal Pressman
f26693d387 net/mlx5e: Specify numa node when allocating drop rq
[ Upstream commit 2f0db87901 ]

When allocating a drop rq, no numa node is explicitly set which means
allocations are done on node zero. This is not necessarily the nearest
numa node to the HCA, and even worse, might even be a memoryless numa
node.

Choose the numa_node given to us by the pci device in order to properly
allocate the coherent dma memory instead of assuming zero is valid.

Fixes: 556dd1b9c3 ("net/mlx5e: Set drop RQ's necessary parameters only")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:11 -08:00
Shalom Toledo
2229dd5dd6 mlxsw: spectrum_switchdev: Check success of FDB add operation
[ Upstream commit 0a8a1bf17e ]

Until now, we assumed that in case of error when adding FDB entries, the
write operation will fail, but this is not the case. Instead, we need to
check that the number of entries reported in the response is equal to
the number of entries specified in the request.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:11 -08:00
Tommi Rantala
9f02a069bf sctp: fix dst refcnt leak in sctp_v4_get_dst
[ Upstream commit 4a31a6b19f ]

Fix dst reference count leak in sctp_v4_get_dst() introduced in commit
410f03831 ("sctp: add routing output fallback"):

When walking the address_list, successive ip_route_output_key() calls
may return the same rt->dst with the reference incremented on each call.

The code would not decrement the dst refcount when the dst pointer was
identical from the previous iteration, causing the dst refcnt leak.

Testcase:
  ip netns add TEST
  ip netns exec TEST ip link set lo up
  ip link add dummy0 type dummy
  ip link add dummy1 type dummy
  ip link add dummy2 type dummy
  ip link set dev dummy0 netns TEST
  ip link set dev dummy1 netns TEST
  ip link set dev dummy2 netns TEST
  ip netns exec TEST ip addr add 192.168.1.1/24 dev dummy0
  ip netns exec TEST ip link set dummy0 up
  ip netns exec TEST ip addr add 192.168.1.2/24 dev dummy1
  ip netns exec TEST ip link set dummy1 up
  ip netns exec TEST ip addr add 192.168.1.3/24 dev dummy2
  ip netns exec TEST ip link set dummy2 up
  ip netns exec TEST sctp_test -H 192.168.1.2 -P 20002 -h 192.168.1.1 -p 20000 -s -B 192.168.1.3
  ip netns del TEST

In 4.4 and 4.9 kernels this results to:
  [  354.179591] unregister_netdevice: waiting for lo to become free. Usage count = 1
  [  364.419674] unregister_netdevice: waiting for lo to become free. Usage count = 1
  [  374.663664] unregister_netdevice: waiting for lo to become free. Usage count = 1
  [  384.903717] unregister_netdevice: waiting for lo to become free. Usage count = 1
  [  395.143724] unregister_netdevice: waiting for lo to become free. Usage count = 1
  [  405.383645] unregister_netdevice: waiting for lo to become free. Usage count = 1
  ...

Fixes: 410f03831 ("sctp: add routing output fallback")
Fixes: 0ca50d12f ("sctp: fix src address selection if using secondary addresses")
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:11 -08:00
Gal Pressman
bf014cc18a net/mlx5e: Fix TCP checksum in LRO buffers
[ Upstream commit 8babd44d20 ]

When receiving an LRO packet, the checksum field is set by the hardware
to the checksum of the first coalesced packet. Obviously, this checksum
is not valid for the merged LRO packet and should be fixed.  We can use
the CQE checksum which covers the checksum of the entire merged packet
TCP payload to help us calculate the checksum incrementally.

Tested by sending IPv4/6 traffic with LRO enabled, RX checksum disabled
and watching nstat checksum error counters (in addition to the obvious
bandwidth drop caused by checksum errors).

This bug is usually "hidden" since LRO packets would go through the
CHECKSUM_UNNECESSARY flow which does not validate the packet checksum.

It's important to note that previous to this patch, LRO packets provided
with CHECKSUM_UNNECESSARY are indeed packets with a correct validated
checksum (even though the checksum inside the TCP header is incorrect),
since the hardware LRO aggregation is terminated upon receiving a packet
with bad checksum.

Fixes: e586b3b0ba ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:11 -08:00
Alexey Kodanev
fecb84a83f udplite: fix partial checksum initialization
[ Upstream commit 15f35d49c9 ]

Since UDP-Lite is always using checksum, the following path is
triggered when calculating pseudo header for it:

  udp4_csum_init() or udp6_csum_init()
    skb_checksum_init_zero_check()
      __skb_checksum_validate_complete()

The problem can appear if skb->len is less than CHECKSUM_BREAK. In
this particular case __skb_checksum_validate_complete() also invokes
__skb_checksum_complete(skb). If UDP-Lite is using partial checksum
that covers only part of a packet, the function will return bad
checksum and the packet will be dropped.

It can be fixed if we skip skb_checksum_init_zero_check() and only
set the required pseudo header checksum for UDP-Lite with partial
checksum before udp4_csum_init()/udp6_csum_init() functions return.

Fixes: ed70fcfcee ("net: Call skb_checksum_init in IPv4")
Fixes: e4f45b7f40 ("net: Call skb_checksum_init in IPv6")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:10 -08:00
Alexey Kodanev
1fc74a57a8 sctp: verify size of a new chunk in _sctp_make_chunk()
[ Upstream commit 07f2c7ab6f ]

When SCTP makes INIT or INIT_ACK packet the total chunk length
can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when
transmitting these packets, e.g. the crash on sending INIT_ACK:

[  597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168
               put:120156 head:000000007aa47635 data:00000000d991c2de
               tail:0x1d640 end:0xfec0 dev:<NULL>
...
[  597.976970] ------------[ cut here ]------------
[  598.033408] kernel BUG at net/core/skbuff.c:104!
[  600.314841] Call Trace:
[  600.345829]  <IRQ>
[  600.371639]  ? sctp_packet_transmit+0x2095/0x26d0 [sctp]
[  600.436934]  skb_put+0x16c/0x200
[  600.477295]  sctp_packet_transmit+0x2095/0x26d0 [sctp]
[  600.540630]  ? sctp_packet_config+0x890/0x890 [sctp]
[  600.601781]  ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp]
[  600.671356]  ? sctp_cmp_addr_exact+0x3f/0x90 [sctp]
[  600.731482]  sctp_outq_flush+0x663/0x30d0 [sctp]
[  600.788565]  ? sctp_make_init+0xbf0/0xbf0 [sctp]
[  600.845555]  ? sctp_check_transmitted+0x18f0/0x18f0 [sctp]
[  600.912945]  ? sctp_outq_tail+0x631/0x9d0 [sctp]
[  600.969936]  sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp]
[  601.041593]  ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp]
[  601.104837]  ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp]
[  601.175436]  ? sctp_eat_data+0x1710/0x1710 [sctp]
[  601.233575]  sctp_do_sm+0x182/0x560 [sctp]
[  601.284328]  ? sctp_has_association+0x70/0x70 [sctp]
[  601.345586]  ? sctp_rcv+0xef4/0x32f0 [sctp]
[  601.397478]  ? sctp6_rcv+0xa/0x20 [sctp]
...

Here the chunk size for INIT_ACK packet becomes too big, mostly
because of the state cookie (INIT packet has large size with
many address parameters), plus additional server parameters.

Later this chunk causes the panic in skb_put_data():

  skb_packet_transmit()
      sctp_packet_pack()
          skb_put_data(nskb, chunk->skb->data, chunk->skb->len);

'nskb' (head skb) was previously allocated with packet->size
from u16 'chunk->chunk_hdr->length'.

As suggested by Marcelo we should check the chunk's length in
_sctp_make_chunk() before trying to allocate skb for it and
discard a chunk if its size bigger than SCTP_MAX_CHUNK_LEN.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:10 -08:00
Guillaume Nault
5775f78764 ppp: prevent unregistered channels from connecting to PPP units
[ Upstream commit 77f840e3e5 ]

PPP units don't hold any reference on the channels connected to it.
It is the channel's responsibility to ensure that it disconnects from
its unit before being destroyed.
In practice, this is ensured by ppp_unregister_channel() disconnecting
the channel from the unit before dropping a reference on the channel.

However, it is possible for an unregistered channel to connect to a PPP
unit: register a channel with ppp_register_net_channel(), attach a
/dev/ppp file to it with ioctl(PPPIOCATTCHAN), unregister the channel
with ppp_unregister_channel() and finally connect the /dev/ppp file to
a PPP unit with ioctl(PPPIOCCONNECT).

Once in this situation, the channel is only held by the /dev/ppp file,
which can be released at anytime and free the channel without letting
the parent PPP unit know. Then the ppp structure ends up with dangling
pointers in its ->channels list.

Prevent this scenario by forbidding unregistered channels from
connecting to PPP units. This maintains the code logic by keeping
ppp_unregister_channel() responsible from disconnecting the channel if
necessary and avoids modification on the reference counting mechanism.

This issue seems to predate git history (successfully reproduced on
Linux 2.6.26 and earlier PPP commits are unrelated).

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:10 -08:00
Roman Kapl
795f3deff1 net: sched: report if filter is too large to dump
[ Upstream commit 5ae437ad5a ]

So far, if the filter was too large to fit in the allocated skb, the
kernel did not return any error and stopped dumping. Modify the dumper
so that it returns -EMSGSIZE when a filter fails to dump and it is the
first filter in the skb. If we are not first, we will get a next chance
with more room.

I understand this is pretty near to being an API change, but the
original design (silent truncation) can be considered a bug.

Note: The error case can happen pretty easily if you create a filter
with 32 actions and have 4kb pages. Also recent versions of iproute try
to be clever with their buffer allocation size, which in turn leads to

Signed-off-by: Roman Kapl <code@rkapl.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:10 -08:00
Nicolas Dichtel
60b28d5ef3 netlink: ensure to loop over all netns in genlmsg_multicast_allns()
[ Upstream commit cb9f7a9a5c ]

Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the
case when commit 134e63756d was pushed.
However, there was no reason to stop the loop if a netns does not have
listeners.
Returns -ESRCH only if there was no listeners in all netns.

To avoid having the same problem in the future, I didn't take the
assumption that nlmsg_multicast() returns only 0 or -ESRCH.

Fixes: 134e63756d ("genetlink: make netns aware")
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:09 -08:00
Sabrina Dubroca
3bcf69f8e7 net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68
[ Upstream commit c7272c2f12 ]

According to RFC 1191 sections 3 and 4, ICMP frag-needed messages
indicating an MTU below 68 should be rejected:

    A host MUST never reduce its estimate of the Path MTU below 68
    octets.

and (talking about ICMP frag-needed's Next-Hop MTU field):

    This field will never contain a value less than 68, since every
    router "must be able to forward a datagram of 68 octets without
    fragmentation".

Furthermore, by letting net.ipv4.route.min_pmtu be set to negative
values, we can end up with a very large PMTU when (-1) is cast into u32.

Let's also make ip_rt_min_pmtu a u32, since it's only ever compared to
unsigned ints.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:09 -08:00
Jakub Kicinski
f80c28a585 net: fix race on decreasing number of TX queues
[ Upstream commit ac5b70198a ]

netif_set_real_num_tx_queues() can be called when netdev is up.
That usually happens when user requests change of number of
channels/rings with ethtool -L.  The procedure for changing
the number of queues involves resetting the qdiscs and setting
dev->num_tx_queues to the new value.  When the new value is
lower than the old one, extra care has to be taken to ensure
ordering of accesses to the number of queues vs qdisc reset.

Currently the queues are reset before new dev->num_tx_queues
is assigned, leaving a window of time where packets can be
enqueued onto the queues going down, leading to a likely
crash in the drivers, since most drivers don't check if TX
skbs are assigned to an active queue.

Fixes: e6484930d7 ("net: allocate tx queues in register_netdevice")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:09 -08:00
Grygorii Strashko
da260080c2 net: ethernet: ti: cpsw: fix net watchdog timeout
[ Upstream commit 62f94c2101 ]

It was discovered that simple program which indefinitely sends 200b UDP
packets and runs on TI AM574x SoC (SMP) under RT Kernel triggers network
watchdog timeout in TI CPSW driver (<6 hours run). The network watchdog
timeout is triggered due to race between cpsw_ndo_start_xmit() and
cpsw_tx_handler() [NAPI]

cpsw_ndo_start_xmit()
	if (unlikely(!cpdma_check_free_tx_desc(txch))) {
		txq = netdev_get_tx_queue(ndev, q_idx);
		netif_tx_stop_queue(txq);

^^ as per [1] barier has to be used after set_bit() otherwise new value
might not be visible to other cpus
	}

cpsw_tx_handler()
	if (unlikely(netif_tx_queue_stopped(txq)))
		netif_tx_wake_queue(txq);

and when it happens ndev TX queue became disabled forever while driver's HW
TX queue is empty.

Fix this, by adding smp_mb__after_atomic() after netif_tx_stop_queue()
calls and double check for free TX descriptors after stopping ndev TX queue
- if there are free TX descriptors wake up ndev TX queue.

[1] https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:08 -08:00
Wolfram Sang
94870df33c net: amd-xgbe: fix comparison to bitshift when dealing with a mask
[ Upstream commit a3276892db ]

Due to a typo, the mask was destroyed by a comparison instead of a bit
shift.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:08 -08:00
Arnd Bergmann
c3d7d3a099 ipv6 sit: work around bogus gcc-8 -Wrestrict warning
[ Upstream commit ca79bec237 ]

gcc-8 has a new warning that detects overlapping input and output arguments
in memcpy(). It triggers for sit_init_net() calling ipip6_tunnel_clone_6rd(),
which is actually correct:

net/ipv6/sit.c: In function 'sit_init_net':
net/ipv6/sit.c:192:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict]

The problem here is that the logic detecting the memcpy() arguments finds them
to be the same, but the conditional that tests for the input and output of
ipip6_tunnel_clone_6rd() to be identical is not a compile-time constant.

We know that netdev_priv(t->dev) is the same as t for a tunnel device,
and comparing "dev" directly here lets the compiler figure out as well
that 'dev == sitn->fb_tunnel_dev' when called from sit_init_net(), so
it no longer warns.

This code is old, so Cc stable to make sure that we don't get the warning
for older kernels built with new gcc.

Cc: Martin Sebor <msebor@gmail.com>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83456
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:08 -08:00
Denis Du
2cdc12a498 hdlc_ppp: carrier detect ok, don't turn off negotiation
[ Upstream commit b6c3bad1ba ]

Sometimes when physical lines have a just good noise to make the protocol
handshaking fail, but the carrier detect still good. Then after remove of
the noise, nobody will trigger this protocol to be start again to cause
the link to never come back. The fix is when the carrier is still on, not
terminate the protocol handshaking.

Signed-off-by: Denis Du <dudenis2000@yahoo.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:08 -08:00
Stefano Brivio
4a5048e7fd fib_semantics: Don't match route with mismatching tclassid
[ Upstream commit a8c6db1dfd ]

In fib_nh_match(), if output interface or gateway are passed in
the FIB configuration, we don't have to check next hops of
multipath routes to conclude whether we have a match or not.

However, we might still have routes with different realms
matching the same output interface and gateway configuration,
and this needs to cause the match to fail. Otherwise the first
route inserted in the FIB will match, regardless of the realms:

 # ip route add 1.1.1.1 dev eth0 table 1234 realms 1/2
 # ip route append 1.1.1.1 dev eth0 table 1234 realms 3/4
 # ip route list table 1234
 1.1.1.1 dev eth0 scope link realms 1/2
 1.1.1.1 dev eth0 scope link realms 3/4
 # ip route del 1.1.1.1 dev ens3 table 1234 realms 3/4
 # ip route list table 1234
 1.1.1.1 dev ens3 scope link realms 3/4

whereas route with realms 3/4 should have been deleted instead.

Explicitly check for fc_flow passed in the FIB configuration
(this comes from RTA_FLOW extracted by rtm_to_fib_config()) and
fail matching if it differs from nh_tclassid.

The handling of RTA_FLOW for multipath routes later in
fib_nh_match() is still needed, as we can have multiple RTA_FLOW
attributes that need to be matched against the tclassid of each
next hop.

v2: Check that fc_flow is set before discarding the match, so
    that the user can still select the first matching rule by
    not specifying any realm, as suggested by David Ahern.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:07 -08:00
Xin Long
4c13e689e5 bridge: check brport attr show in brport_show
[ Upstream commit 1b12580af1 ]

Now br_sysfs_if file flush doesn't have attr show. To read it will
cause kernel panic after users chmod u+r this file.

Xiong found this issue when running the commands:

  ip link add br0 type bridge
  ip link add type veth
  ip link set veth0 master br0
  chmod u+r /sys/devices/virtual/net/veth0/brport/flush
  timeout 3 cat /sys/devices/virtual/net/veth0/brport/flush

kernel crashed with NULL a pointer dereference call trace.

This patch is to fix it by return -EINVAL when brport_attr->show
is null, just the same as the check for brport_attr->store in
brport_store().

Fixes: 9cf637473c ("bridge: add sysfs hook to flush forwarding table")
Reported-by: Xiong Zhou <xzhou@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:07 -08:00
Thomas Gleixner
71978491bb x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
commit 945fd17ab6 upstream.

The separation of the cpu_entry_area from the fixmap missed the fact that
on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
initial_page_table by the previous synchronizations.

This results in suspend/resume failures because 32bit utilizes initial page
table for resume. The absence of the cpu_entry_area mapping results in a
triple fault, aka. insta reboot.

With PAE enabled this works by chance because the PGD entry which covers
the fixmap and other parts incindentally provides the cpu_entry_area
mapping as well.

Synchronize the initial page table after setting up the cpu entry
area. Instead of adding yet another copy of the same code, move it to a
function and invoke it from the various places.

It needs to be investigated if the existing calls in setup_arch() and
setup_per_cpu_areas() can be replaced by the later invocation from
setup_cpu_entry_areas(), but that's beyond the scope of this fix.

Fixes: 92a0f81d89 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: Woody Suwalski <terraluna977@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Cc: William Grant <william.grant@canonical.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802282137290.1392@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:07 -08:00
Sebastian Panceac
f70befc397 x86/platform/intel-mid: Handle Intel Edison reboot correctly
commit 028091f82e upstream.

When the Intel Edison module is powered with 3.3V, the reboot command makes
the module stuck.  If the module is powered at a greater voltage, like 4.4V
(as the Edison Mini Breakout board does), reboot works OK.

The official Intel Edison BSP sends the IPCMSG_COLD_RESET message to the
SCU by default. The IPCMSG_COLD_BOOT which is used by the upstream kernel
is only sent when explicitely selected on the kernel command line.

Use IPCMSG_COLD_RESET unconditionally which makes reboot work independent
of the power supply voltage.

[ tglx: Massaged changelog ]

Fixes: bda7b072de ("x86/platform/intel-mid: Implement power off sequence")
Signed-off-by: Sebastian Panceac <sebastian@resin.io>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1519810849-15131-1-git-send-email-sebastian@resin.io
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:07 -08:00
Juergen Gross
e521a723fd x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend
commit 71c208dd54 upstream.

Older Xen versions (4.5 and before) might have problems migrating pv
guests with MSR_IA32_SPEC_CTRL having a non-zero value. So before
suspending zero that MSR and restore it after being resumed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: boris.ostrovsky@oracle.com
Link: https://lkml.kernel.org/r/20180226140818.4849-1-jgross@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:06 -08:00
Jan Kara
93e1f7fc77 direct-io: Fix sleep in atomic due to sync AIO
commit d9c10e5b88 upstream.

Commit e864f39569 "fs: add RWF_DSYNC aand RWF_SYNC" added additional
way for direct IO to become synchronous and thus trigger fsync from the
IO completion handler. Then commit 9830f4be15 "fs: Use RWF_* flags for
AIO operations" allowed these flags to be set for AIO as well. However
that commit forgot to update the condition checking whether the IO
completion handling should be defered to a workqueue and thus AIO DIO
with RWF_[D]SYNC set will call fsync() from IRQ context resulting in
sleep in atomic.

Fix the problem by checking directly iocb flags (the same way as it is
done in dio_complete()) instead of checking all conditions that could
lead to IO being synchronous.

CC: Christoph Hellwig <hch@lst.de>
CC: Goldwyn Rodrigues <rgoldwyn@suse.com>
CC: stable@vger.kernel.org
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 9830f4be15
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:06 -08:00
Dan Williams
0ba6c33b32 dax: fix vma_is_fsdax() helper
commit 230f5a8969 upstream.

Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use
S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on
device-dax instances when those are meant to be explicitly allowed.

Fixes: 2bb6d28370 ("mm: introduce get_user_pages_longterm")
Cc: <stable@vger.kernel.org>
Reported-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Jane Chu <jane.chu@oracle.com>
Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:06 -08:00
Viresh Kumar
3379a37a74 cpufreq: s3c24xx: Fix broken s3c_cpufreq_init()
commit 0373ca7483 upstream.

commit a307a1e6bc "cpufreq: s3c: use cpufreq_generic_init()"
accidentally broke cpufreq on s3c2410 and s3c2412.

These two platforms don't have a CPU frequency table and used to skip
calling cpufreq_table_validate_and_show() for them.  But with the
above commit, we started calling it unconditionally and that will
eventually fail as the frequency table pointer is NULL.

Fix this by calling cpufreq_table_validate_and_show() conditionally
again.

Fixes: a307a1e6bc "cpufreq: s3c: use cpufreq_generic_init()"
Cc: 3.13+ <stable@vger.kernel.org> # v3.13+
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:06 -08:00
Dan Williams
d5168ce354 vfio: disable filesystem-dax page pinning
commit 94db151dc8 upstream.

Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. In order to prevent the situation of
userspace holding of filesystem operations indefinitely, disallow
'longterm' Filesystem-DAX mappings.

RDMA has the same conflict and the plan there is to add a 'with lease'
mechanism to allow the kernel to notify userspace that the mapping is
being torn down for block-map maintenance. Perhaps something similar can
be put in place for vfio.

Note that xfs and ext4 still report:

   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"

...at mount time, and resolving the dax-dma-vs-truncate problem is one
of the last hurdles to remove that designation.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org>
Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
Fixes: d475c6346a ("dax,ext2: replace XIP read and write with DAX I/O")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:06 -08:00
Ming Lei
7f43f610c4 block: kyber: fix domain token leak during requeue
commit ba989a0146 upstream.

When requeuing request, the domain token should have been freed
before re-inserting the request to io scheduler. Otherwise, the
assigned domain token will be leaked, and IO hang can be caused.

Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:05 -08:00
Jiufei Xue
17644a0bbb block: fix the count of PGPGOUT for WRITE_SAME
commit 7c5a0dcf55 upstream.

The vm counters is counted in sectors, so we should do the conversation
in submit_bio.

Fixes: 74d46992e0 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Cc: stable@vger.kernel.org
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:05 -08:00
Anand Jain
eae6179f55 btrfs: use proper endianness accessors for super_copy
commit 3c181c12c4 upstream.

The fs_info::super_copy is a byte copy of the on-disk structure and all
members must use the accessor macros/functions to obtain the right
value.  This was missing in update_super_roots and in sysfs readers.

Moving between opposite endianness hosts will report bogus numbers in
sysfs, and mount may fail as the root will not be restored correctly. If
the filesystem is always used on a same endian host, this will not be a
problem.

Fix this by using the btrfs_set_super...() functions to set
fs_info::super_copy values, and for the sysfs, use the cached
fs_info::nodesize/sectorsize values.

CC: stable@vger.kernel.org
Fixes: df93589a17 ("btrfs: export more from FS_INFO to sysfs")
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:05 -08:00
John David Anglin
dabf89052e parisc: Fix ordering of cache and TLB flushes
commit 0adb24e03a upstream.

The change to flush_kernel_vmap_range() wasn't sufficient to avoid the
SMP stalls.  The problem is some drivers call these routines with
interrupts disabled.  Interrupts need to be enabled for flush_tlb_all()
and flush_cache_all() to work.  This version adds checks to ensure
interrupts are not disabled before calling routines that need IPI
interrupts.  When interrupts are disabled, we now drop into slower code.

The attached change fixes the ordering of cache and TLB flushes in
several cases.  When we flush the cache using the existing PTE/TLB
entries, we need to flush the TLB after doing the cache flush.  We don't
need to do this when we flush the entire instruction and data caches as
these flushes don't use the existing TLB entries.  The same is true for
tmpalias region flushes.

The flush_kernel_vmap_range() and invalidate_kernel_vmap_range()
routines have been updated.

Secondly, we added a new purge_kernel_dcache_range_asm() routine to
pacache.S and use it in invalidate_kernel_vmap_range().  Nominally,
purges are faster than flushes as the cache lines don't have to be
written back to memory.

Hopefully, this is sufficient to resolve the remaining problems due to
cache speculation.  So far, testing indicates that this is the case.  I
did work up a patch using tmpalias flushes, but there is a performance
hit because we need the physical address for each page, and we also need
to sequence access to the tmpalias flush code.  This increases the
probability of stalls.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:05 -08:00
Helge Deller
47e7fc96cd parisc: Reduce irq overhead when run in qemu
commit 636a415bcc upstream.

When run under QEMU, calling mfctl(16) creates some overhead because the
qemu timer has to be scaled and moved into the register. This patch
reduces the number of calls to mfctl(16) by moving the calls out of the
loops.

Additionally, increase the minimal time interval to 8000 cycles instead
of 500 to compensate possible QEMU delays when delivering interrupts.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.14+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:04 -08:00
Helge Deller
90c3f0d360 parisc: Use cr16 interval timers unconditionally on qemu
commit 5ffa851885 upstream.

When running on qemu we know that the (emulated) cr16 cpu-internal
clocks are syncronized. So let's use them unconditionally on qemu.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.14+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:04 -08:00
Lingutla Chandrasekhar
6b218ed6bd timers: Forward timer base before migrating timers
commit c52232a49e upstream.

On CPU hotunplug the enqueued timers of the unplugged CPU are migrated to a
live CPU. This happens from the control thread which initiated the unplug.

If the CPU on which the control thread runs came out from a longer idle
period then the base clock of that CPU might be stale because the control
thread runs prior to any event which forwards the clock.

In such a case the timers from the unplugged CPU are queued on the live CPU
based on the stale clock which can cause large delays due to increased
granularity of the outer timer wheels which are far away from base:;clock.

But there is a worse problem than that. The following sequence of events
illustrates it:

 - CPU0 timer1 is queued expires = 59969 and base->clk = 59131.

   The timer is queued at wheel level 2, with resulting expiry time = 60032
   (due to level granularity).

 - CPU1 enters idle @60007, with next timer expiry @60020.

 - CPU0 is hotplugged at @60009

 - CPU1 exits idle and runs the control thread which migrates the
   timers from CPU0

   timer1 is now queued in level 0 for immediate handling in the next
   softirq because the requested expiry time 59969 is before CPU1 base->clk
   60007

 - CPU1 runs code which forwards the base clock which succeeds because the
   next expiring timer. which was collected at idle entry time is still set
   to 60020.

   So it forwards beyond 60007 and therefore misses to expire the migrated
   timer1. That timer gets expired when the wheel wraps around again, which
   takes between 63 and 630ms depending on the HZ setting.

Address both problems by invoking forward_timer_base() for the control CPUs
timer base. All other places, which might run into a similar problem
(mod_timer()/add_timer_on()) already invoke forward_timer_base() to avoid
that.

[ tglx: Massaged comment and changelog ]

Fixes: a683f390b9 ("timers: Forward the wheel clock whenever possible")
Co-developed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: linux-arm-msm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180118115022.6368-1-clingutla@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:04 -08:00
Shawn Lin
ec274a6515 mmc: dw_mmc: Fix out-of-bounds access for slot's caps
commit 0d84b9e563 upstream.

Add num_caps field for dw_mci_drv_data to validate the controller
id from DT alias and non-DT ways.

Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Fixes: 800d78bfcc ("mmc: dw_mmc: add support for implementation specific callbacks")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:03 -08:00
Shawn Lin
e03d46a156 mmc: dw_mmc: Factor out dw_mci_init_slot_caps
commit a4faa4929e upstream.

Factor out dw_mci_init_slot_caps to consolidate parsing
all differents types of capabilities from host contrllers.
No functional change intended.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Fixes: 800d78bfcc ("mmc: dw_mmc: add support for implementation specific callbacks")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:03 -08:00
Shawn Lin
4d5123a0b3 mmc: dw_mmc: Avoid accessing registers in runtime suspended state
commit 5b43df8b4c upstream.

cat /sys/kernel/debug/mmc0/regs will hang up the system since
it's in runtime suspended state, so the genpd and biu_clk is
off. This patch fixes this problem by calling pm_runtime_get_sync
to wake it up before reading the registers.

Fixes: e9ed8835e9 ("mmc: dw_mmc: add runtime PM callback")
Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:03 -08:00
Geert Uytterhoeven
cb65fc21f3 mmc: dw_mmc-k3: Fix out-of-bounds access through DT alias
commit 325501d936 upstream.

The hs_timing_cfg[] array is indexed using a value derived from the
"mshcN" alias in DT, which may lead to an out-of-bounds access.

Fix this by adding a range check.

Fixes: 361c7fe9b0 ("mmc: dw_mmc-k3: add sd support for hi3660")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:03 -08:00
Adrian Hunter
33b42aa617 mmc: sdhci-pci: Fix S0i3 for Intel BYT-based controllers
commit f8870ae6e2 upstream.

Tuning can leave the IP in an active state (Buffer Read Enable bit set)
which prevents the entry to low power states (i.e. S0i3). Data reset will
clear it.

Generally tuning is followed by a data transfer which will anyway sort out
the state, so it is rare that S0i3 is actually prevented.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:03 -08:00
Takashi Iwai
e2c3f72702 ALSA: hda - Fix pincfg at resume on Lenovo T470 dock
commit 71db96ddfa upstream.

We've added a quirk to enable the recent Lenovo dock support, where it
overwrites the pin configs of NID 0x17 and 19, not only updating the
pin config cache.  It works right after the boot, but the problem is
that the pin configs are occasionally cleared when the machine goes to
PM.  Meanwhile the quirk writes the pin configs only at the pre-probe,
so this won't be applied any longer.

For addressing that issue, this patch moves the code to overwrite the
pin configs into HDA_FIXUP_ACT_INIT section so that it's always
applied at both probe and resume time.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195161
Fixes: 61fcf8ece9 ("ALSA: hda/realtek - Enable Thinkpad Dock device for ALC298 platform")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:02 -08:00
Hans de Goede
34516912bf ALSA: hda: Add a power_save blacklist
commit 1ba8f9d308 upstream.

On some boards setting power_save to a non 0 value leads to clicking /
popping sounds when ever we enter/leave powersaving mode. Ideally we would
figure out how to avoid these sounds, but that is not always feasible.

This commit adds a blacklist for devices where powersaving is known to
cause problems and disables it on these devices.

Note I tried to put this blacklist in userspace first:
https://github.com/systemd/systemd/pull/8128

But the systemd maintainers rightfully pointed out that it would be
impossible to then later remove entries once we actually find a way to
make power-saving work on listed boards without issues. Having this list
in the kernel will allow removal of the blacklist entry in the same commit
which fixes the clicks / plops.

The blacklist only applies to the default power_save module-option value,
if a user explicitly sets the module-option then the blacklist is not
used.

[ added an ifdef CONFIG_PM for the build error -- tiwai]

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1525104
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=198611
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08 22:41:02 -08:00