linux-stable/net
Sunil Muthuswamy 49fb03de36 hvsock: fix epollout hang from race condition
[ Upstream commit cb359b6041 ]

Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will
not return even when the hvsock socket is writable, under some race
condition. This can happen under the following sequence:
- fd = socket(hvsocket)
- fd_out = dup(fd)
- fd_in = dup(fd)
- start a writer thread that writes data to fd_out with a combination of
  epoll_wait(fd_out, EPOLLOUT) and
- start a reader thread that reads data from fd_in with a combination of
  epoll_wait(fd_in, EPOLLIN)
- On the host, there are two threads that are reading/writing data to the
  hvsocket

stack:
hvs_stream_has_space
hvs_notify_poll_out
vsock_poll
sock_poll
ep_poll

Race condition:
check for epollout from ep_poll():
	assume no writable space in the socket
	hvs_stream_has_space() returns 0
check for epollin from ep_poll():
	assume socket has some free space < HVS_PKT_LEN(HVS_SEND_BUF_SIZE)
	hvs_stream_has_space() will clear the channel pending send size
	host will not notify the guest because the pending send size has
		been cleared and so the hvsocket will never mark the
		socket writable

Now, the EPOLLOUT will never return even if the socket write buffer is
empty.

The fix is to set the pending size to the default size and never change it.
This way the host will always notify the guest whenever the writable space
is bigger than the pending size. The host is already optimized to *only*
notify the guest when the pending size threshold boundary is crossed and
not everytime.

This change also reduces the cpu usage somewhat since hv_stream_has_space()
is in the hotpath of send:
vsock_stream_sendmsg()->hv_stream_has_space()
Earlier hv_stream_has_space was setting/clearing the pending size on every
call.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31 07:26:56 +02:00
..
6lowpan
9p 9p/virtio: Add cleanup path in p9_virtio_init 2019-07-26 09:14:25 +02:00
802
8021q vlan: disable SIOCSHWTSTAMP in container 2019-05-16 19:41:30 +02:00
appletalk appletalk: Fix use-after-free in atalk_proc_exit 2019-04-20 09:16:05 +02:00
atm net: atm: Fix potential Spectre v1 vulnerabilities 2019-04-27 09:36:30 +02:00
ax25 ax25: fix inconsistent lock state in ax25_destroy_timer 2019-06-22 08:15:13 +02:00
batman-adv batman-adv: Fix duplicated OGMs on NETDEV_UP 2019-07-26 09:14:03 +02:00
bluetooth Bluetooth: Add SMP workaround Microsoft Surface Precision Mouse bug 2019-07-26 09:14:30 +02:00
bpf
bpfilter net: bpfilter: use get_pid_task instead of pid_task 2018-10-17 22:03:40 -07:00
bridge net: bridge: stp: don't cache eth dest pointer before skb pull 2019-07-28 08:29:27 +02:00
caif Revert "net: simplify sock_poll_wait" 2018-11-04 14:50:51 +01:00
can can: af_can: Fix error path of can_init() 2019-07-14 08:11:07 +02:00
ceph libceph: wait for latest osdmap in ceph_monc_blacklist_add() 2019-03-27 14:14:39 +09:00
core tcp: fix tcp_set_congestion_control() use from bpf hook 2019-07-28 08:29:26 +02:00
dcb
dccp dccp: do not use ipv6 header for ipv4 flow 2019-04-03 06:26:15 +02:00
decnet decnet: fix using plain integer as NULL warning 2018-08-09 14:11:24 -07:00
dns_resolver
dsa net: dsa: Fix error cleanup path in dsa_init_module 2019-05-16 19:41:29 +02:00
ethernet
hsr net/hsr: fix possible crash in add_timer() 2019-03-19 13:12:38 +01:00
ieee802154 ieee802154: lowpan_header_create check must check daddr 2019-01-09 17:38:31 +01:00
ife
ipv4 tcp: Reset bytes_acked and bytes_received when disconnecting 2019-07-28 08:29:26 +02:00
ipv6 ipv6: Unlink sibling route in case of failure 2019-07-28 08:29:24 +02:00
iucv Revert "net: simplify sock_poll_wait" 2018-11-04 14:50:51 +01:00
kcm kcm: switch order of device registration to fix a crash 2019-04-17 08:38:40 +02:00
key af_key: fix leaks in key_pol_get_resp and dump_sp. 2019-07-26 09:14:01 +02:00
l2tp l2tp: use rcu_dereference_sk_user_data() in l2tp_udp_encap_recv() 2019-05-05 14:42:37 +02:00
l3mdev
lapb lapb: fixed leak of control-blocks. 2019-06-22 08:15:13 +02:00
llc llc: fix skb leak in llc_build_and_send_ui_pkt() 2019-06-04 08:02:31 +02:00
mac80211 mac80211: do not start any work during reconfigure flow 2019-07-14 08:11:11 +02:00
mac802154 net: mac802154: tx: expand tailroom if necessary 2018-08-06 11:21:37 +02:00
mpls mpls: Return error for RTA_GATEWAY attribute 2019-03-10 07:17:19 +01:00
ncsi net/ncsi: Fixup .dumpit message flags and ID check in Netlink handler 2018-08-22 21:39:08 -07:00
netfilter net: make skb_dst_force return true when dst is refcounted 2019-07-28 08:29:24 +02:00
netlabel netlabel: fix out-of-bounds memory accesses 2019-03-10 07:17:18 +01:00
netlink genetlink: Fix a memory leak on error path 2019-04-03 06:26:15 +02:00
netrom netrom: hold sock when setting skb->destructor 2019-07-28 08:29:27 +02:00
nfc nfc: fix potential illegal memory access 2019-07-28 08:29:25 +02:00
nsh
openvswitch net: openvswitch: fix csum updates for MPLS actions 2019-07-28 08:29:24 +02:00
packet net/packet: fix memory leak in packet_set_ring() 2019-07-03 13:14:47 +02:00
phonet phonet: fix building with clang 2019-03-23 20:09:51 +01:00
psample
qrtr
rds rds: Fix warning. 2019-07-10 09:53:46 +02:00
rfkill Here are quite a large number of fixes, notably: 2018-09-03 22:12:02 -07:00
rose net/rose: fix unbound loop in rose_loopback_timer() 2019-05-02 09:59:00 +02:00
rxrpc rxrpc: Fix send on a connected, but unbound socket 2019-07-28 08:29:25 +02:00
sched net: sched: verify that q!=NULL before setting q->flags 2019-07-28 08:29:30 +02:00
sctp sctp: not bind the socket in sctp_connect 2019-07-28 08:29:27 +02:00
smc net/smc: move unhash before release of clcsock 2019-07-10 09:53:44 +02:00
strparser net: strparser: partially revert "strparser: Call skb_unclone conditionally" 2019-05-16 19:41:27 +02:00
sunrpc net :sunrpc :clnt :Fix xps refcount imbalance on the error path 2019-07-14 08:11:15 +02:00
switchdev
tipc tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb 2019-07-03 13:14:49 +02:00
tls net/tls: make sure offload also gets the keys wiped 2019-07-28 08:29:27 +02:00
unix missing barriers in some of unix_sock ->addr and ->path accesses 2019-03-19 13:12:41 +01:00
vmw_vsock hvsock: fix epollout hang from race condition 2019-07-31 07:26:56 +02:00
wimax
wireless mac80211: fix rate reporting inside cfg80211_calculate_bitrate_he() 2019-07-14 08:11:04 +02:00
x25 net/x25: fix a race in x25_bind() 2019-03-19 13:12:40 +01:00
xdp xsk: Properly terminate assignment in xskq_produce_flush_desc 2019-07-26 09:14:12 +02:00
xfrm ipsec: select crypto ciphers for xfrm_algo 2019-07-26 09:14:10 +02:00
compat.c sock: Make sock->sk_stamp thread-safe 2019-01-09 17:38:33 +01:00
Kconfig
Makefile
socket.c net: socket: set sock->sk to NULL after calling proto_ops::release() 2019-03-10 07:17:18 +01:00
sysctl_net.c