Commit graph

276 commits

Author SHA1 Message Date
Stephen Hemminger
c0a41b887c hv_netvsc: move VF to same namespace as netvsc device
When VF is added, the paravirtual device is already present
and may have been moved to another network namespace. For example,
sometimes the management interface is put in another net namespace
in some environments.

The VF should get moved to where the netvsc device is when the
VF is discovered. The user can move it later (if desired).

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12 15:22:28 -07:00
Stephen Hemminger
7bf7bb37f1 hv_netvsc: fix network namespace issues with VF support
When finding the parent netvsc device, the search needs to be across
all netvsc device instances (independent of network namespace).

Find parent device of VF using upper_dev_get routine which
searches only adjacent list.

Fixes: e8ff40d4bf ("hv_netvsc: improve VF device matching")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>

netns aware byref
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12 15:22:28 -07:00
Stephen Hemminger
8cde8f0c0c hv_netvsc: drop common code until callback model fixed
The callback model of handling network failover is not suitable
in the current form.
  1. It was merged without addressing all the review feedback.
  2. It was merged without approval of any of the netvsc maintainers.
  3. Design discussion on how to handle PV/VF fallback is still
     not complete.
  4. IMHO the code model using callbacks is trying to make
     something common which isn't.

Revert the netvsc specific changes for now. Does not impact ongoing
development of failover model for virtio.
Revisit this after a simpler library based failover kernel
routines are extracted.

This reverts
commit 9c6ffbacdb ("hv_netvsc: fix error return code in netvsc_probe()")
and
commit 1ff78076d8 ("netvsc: refactor notifier/event handling code to use the failover framework")

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12 15:22:28 -07:00
Linus Torvalds
f0dc7f9c6d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix several bpfilter/UMH bugs, in particular make the UMH build not
    depend upon X86 specific Kconfig symbols. From Alexei Starovoitov.

 2) Fix handling of modified context pointer in bpf verifier, from
    Daniel Borkmann.

 3) Kill regression in ifdown/ifup sequences for hv_netvsc driver, from
    Dexuan Cui.

 4) When the bonding primary member name changes, we have to re-evaluate
    the bond->force_primary setting, from Xiangning Yu.

 5) Eliminate possible padding beyone end of SKB in cdc_ncm driver, from
    Bjørn Mork.

 6) RX queue length reported for UDP sockets in procfs and socket diag
    are inaccurate, from Paolo Abeni.

 7) Fix br_fdb_find_port() locking, from Petr Machata.

 8) Limit sk_rcvlowat values properly in TCP, from Soheil Hassas
    Yeganeh.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
  tcp: limit sk_rcvlowat by the maximum receive buffer
  net: phy: dp83822: use BMCR_ANENABLE instead of BMSR_ANEGCAPABLE for DP83620
  socket: close race condition between sock_close() and sockfs_setattr()
  net: bridge: Fix locking in br_fdb_find_port()
  udp: fix rx queue len reported by diag and proc interface
  cdc_ncm: avoid padding beyond end of skb
  net/sched: act_simple: fix parsing of TCA_DEF_DATA
  net: fddi: fix a possible null-ptr-deref
  net: aquantia: fix unsigned numvecs comparison with less than zero
  net: stmmac: fix build failure due to missing COMMON_CLK dependency
  bpfilter: fix race in pipe access
  bpf, xdp: fix crash in xdp_umem_unaccount_pages
  xsk: Fix umem fill/completion queue mmap on 32-bit
  tools/bpf: fix selftest get_cgroup_id_user
  bpfilter: fix OUTPUT_FORMAT
  umh: fix race condition
  net: mscc: ocelot: Fix uninitialized error in ocelot_netdevice_event()
  bonding: re-evaluate force_primary when the primary slave name changes
  ip_tunnel: Fix name string concatenate in __ip_tunnel_create()
  hv_netvsc: Fix a network regression after ifdown/ifup
  ...
2018-06-10 19:25:23 -07:00
Linus Torvalds
5f85942c2e SCSI misc on 20180610
This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc,
 xfcp, hisi_sas, cxlflash, qla2xxx.  In the absence of Nic, we're also
 taking target updates which are mostly minor except for the tcmu
 refactor. The only real core change to worry about is the removal of
 high page bouncing (in sas, storvsc and iscsi).  This has been well
 tested and no problems have shown up so far.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWx1pbCYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishUucAP42pccS
 ziKyiOizuxv9fZ4Q+nXd1A9zhI5tqqpkHjcQegEA40qiZSi3EKGKR8W0UpX7Ntmo
 tqrZJGojx9lnrAM2RbQ=
 =NMXg
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc,
  xfcp, hisi_sas, cxlflash, qla2xxx.

  In the absence of Nic, we're also taking target updates which are
  mostly minor except for the tcmu refactor.

  The only real core change to worry about is the removal of high page
  bouncing (in sas, storvsc and iscsi). This has been well tested and no
  problems have shown up so far"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (268 commits)
  scsi: lpfc: update driver version to 12.0.0.4
  scsi: lpfc: Fix port initialization failure.
  scsi: lpfc: Fix 16gb hbas failing cq create.
  scsi: lpfc: Fix crash in blk_mq layer when executing modprobe -r lpfc
  scsi: lpfc: correct oversubscription of nvme io requests for an adapter
  scsi: lpfc: Fix MDS diagnostics failure (Rx < Tx)
  scsi: hisi_sas: Mark PHY as in reset for nexus reset
  scsi: hisi_sas: Fix return value when get_free_slot() failed
  scsi: hisi_sas: Terminate STP reject quickly for v2 hw
  scsi: hisi_sas: Add v2 hw force PHY function for internal ATA command
  scsi: hisi_sas: Include TMF elements in struct hisi_sas_slot
  scsi: hisi_sas: Try wait commands before before controller reset
  scsi: hisi_sas: Init disks after controller reset
  scsi: hisi_sas: Create a scsi_host_template per HW module
  scsi: hisi_sas: Reset disks when discovered
  scsi: hisi_sas: Add LED feature for v3 hw
  scsi: hisi_sas: Change common allocation mode of device id
  scsi: hisi_sas: change slot index allocation mode
  scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate()
  scsi: hisi_sas: fix a typo in hisi_sas_task_prep()
  ...
2018-06-10 13:01:12 -07:00
Dexuan Cui
52acf73b6e hv_netvsc: Fix a network regression after ifdown/ifup
Recently people reported the NIC stops working after
"ifdown eth0; ifup eth0". It turns out in this case the TX queues are not
enabled, after the refactoring of the common detach logic: when the NIC
has sub-channels, usually we enable all the TX queues after all
sub-channels are set up: see rndis_set_subchannel() ->
netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where
the number of channels doesn't change, we also must make sure the TX queues
are enabled. The patch fixes the regression.

Fixes: 7b2ee50c0c ("hv_netvsc: common detach logic")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-07 16:24:51 -04:00
Wei Yongjun
9c6ffbacdb hv_netvsc: fix error return code in netvsc_probe()
Fix to return a negative error code from the failover register fail
error handling case instead of 0, as done elsewhere in this function.

Fixes: 1ff78076d8 ("netvsc: refactor notifier/event handling code to use the failover framework")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03 10:35:25 -04:00
Sridhar Samudrala
1ff78076d8 netvsc: refactor notifier/event handling code to use the failover framework
Use the registration/notification framework supported by the generic
failover infrastructure.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 22:59:54 -04:00
Haiyang Zhang
273de02ae7 hv_netvsc: Add handlers for ethtool get/set msg level
The handlers for ethtool get/set msg level are missing from netvsc.
This patch adds them.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23 14:55:00 -04:00
Stephen Hemminger
97f3efb643 hv_netvsc: set master device
The hyper-v transparent bonding should have used master_dev_link.
The netvsc device should look like a master bond device not
like the upper side of a tunnel.

This makes the semantics the same so that userspace applications
looking at network devices see the correct master relationshipship.

Fixes: 0c195567a8 ("netvsc: transparent VF management")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-10 17:36:20 -04:00
Long Li
6b1f8376dc scsi: netvsc: Use the vmbus function to calculate ring buffer percentage
In Vmbus, we have defined a function to calculate available ring buffer
percentage to write.

Use that function and remove netvsc's private version.

[mkp: typo]

Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:52 -04:00
Joe Perches
d61e403856 drivers/net: Use octal not symbolic permissions
Prefer the direct use of octal for permissions.

Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
and some typing.

Miscellanea:

o Whitespace neatening around these conversions.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 12:07:49 -04:00
Haiyang Zhang
5c71dadbb4 hv_netvsc: Fix the return status in RX path
As defined in hyperv_net.h, the NVSP_STAT_SUCCESS is one not zero.
Some functions returns 0 when it actually means NVSP_STAT_SUCCESS.
This patch fixes them.

In netvsc_receive(), it puts the last RNDIS packet's receive status
for all packets in a vmxferpage which may contain multiple RNDIS
packets.
This patch puts NVSP_STAT_FAIL in the receive completion if one of
the packets in a vmxferpage fails.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25 17:07:40 -04:00
Stephen Hemminger
7b2ee50c0c hv_netvsc: common detach logic
Make common function for detaching internals of device
during changes to MTU and RSS. Make sure no more packets
are transmitted and all packets have been received before
doing device teardown.

Change the wait logic to be common and use usleep_range().

Changes transmit enabling logic so that transmit queues are disabled
during the period when lower device is being changed. And enabled
only after sub channels are setup. This avoids issue where it could
be that a packet was being sent while subchannel was not initialized.

Fixes: 8195b1396e ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 12:45:09 -04:00
Stephen Hemminger
b0dee79103 hv_netvsc: fix locking during VF setup
The dev_uc/mc_sync calls need to have the device address list
locked. This was spotted by running with lockdep enabled.

Fixes: bee9d41b37 ("hv_netvsc: propagate rx filters to VF")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08 12:48:57 -05:00
Stephen Hemminger
35a57b7fef hv_netvsc: fix locking for rx_mode
The rx_mode operation handler is different than other callbacks
in that is not always called with rtnl held. Therefore use
RCU to ensure that references are valid.

Fixes: bee9d41b37 ("hv_netvsc: propagate rx filters to VF")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08 12:48:56 -05:00
Stephen Hemminger
bee9d41b37 hv_netvsc: propagate rx filters to VF
The netvsc device should propagate filters to the SR-IOV VF
device (if present). The flags also need to be propagated to the
VF device as well. This only really matters on local Hyper-V
since Azure does not support multiple addresses.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04 22:18:21 -05:00
Stephen Hemminger
b3bf5666a5 hv_netvsc: defer queue selection to VF
When VF is used for accelerated networking it will likely have
more queues (and different policy) than the synthetic NIC.
This patch defers the queue policy to the VF so that all the
queues can be used. This impacts workloads like local generate UDP.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04 22:18:20 -05:00
Stephen Hemminger
f4950e4586 hv_netvsc: only wake transmit queue if link is up
Don't wake transmit queues if link is not up yet.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04 22:18:20 -05:00
Stephen Hemminger
cfd8afd986 hv_netvsc: empty current transmit aggregation if flow blocked
If the transmit queue is known full, then don't keep aggregating
data. And the cp_partial flag which indicates that the current
aggregation buffer is full can be folded in to avoid more
conditionals.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:57:39 -05:00
Stephen Hemminger
345ac08990 hv_netvsc: pass netvsc_device to receive callback
The netvsc_receive_callback function was using RCU to find the
appropriate underlying netvsc_device. Since calling function already
had that pointer, this was unnecessary.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:57:39 -05:00
Stephen Hemminger
79cf1bae38 hv_netvsc: simplify function args in receive status path
The caller (netvsc_receive) already has the net device pointer,
and should just pass that to functions rather than the hyperv device.
This eliminates several impossible error paths in the process.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:57:38 -05:00
Stephen Hemminger
f61a9d62b2 hv_netvsc: track memory allocation failures in ethtool stats
When skb can not be allocated, update ethtool statisitics
rather than rx_dropped which is intended for netif_receive.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:57:38 -05:00
Haiyang Zhang
41f61db2cd hv_netvsc: Fix the TX/RX buffer default sizes
The values were not computed correctly. There are no significant
visible impact, though.

The intended size of RX buffer is 16 MB, and the default slot size is 1728.
So, NETVSC_DEFAULT_RX should be 16*1024*1024 / 1728 = 9709.

The intended size of TX buffer is 1 MB, and the slot size is 6144.
So, NETVSC_DEFAULT_TX should be 1024*1024 / 6144 = 170.

The patch puts the formula directly into the macro, and moves them to
hyperv_net.h, together with related macros.

Fixes: 5023a6db73 ("netvsc: increase default receive buffer size")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 13:25:04 -05:00
Stephen Hemminger
f5a2255010 hv_netvsc: optimize initialization of RNDIS header
The memset of the whole maximum possible RNDIS header is unnecessary.
For the main part of the header use a structure assignment.

No need to memset the whole per packet info. Instead rely on caller to
set what it wants. Also get rid of cast to void and signed/unsigned
conversion. Now return pointer to per packet data (rather than the
header) which simplifies use by code setting up the packet data.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 10:10:02 -05:00
Stephen Hemminger
a7f99d0f2b hv_netvsc: use reciprocal divide to speed up percent calculation
Every packet sent checks the available ring space. The calculation
can be sped up by using reciprocal divide which is multiplication.

Since ring_size can only be configured by module parameter, so it doesn't
have to be passed around everywhere. Also it should be unsigned
since it is number of pages.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 10:10:02 -05:00
Vitaly Kuznetsov
aefd80e874 hv_netvsc: preserve hw_features on mtu/channels/ringparam changes
rndis_filter_device_add() is called both from netvsc_probe() when we
initially create the device and from set channels/mtu/ringparam
routines where we basically remove the device and add it back.

hw_features is reset in rndis_filter_device_add() and filled with
host data. However, we lose all additional flags which are set outside
of the driver, e.g. register_netdevice() adds NETIF_F_SOFT_FEATURES and
many others.

Unfortunately, calls to rndis_{query_hwcaps(), _set_offload_params()}
calls cannot be avoided on every RNDIS reset: host expects us to set
required features explicitly. Moreover, in theory hardware capabilities
can change and we need to reflect the change in hw_features.

Reset net->hw_features bits according to host data in
rndis_netdev_set_hwcaps(), clear corresponding feature bits
from net->features in case some features went missing (will never happen
in real life I guess but let's be consistent).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-16 10:49:00 +09:00
Haiyang Zhang
39e91cfbf6 hv_netvsc: Rename tx_send_table to tx_table
Simplify the variable name: tx_send_table

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14 18:42:55 -07:00
Haiyang Zhang
47371300df hv_netvsc: Rename ind_table to rx_table
Rename this variable because it is the Receive indirection
table.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14 18:42:55 -07:00
Haiyang Zhang
0518ec4f9d hv_netvsc: Add ethtool handler to set and get TCP hash levels
The patch supports the options to switch TCP hash level between
L3 and L4 by ethtool command. TCP over IPv4 and v6 can be set
differently. The default hash level is L4. We currently only
allow switching TX hash level from within the guests.

For example, for TCP over IPv4 on eth0:
To include TCP port numbers in hashing:
	ethtool -N eth0 rx-flow-hash tcp4 sdfn
To exclude TCP port numbers in hashing:
	ethtool -N eth0 rx-flow-hash tcp4 sd
To show TCP hash level:
	ethtool -n eth0 rx-flow-hash tcp4

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08 10:11:01 -07:00
Haiyang Zhang
486e398105 hv_netvsc: Change the hash level variable to bit flags
This simplifies the logic and make it easier to add more
options.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08 10:11:01 -07:00
David Ahern
42ab19ee90 net: Add extack to upper device linking
Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 21:39:33 -07:00
Simon Xiao
09af87d18f hv_netvsc: report stop_queue and wake_queue
Report the numbers of events for stop_queue and wake_queue in
ethtool stats.

Example:
ethtool -S eth0
NIC statistics:
	...
	stop_queue: 7
	wake_queue: 7
	...

Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 04:10:30 +01:00
Haiyang Zhang
6450f8f269 hv_netvsc: Fix the real number of queues of non-vRSS cases
For older hosts without multi-channel (vRSS) support, and some error
cases, we still need to set the real number of queues to one.
This patch adds this missing setting.

Fixes: 8195b1396e ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 20:20:51 -07:00
Alex Ng
0ab09befdb hv_netvsc: fix send buffer failure on MTU change
If MTU is changed the host would reject the send buffer change.
This problem is result of recent change to allow changing send
buffer size.

Every time we change the MTU, we store the previous net_device section
count before destroying the buffer, but we don’t store the previous
section size. When we reinitialize the buffer, its size is calculated
by multiplying the previous count and previous size. Since we
continuously increase the MTU, the host returns us a decreasing count
value while the section size is reinitialized to 1728 bytes every
time.

This eventually leads to a condition where the calculated buf_size is
so small that the host rejects it.

Fixes: 8b5327975a ("netvsc: allow controlling send/recv buffer size")
Signed-off-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:17:16 -07:00
Stephen Hemminger
5023a6db73 netvsc: increase default receive buffer size
The default receive buffer size was reduced by recent change
to a value which was appropriate for 10G and Windows Server 2016.
But the value is too small for full performance with 40G on Azure.
Increase the default back to maximum supported by host.

Fixes: 8b5327975a ("netvsc: allow controlling send/recv buffer size")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15 14:41:12 -07:00
Stephen Hemminger
8195b1396e hv_netvsc: fix deadlock on hotplug
When a virtual device is added dynamically (via host console), then
the vmbus sends an offer message for the primary channel. The processing
of this message for networking causes the network device to then
initialize the sub channels.

The problem is that setting up the sub channels needs to wait until
the subsequent subchannel offers have been processed. These offers
come in on the same ring buffer and work queue as where the primary
offer is being processed; leading to a deadlock.

This did not happen in older kernels, because the sub channel waiting
logic was broken (it wasn't really waiting).

The solution is to do the sub channel setup in its own work queue
context that is scheduled by the primary channel setup; and then
happens later.

Fixes: 732e49850c ("netvsc: fix race on sub channel creation")
Reported-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-11 14:21:30 -07:00
Haiyang Zhang
db3cd7af9d hv_netvsc: Fix the channel limit in netvsc_set_rxfh()
The limit of setting receive indirection table value should be
the current number of channels, not the VRSS_CHANNEL_MAX.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 20:39:12 -07:00
Haiyang Zhang
06be580ac7 hv_netvsc: Simplify the limit check in netvsc_set_channels()
Because of the following code, net->num_tx_queues equals to
VRSS_CHANNEL_MAX, and max_chn is less than or equals to VRSS_CHANNEL_MAX.

netvsc_drv.c:
alloc_etherdev_mq(sizeof(struct net_device_context),
                                VRSS_CHANNEL_MAX);
rndis_filter.c:
net_device->max_chn = min_t(u32, VRSS_CHANNEL_MAX, num_possible_rss_qs);

So this patch removes the unnecessary limit check before comparing
with "max_chn".

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 20:39:12 -07:00
Haiyang Zhang
715e2ec532 hv_netvsc: Clean up an unused parameter in rndis_filter_set_rss_param()
This patch removes the parameter, num_queue in
rndis_filter_set_rss_param(), which is no longer in use.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 20:39:12 -07:00
Stephen Hemminger
ec158f77de netvsc: allow driver to be removed even if VF is present
If VF is attached then can still allow netvsc driver module to
be removed. Just have to make sure and do the cleanup.

Also, avoid extra rtnl round trip when calling unregister.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 20:31:19 -07:00
Stephen Hemminger
9a0c48df0d netvsc: cleanup datapath switch
Use one routine for datapath up/down. Don't need to reopen
the rndis layer.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 20:31:19 -07:00
David S. Miller
6026e043d0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Three cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 17:42:05 -07:00
stephen hemminger
9b4e946ce1 netvsc: fix deadlock betwen link status and removal
There is a deadlock possible when canceling the link status
delayed work queue. The removal process is run with RTNL held,
and the link status callback is acquring RTNL.

Resolve the issue by using trylock and rescheduling.
If cancel is in process, that block it from happening.

Fixes: 122a5f6410 ("staging: hv: use delayed_work for netvsc_send_garp()")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 21:59:08 -07:00
Haiyang Zhang
c6f71c418f hv_netvsc: Fix rndis_filter_close error during netvsc_remove
We now remove rndis filter before unregister_netdev(), which calls
device close. It involves closing rndis filter already removed.

This patch fixes this error.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 21:55:59 -07:00
Haiyang Zhang
4823eb2f3a hv_netvsc: Add ethtool handler to set and get UDP hash levels
The patch add the functions to switch UDP hash level between
L3 and L4 by ethtool command. UDP over IPv4 and v6 can be set
differently. The default hash level is L4. We currently only
allow switching TX hash level from within the guests.

On Azure, fragmented UDP packets have high loss rate with L4
hashing. Using L3 hashing is recommended in this case.

For example, for UDP over IPv4 on eth0:
To include UDP port numbers in hasing:
	ethtool -N eth0 rx-flow-hash udp4 sdfn
To exclude UDP port numbers in hasing:
	ethtool -N eth0 rx-flow-hash udp4 sd
To show UDP hash level:
	ethtool -n eth0 rx-flow-hash udp4

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22 14:08:12 -07:00
Haiyang Zhang
4c0e2cbfd9 hv_netvsc: Clean up unused parameter from netvsc_get_rss_hash_opts()
The parameter "nvdev" is not in use.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22 14:08:11 -07:00
Haiyang Zhang
fcba1569a0 hv_netvsc: Clean up unused parameter from netvsc_get_hash()
The parameter "sk" is not in use.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22 14:08:11 -07:00
stephen hemminger
cad5c19770 netvsc: keep track of some non-fatal overload conditions
Add ethtool statistics for case where send chimmeny buffer is
exhausted and driver has to fall back to doing scatter/gather
send. Also, add statistic for case where ring buffer is full and
receive completions are delayed.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11 14:00:07 -07:00
stephen hemminger
8b5327975a netvsc: allow controlling send/recv buffer size
Control the size of the buffer areas via ethtool ring settings.
They aren't really traditional hardware rings, but host API breaks
receive and send buffer into chunks. The final size of the chunks are
controlled by the host.

The default value of send and receive buffer area for host DMA
is much larger than it needs to be. Experimentation shows that
4M receive and 1M send is sufficient.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11 14:00:06 -07:00