Commit Graph

371 Commits

Author SHA1 Message Date
Marc Zyngier 2f8dc3a01f virtio-pci: Remove affinity hint before freeing the interrupt
virtio-pci registers a per-vq affinity hint when using MSIX,
but fails to remove it when freeing the interrupt, resulting
in this type of splat:

[   31.111202] WARNING: CPU: 0 PID: 2823 at kernel/irq/manage.c:1503 __free_irq+0x2c4/0x2c8
[   31.114689] Modules linked in:
[   31.116101] CPU: 0 PID: 2823 Comm: kexec Not tainted 4.10.0+ #6941
[   31.118911] Hardware name: Generic DT based system
[   31.121319] [<c022fb78>] (unwind_backtrace) from [<c0229d8c>] (show_stack+0x18/0x1c)
[   31.125017] [<c0229d8c>] (show_stack) from [<c05192f4>] (dump_stack+0x84/0x98)
[   31.128427] [<c05192f4>] (dump_stack) from [<c023d940>] (__warn+0xf4/0x10c)
[   31.131910] [<c023d940>] (__warn) from [<c023da20>] (warn_slowpath_null+0x28/0x30)
[   31.135543] [<c023da20>] (warn_slowpath_null) from [<c0290238>] (__free_irq+0x2c4/0x2c8)
[   31.139355] [<c0290238>] (__free_irq) from [<c02902d0>] (free_irq+0x44/0x78)
[   31.142909] [<c02902d0>] (free_irq) from [<c059d3a8>] (vp_del_vqs+0x68/0x1c0)
[   31.146299] [<c059d3a8>] (vp_del_vqs) from [<c056ca4c>] (pci_device_shutdown+0x3c/0x78)

The obvious fix is to drop the affinity hint before freeing the
interrupt.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-04-11 00:30:20 +03:00
Michael S. Tsirkin 0a9b3f47da Revert "virtio_pci: remove struct virtio_pci_vq_info"
This reverts commit 5c34d002dc.

Conflicts:
	drivers/virtio/virtio_pci_common.c

The cleanup seems to be one of the changes that broke
hybernation for some users. We are still not sure why
but revert helps.

This reverts the cleanup changes but keeps the affinity support.

Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-04-11 00:29:59 +03:00
Michael S. Tsirkin 0b0f9dc52e Revert "virtio_pci: use shared interrupts for virtqueues"
This reverts commit 07ec51480b.

Conflicts:
	drivers/virtio/virtio_pci_common.c

Unfortunately the idea does not work with threadirqs
as more than 32 queues can then map to a single interrupts.

Further, the cleanup seems to be one of the changes that broke
hybernation for some users. We are still not sure why
but revert helps.

This reverts the cleanup changes but keeps the affinity support.

Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-04-11 00:28:57 +03:00
Michael S. Tsirkin 2008c1544c Revert "virtio_pci: don't duplicate the msix_enable flag in struct pci_dev"
This reverts commit 53a020c661.

The cleanup seems to be one of the changes that broke
hybernation for some users. We are still not sure why
but revert helps.

Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-04-11 00:28:41 +03:00
Michael S. Tsirkin bf951b1045 Revert "virtio_pci: simplify MSI-X setup"
This reverts commit 52a6151612.

Conflicts:
	drivers/virtio/virtio_pci_common.c

The cleanup seems to be one of the changes that broke
hybernation for some users. We are still not sure why
but revert helps.

This reverts the cleanup changes but keeps the affinity support.

Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-04-11 00:19:40 +03:00
Michael S. Tsirkin 8f10d0149f Revert "virtio_pci: fix out of bound access for msix_names"
This reverts commit de85ec8b07.

Follow-up patches will revert 07ec51480b ("virtio_pci: use shared
interrupts for virtqueues") that triggered the problem so no need for
this one anymore.

Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-04-10 23:27:50 +03:00
Michael S. Tsirkin 404123c2db virtio: allow drivers to validate features
Some drivers can't support all features in all configurations.  At the
moment we blindly set FEATURES_OK and later FAILED.  Support this better
by adding a callback drivers can use to do some early checks.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-04-07 16:38:59 +03:00
Arnd Bergmann f0bb2d50df virtio_balloon: prevent uninitialized variable use
The latest gcc-7.0.1 snapshot reports a new warning:

virtio/virtio_balloon.c: In function 'update_balloon_stats':
virtio/virtio_balloon.c:258:26: error: 'events[2]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:260:26: error: 'events[3]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:261:56: error: 'events[18]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:262:56: error: 'events[17]' is used uninitialized in this function [-Werror=uninitialized]

This seems absolutely right, so we should add an extra check to
prevent copying uninitialized stack data into the statistics.
>From all I can tell, this has been broken since the statistics code
was originally added in 2.6.34.

Fixes: 9564e138b1 ("virtio: Add memory statistics reporting to the balloon driver (V4)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-28 20:41:28 +03:00
Ladi Prosek 9646b26e85 virtio-balloon: use actual number of stats for stats queue buffers
The virtio balloon driver contained a not-so-obvious invariant that
update_balloon_stats has to update exactly VIRTIO_BALLOON_S_NR counters
in order to send valid stats to the host. This commit fixes it by having
update_balloon_stats return the actual number of counters, and its
callers use it when pushing buffers to the stats virtqueue.

Note that it is still out of spec to change the number of counters
at run-time. "Driver MUST supply the same subset of statistics in all
buffers submitted to the statsq."

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-28 20:41:28 +03:00
Ladi Prosek fc8653228c virtio_balloon: init 1st buffer in stats vq
When init_vqs runs, virtio_balloon.stats is either uninitialized or
contains stale values. The host updates its state with garbage data
because it has no way of knowing that this is just a marker buffer
used for signaling.

This patch updates the stats before pushing the initial buffer.

Alternative fixes:
* Push an empty buffer in init_vqs. Not easily done with the current
  virtio implementation and violates the spec "Driver MUST supply the
  same subset of statistics in all buffers submitted to the statsq".
* Push a buffer with invalid tags in init_vqs. Violates the same
  spec clause, plus "invalid tag" is not really defined.

Note: the spec says:
	When using the legacy interface, the device SHOULD ignore all values in
	the first buffer in the statsq supplied by the driver after device
	initialization. Note: Historically, drivers supplied an uninitialized
	buffer in the first buffer.

Unfortunately QEMU does not seem to implement the recommendation
even for the legacy interface.

Cc: stable@vger.kernel.org
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-28 20:41:27 +03:00
Jason Wang de85ec8b07 virtio_pci: fix out of bound access for msix_names
Fedora has received multiple reports of crashes when running
4.11 as a guest

https://bugzilla.redhat.com/show_bug.cgi?id=1430297
https://bugzilla.redhat.com/show_bug.cgi?id=1434462
https://bugzilla.kernel.org/show_bug.cgi?id=194911
https://bugzilla.redhat.com/show_bug.cgi?id=1433899

The crashes are not always consistent but they are generally
some flavor of oops or GPF in virtio related code. Multiple people
have done bisections (Thank you Thorsten Leemhuis and
Richard W.M. Jones) and found this commit to be at fault

07ec51480b is the first bad commit
commit 07ec51480b
Author: Christoph Hellwig <hch@lst.de>
Date:   Sun Feb 5 18:15:19 2017 +0100

    virtio_pci: use shared interrupts for virtqueues

The issue seems to be an out of bounds access to the msix_names
array corrupting kernel memory.

Fixes: 07ec51480b ("virtio_pci: use shared interrupts for virtqueues")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
2017-03-28 20:40:53 +03:00
Linus Torvalds 1827adb11a Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched.h split-up from Ingo Molnar:
 "The point of these changes is to significantly reduce the
  <linux/sched.h> header footprint, to speed up the kernel build and to
  have a cleaner header structure.

  After these changes the new <linux/sched.h>'s typical preprocessed
  size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
  lines), which is around 40% faster to build on typical configs.

  Not much changed from the last version (-v2) posted three weeks ago: I
  eliminated quirks, backmerged fixes plus I rebased it to an upstream
  SHA1 from yesterday that includes most changes queued up in -next plus
  all sched.h changes that were pending from Andrew.

  I've re-tested the series both on x86 and on cross-arch defconfigs,
  and did a bisectability test at a number of random points.

  I tried to test as many build configurations as possible, but some
  build breakage is probably still left - but it should be mostly
  limited to architectures that have no cross-compiler binaries
  available on kernel.org, and non-default configurations"

* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
  sched/headers: Clean up <linux/sched.h>
  sched/headers: Remove #ifdefs from <linux/sched.h>
  sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>
  sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>
  sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h>
  sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h>
  sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/init.h>
  sched/core: Remove unused prefetch_stack()
  sched/headers: Remove <linux/rculist.h> from <linux/sched.h>
  sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h>
  sched/headers: Remove <linux/signal.h> from <linux/sched.h>
  sched/headers: Remove <linux/rwsem.h> from <linux/sched.h>
  sched/headers: Remove the runqueue_is_locked() prototype
  sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h>
  sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h>
  sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h>
  ...
2017-03-03 10:16:38 -08:00
Linus Torvalds 54d7989f47 virtio, vhost: optimizations, fixes
Looks like a quiet cycle for vhost/virtio, just a couple of minor
 tweaks. Most notable is automatic interrupt affinity for blk and scsi.
 Hopefully other devices are not far behind.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYt1rRAAoJECgfDbjSjVRpEZsIALSHevdXWtRHBZUb0ZkqPLQb
 /x2Vn49CcALS1p7iSuP9L027MPeaLKyr0NBT9hptBChp/4b9lnZWyyAo6vYQrzfx
 Ia/hLBYsK4ml6lEwbyfLwqkF2cmYCrZhBSVAILifn84lTPoN7CT0PlYDfA+OCaNR
 geo75qF8KR+AUO0aqchwMRL3RV3OxZKxQr2AR6LttCuhiBgnV3Xqxffg/M3x6ONM
 0ffFFdodm6slem3hIEiGUMwKj4NKQhcOleV+y0fVBzWfLQG9210pZbQyRBRikIL0
 7IsaarpaUr7OrLAZFMGF6nJnyRAaRrt6WknTHZkyvyggrePrGcmGgPm4jrODwY4=
 =2zwv
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull vhost updates from Michael Tsirkin:
 "virtio, vhost: optimizations, fixes

  Looks like a quiet cycle for vhost/virtio, just a couple of minor
  tweaks. Most notable is automatic interrupt affinity for blk and scsi.
  Hopefully other devices are not far behind"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio-console: avoid DMA from stack
  vhost: introduce O(1) vq metadata cache
  virtio_scsi: use virtio IRQ affinity
  virtio_blk: use virtio IRQ affinity
  blk-mq: provide a default queue mapping for virtio device
  virtio: provide a method to get the IRQ affinity mask for a virtqueue
  virtio: allow drivers to request IRQ affinity when creating VQs
  virtio_pci: simplify MSI-X setup
  virtio_pci: don't duplicate the msix_enable flag in struct pci_dev
  virtio_pci: use shared interrupts for virtqueues
  virtio_pci: remove struct virtio_pci_vq_info
  vhost: try avoiding avail index access when getting descriptor
  virtio_mmio: expose header to userspace
2017-03-02 13:53:13 -08:00
Ingo Molnar 50d34394ce sched/headers: Prepare to remove the <linux/magic.h> include from <linux/sched/task_stack.h>
Update files that depend on the magic.h inclusion.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00
Christoph Hellwig bbaba47956 virtio: provide a method to get the IRQ affinity mask for a virtqueue
This basically passed up the pci_irq_get_affinity information through
virtio through an optional get_vq_affinity method.  It is only implemented
by the PCI backend for now, and only when we use per-virtqueue IRQs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:05 +02:00
Christoph Hellwig fb5e31d970 virtio: allow drivers to request IRQ affinity when creating VQs
Add a struct irq_affinity pointer to the find_vqs methods, which if set
is used to tell the PCI layer to create the MSI-X vectors for our I/O
virtqueues with the proper affinity from the start.  Compared to after
the fact affinity hints this gives us an instantly working setup and
allows to allocate the irq descritors node-local and avoid interconnect
traffic.  Last but not least this will allow blk-mq queues are created
based on the interrupt affinity for storage drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:04 +02:00
Christoph Hellwig 52a6151612 virtio_pci: simplify MSI-X setup
Try to grab the MSI-X vectors early and fall back to the shared one
before doing lots of allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:04 +02:00
Christoph Hellwig 53a020c661 virtio_pci: don't duplicate the msix_enable flag in struct pci_dev
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:03 +02:00
Christoph Hellwig 07ec51480b virtio_pci: use shared interrupts for virtqueues
This lets IRQ layer handle dispatching IRQs to separate handlers for the
case where we don't have per-VQ MSI-X vectors, and allows us to greatly
simplify the code based on the assumption that we always have interrupt
vector 0 (legacy INTx or config interrupt for MSI-X) available, and
any other interrupt is request/freed throught the VQ, even if the
actual interrupt line might be shared in some cases.

This allows removing a great deal of variables keeping track of the
interrupt state in struct virtio_pci_device, as we can now simply walk the
list of VQs and deal with per-VQ interrupt handlers there, and only treat
vector 0 special.

Additionally clean up the VQ allocation code to properly unwind on error
instead of having a single global cleanup label, which is error prone,
and in this case also leads to more code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:03 +02:00
Christoph Hellwig 5c34d002dc virtio_pci: remove struct virtio_pci_vq_info
We don't really need struct virtio_pci_vq_info, as most field in there
are redundant:

 - the vq backpointer is not strictly neede to start with
 - the entry in the vqs list is not needed - the generic virtqueue already
   has list, we only need to check if it has a callback to get the same
   semantics
 - we can use a simple array to look up the MSI-X vec if needed.
 - That simple array now also duoble serves to replace the per_vq_vectors
   flag

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:03 +02:00
Michael S. Tsirkin 51be7a9a26 virtio_mmio: expose header to userspace
It's handy for userspace emulators like QEMU.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 16:31:23 +02:00
Yisheng Xie 9c57b5808c mm balloon: umount balloon_mnt when removing vb device
With CONFIG_BALLOON_COMPACTION=y the kernel will mount balloon_mnt for
balloon page migration when we probe a virtio_balloon device.  However
we do not unmount it when removing the device.  Fix this.

Fixes: b1123ea6d3 ("mm: balloon: use general non-lru movable page feature")
Link: http://lkml.kernel.org/r/1486531318-35189-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:56 -08:00
David S. Miller 3efa70d78f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 16:29:30 -05:00
John Fastabend 9fe7bfce8b virtio_net: refactor freeze/restore logic into virtnet reset logic
For XDP we will need to reset the queues to allow for buffer headroom
to be configured. In order to do this we need to essentially run the
freeze()/restore() code path. Unfortunately the locking requirements
between the freeze/restore and reset paths are different however so
we can not simply reuse the code.

This patch refactors the code path and adds a reset helper routine.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:05:12 -05:00
Michael S. Tsirkin 0d5415b489 Revert "vring: Force use of DMA API for ARM-based systems with legacy devices"
This reverts commit c7070619f3.

This has been shown to regress on some ARM systems:

by forcing on DMA API usage for ARM systems, we have inadvertently
kicked open a hornets' nest in terms of cache-coherency. Namely that
unless the virtio device is explicitly described as capable of coherent
DMA by firmware, the DMA APIs on ARM and other DT-based platforms will
assume it is non-coherent. This turns out to cause a big problem for the
likes of QEMU and kvmtool, which generate virtio-mmio devices in their
guest DTs but neglect to add the often-overlooked "dma-coherent"
property; as a result, we end up with the guest making non-cacheable
accesses to the vring, the host doing so cacheably, both talking past
each other and things going horribly wrong.

We are working on a safer work-around.

Fixes: c7070619f3 ("vring: Force use of DMA API for ARM-based systems with legacy devices")
Reported-by: Robin Murphy <robin.murphy@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2017-02-03 23:38:50 +02:00
Will Deacon c7070619f3 vring: Force use of DMA API for ARM-based systems with legacy devices
Booting Linux on an ARM fastmodel containing an SMMU emulation results
in an unexpected I/O page fault from the legacy virtio-blk PCI device:

[    1.211721] arm-smmu-v3 2b400000.smmu: event 0x10 received:
[    1.211800] arm-smmu-v3 2b400000.smmu:	0x00000000fffff010
[    1.211880] arm-smmu-v3 2b400000.smmu:	0x0000020800000000
[    1.211959] arm-smmu-v3 2b400000.smmu:	0x00000008fa081002
[    1.212075] arm-smmu-v3 2b400000.smmu:	0x0000000000000000
[    1.212155] arm-smmu-v3 2b400000.smmu: event 0x10 received:
[    1.212234] arm-smmu-v3 2b400000.smmu:	0x00000000fffff010
[    1.212314] arm-smmu-v3 2b400000.smmu:	0x0000020800000000
[    1.212394] arm-smmu-v3 2b400000.smmu:	0x00000008fa081000
[    1.212471] arm-smmu-v3 2b400000.smmu:	0x0000000000000000

<system hangs failing to read partition table>

This is because the legacy virtio-blk device is behind an SMMU, so we
have consequently swizzled its DMA ops and configured the SMMU to
translate accesses. This then requires the vring code to use the DMA API
to establish translations, otherwise all transactions will result in
fatal faults and termination.

Given that ARM-based systems only see an SMMU if one is really present
(the topology is all described by firmware tables such as device-tree or
IORT), then we can safely use the DMA API for all legacy virtio devices.
Modern devices can advertise the prescense of an IOMMU using the
VIRTIO_F_IOMMU_PLATFORM feature flag.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: 876945dbf6 ("arm64: Hook up IOMMU dma_ops")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-25 00:33:11 +02:00
Robin Murphy f7f6634d23 virtio_mmio: Set DMA masks appropriately
Once DMA API usage is enabled, it becomes apparent that virtio-mmio is
inadvertently relying on the default 32-bit DMA mask, which leads to
problems like rapidly exhausting SWIOTLB bounce buffers.

Ensure that we set the appropriate 64-bit DMA mask whenever possible,
with the coherent mask suitably limited for the legacy vring as per
a0be1db430 ("virtio_pci: Limit DMA mask to 44 bits for legacy virtio
devices").

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Fixes: b42111382f ("virtio_mmio: Use the DMA API if enabled")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-01-25 00:33:10 +02:00
Yuan Liu cecdbdc377 virtio_mmio: Set dev.release() to avoid warning
Fix a warning thrown from virtio_mmio_remove():
Device 'virtio0' does not have a release() function

The fix is according to virtio_pci_probe() of
drivers/virtio/virtio_pci_common.c

Signed-off-by: Yuan Liu <liuyuan@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-16 00:13:39 +02:00
Felipe Franciosi 0c7eaf5930 virtio_ring: fix description of virtqueue_get_buf
The device (not the driver) populates the used ring and includes the len
of how much data was written.

Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-16 00:13:38 +02:00
Christoph Hellwig a3cbec6972 virtio_pci: split vp_try_to_find_vqs into INTx and MSI-X variants
There is basically no shared logic between the INTx and MSI-X case in
vp_try_to_find_vqs, so split the function into two and clean them up
a little bit.

Also remove the fairly pointless vp_request_intx wrapper while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-16 00:12:49 +02:00
Christoph Hellwig 66f2f55542 virtio_pci: merge vp_free_vectors into vp_del_vqs
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-16 00:12:48 +02:00
Christoph Hellwig 9f8196cc05 virtio_pci: remove the call to vp_free_vectors in vp_request_msix_vectors
vp_request_msix_vectors is only called by vp_try_to_find_vqs, which already
calls vp_free_vectors through vp_del_vqs in the failure case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-16 00:12:48 +02:00
Christoph Hellwig fa3a327935 virtio_pci: use pci_alloc_irq_vectors
This avoids the separate allocation for the msix_entries structures, and
instead allows us to use pci_irq_vector to find a given IRQ vector.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-16 00:12:47 +02:00
Michael S. Tsirkin d41795978c virtio: clean up handling of request_irq failure
We call del_vqs twice when request_irq fails, this
makes no sense.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-16 00:12:44 +02:00
Gonglei c60923cb9c virtio_ring: fix complaint by sparse
# make C=2 CF="-D__CHECK_ENDIAN__" ./drivers/virtio/

drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types)
drivers/virtio/virtio_ring.c:423:19:    expected unsigned int [unsigned] [assigned] i
drivers/virtio/virtio_ring.c:423:19:    got restricted __virtio16 [usertype] next
drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types)
drivers/virtio/virtio_ring.c:423:19:    expected unsigned int [unsigned] [assigned] i
drivers/virtio/virtio_ring.c:423:19:    got restricted __virtio16 [usertype] next
drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types)
drivers/virtio/virtio_ring.c:423:19:    expected unsigned int [unsigned] [assigned] i
drivers/virtio/virtio_ring.c:423:19:    got restricted __virtio16 [usertype] next
drivers/virtio/virtio_ring.c:604:39: warning: incorrect type in initializer (different base types)
drivers/virtio/virtio_ring.c:604:39:    expected unsigned short [unsigned] [usertype] nextflag
drivers/virtio/virtio_ring.c:604:39:    got restricted __virtio16
drivers/virtio/virtio_ring.c:612:33: warning: restricted __virtio16 degrades to integer

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15 06:39:47 +02:00
Gonglei 61bd405f4e virtio_pci_modern: fix complaint by sparse
drivers/virtio/virtio_pci_modern.c:66:40: warning: incorrect type in argument 2 (different base types)
drivers/virtio/virtio_pci_modern.c:66:40:    expected unsigned int [noderef] [usertype] <asn:2>*addr
drivers/virtio/virtio_pci_modern.c:66:40:    got restricted __le32 [noderef] [usertype] <asn:2>*lo
drivers/virtio/virtio_pci_modern.c:67:33: warning: incorrect type in argument 2 (different base types)
drivers/virtio/virtio_pci_modern.c:67:33:    expected unsigned int [noderef] [usertype] <asn:2>*addr
drivers/virtio/virtio_pci_modern.c:67:33:    got restricted __le32 [noderef] [usertype] <asn:2>*hi
drivers/virtio/virtio_pci_modern.c:150:32: warning: incorrect type in argument 2 (different base types)
drivers/virtio/virtio_pci_modern.c:150:32:    expected unsigned int [noderef] [usertype] <asn:2>*addr
drivers/virtio/virtio_pci_modern.c:150:32:    got restricted __le32 [noderef] <asn:2>*<noident>
drivers/virtio/virtio_pci_modern.c:151:39: warning: incorrect type in argument 1 (different base types)
drivers/virtio/virtio_pci_modern.c:151:39:    expected unsigned int [noderef] [usertype] <asn:2>*addr
drivers/virtio/virtio_pci_modern.c:151:39:    got restricted __le32 [noderef] <asn:2>*<noident>
drivers/virtio/virtio_pci_modern.c:152:32: warning: incorrect type in argument 2 (different base types)
drivers/virtio/virtio_pci_modern.c:152:32:    expected unsigned int [noderef] [usertype] <asn:2>*addr

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15 06:39:46 +02:00
Jonathan Corbet 917fef6f7e Linux 4.9-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYHmoCAAoJEHm+PkMAQRiG7RMIAI2i7Y5hpL5yCxK5AFaL4u/G
 KxXfp1B1UanUTgjOmd7zGqtDYcFX9t7GTTUFixQ7/9Opr4PD9qbnatoDGSc3xjbT
 msDgA1B78F1/Q3kHWfeGq32MihQ4mj5NwUCo+igUcUvvWG7mHgzErj/Nh5RoobQX
 p/izdpTbrw3GX6xXB8olbG7XWHaVye/+TT3q6+gmgm8I/QEujcLeGoycE0zlhPN8
 FG/JX76At/+ZM2Py7Oxo3k+oKL9CHrtOQYDp/wN0uslV5eYvvkZz0/M1HMOGZt+c
 gZU5jzM17K7C4Nzo06WAuBU9wUBGc25m+cPicLlOmljnzfU+f50SKaDjZq3p7QI=
 =2KUF
 -----END PGP SIGNATURE-----

Merge tag 'v4.9-rc4' into sound

Bring in -rc4 patches so I can successfully merge the sound doc changes.
2016-11-18 16:13:41 -07:00
Michael S. Tsirkin 75bfa81bf0 virtio_ring: mark vring_dma_dev inline
This inline function is unused on configurations
where dma_map/unmap are empty macros.

Make the function inline to avoid gcc errors because
of an unused static function.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-10-31 00:40:08 +02:00
Juergen Gross 3dae2c6152 virtio: remove config.c
Remove unused file config.c

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-10-31 00:21:45 +02:00
Konstantin Neumoin 8424af5336 virtio: update balloon size in balloon "probe"
The following commit 'fad7b7b27b6a (virtio_balloon: Use a workqueue
instead of "vballoon" kthread)' has added a regression. Original code with
kthread starts the thread inside probe and checks the necessity to update
balloon inside the thread immediately.

Nowadays the code behaves differently. Work is queued only on the first
command from the host after the negotiation. Thus there is a window
especially at the guest startup or the module reloading when the balloon
size is not updated until the notification from the host.

This patch adds balloon size check at the end of the probe to match
original behaviour.

Signed-off-by: Konstantin Neumoin <kneumoin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-10-31 00:21:41 +02:00
Ladi Prosek 0ea1e4a6d9 virtio_ring: Make interrupt suppression spec compliant
According to the spec, if the VIRTIO_RING_F_EVENT_IDX feature bit is
negotiated the driver MUST set flags to 0. Not dirtying the available
ring in virtqueue_disable_cb also has a minor positive performance
impact, improving L1 dcache load missed by ~0.5% in vring_bench.

Writes to the used event field (vring_used_event) are still unconditional.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: <stable@vger.kernel.org> # f277ec4 virtio_ring: shadow available
Cc: <stable@vger.kernel.org>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-10-31 00:21:40 +02:00
Will Deacon a0be1db430 virtio_pci: Limit DMA mask to 44 bits for legacy virtio devices
Legacy virtio defines the virtqueue base using a 32-bit PFN field, with
a read-only register indicating a fixed page size of 4k.

This can cause problems for DMA allocators that allocate top down from
the DMA mask, which is set to 64 bits. In this case, the addresses are
silently truncated to 44-bit, leading to IOMMU faults, failure to read
from the queue or data corruption.

This patch restricts the coherent DMA mask for legacy PCI virtio devices
to 44 bits, which matches the specification.

Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Benjamin Serebrin <serebrin@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-10-31 00:21:39 +02:00
Mauro Carvalho Chehab 8c27ceff36 docs: fix locations of several documents that got moved
The previous patch renamed several files that are cross-referenced
along the Kernel documentation. Adjust the links to point to
the right places.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-10-24 08:12:35 -02:00
Baoyou Xie af7c1beccf virtio: mark vring_dma_dev() static
We get 1 warning when building kernel with W=1:
drivers/virtio/virtio_ring.c:170:16: warning: no previous prototype for 'vring_dma_dev' [-Wmissing-prototypes]

In fact, this function is only used in the file in which it is
declared and don't need a declaration, but can be made static.
so this patch marks this function with 'static'.

Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 21:12:35 +03:00
Michael S. Tsirkin 3cc36f6e34 virtio: fix error handling for debug builds
On error, virtqueue_add calls START_USE but not
END_USE. Thankfully that's normally empty anyway,
but might not be when debugging. Fix it up.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-09 13:42:35 +03:00
Wei Yongjun 58625edf9e virtio: fix memory leak in virtqueue_add()
When using the indirect buffers feature, 'desc' is allocated in
virtqueue_add() but isn't freed before leaving on a ring full error,
causing a memory leak.

For example, it seems rather clear that this can trigger
with virtio net if mergeable buffers are not used.

Cc: stable@vger.kernel.org
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-09 13:42:34 +03:00
Linus Torvalds 0803e04011 virtio/vhost: new features for 4.8
- New vsock device support in host and guest
 - Platform IOMMU support in host and guest,
   including compatibility quirks for legacy systems.
 - Misc fixes and cleanups.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXofvbAAoJECgfDbjSjVRpUTIH/iEoK9h636tBayXy0PXkPby0
 6fMaRFy6H1HgEttgDhJE8Pqg/ba3qaW9Em0fHyFq7Mp2waFHAZ8hAT8phC6TAK3c
 CIBnfzyyuI8u3N9SnNOfelPVcwCBfuALuuTsXB/rwKbYQEVv+U5Rdt3Vyx9+lXkj
 P005klz7PfqxFhQrrnj4Eh7VawtHwmMuLH8YoWpCZpM71dHPo6eL+3ftKwhH2boo
 qK86uVprwba03Pewpm13vQnotemfVfUUkjXd4EJpG3dx7E0KZosuj0ZG9OV8mPGQ
 Cl2gBdUhocdJgeUnAHmf6tumYi9KFlYfy6xLy44YMmN7FL3E9nQjaKZp25UKfiM=
 =ztIm
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vhost updates from Michael Tsirkin:

 - new vsock device support in host and guest

 - platform IOMMU support in host and guest, including compatibility
   quirks for legacy systems.

 - misc fixes and cleanups.

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  VSOCK: Use kvfree()
  vhost: split out vringh Kconfig
  vhost: detect 32 bit integer wrap around
  vhost: new device IOTLB API
  vhost: drop vringh dependency
  vhost: convert pre sorted vhost memory array to interval tree
  vhost: introduce vhost memory accessors
  VSOCK: Add Makefile and Kconfig
  VSOCK: Introduce vhost_vsock.ko
  VSOCK: Introduce virtio_transport.ko
  VSOCK: Introduce virtio_vsock_common.ko
  VSOCK: defer sock removal to transports
  VSOCK: transport-specific vsock_transport functions
  vhost: drop vringh dependency
  vop: pull in vhost Kconfig
  virtio: new feature to detect IOMMU device quirk
  balloon: check the number of available pages in leak balloon
  vhost: lockless enqueuing
  vhost: simplify work flushing
2016-08-06 09:20:13 -04:00
Michael S. Tsirkin 1a93769399 virtio: new feature to detect IOMMU device quirk
The interaction between virtio and IOMMUs is messy.

On most systems with virtio, physical addresses match bus addresses,
and it doesn't particularly matter which one we use to program
the device.

On some systems, including Xen and any system with a physical device
that speaks virtio behind a physical IOMMU, we must program the IOMMU
for virtio DMA to work at all.

On other systems, including SPARC and PPC64, virtio-pci devices are
enumerated as though they are behind an IOMMU, but the virtio host
ignores the IOMMU, so we must either pretend that the IOMMU isn't
there or somehow map everything as the identity.

Add a feature bit to detect that quirk: VIRTIO_F_IOMMU_PLATFORM.

Any device with this feature bit set to 0 needs a quirk and has to be
passed physical addresses (as opposed to bus addresses) even though
the device is behind an IOMMU.

Note: it has to be a per-device quirk because for example, there could
be a mix of passed-through and virtual virtio devices. As another
example, some devices could be implemented by an out of process
hypervisor backend (in case of qemu vhost, or vhost-user) and so support
for an IOMMU needs to be coded up separately.

It would be cleanest to handle this in IOMMU core code, but that needs
per-device DMA ops. While we are waiting for that to be implemented, use
a work-around in virtio core.

Note: a "noiommu" feature is a quirk - add a wrapper to make
that clear.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-01 21:44:52 +03:00
Konstantin Neumoin 37cf99e08c balloon: check the number of available pages in leak balloon
The balloon has a special mechanism that is subscribed to the oom
notification which leads to deflation for a fixed number of pages.
The number is always fixed even when the balloon is fully deflated.
But leak_balloon did not expect that the pages to deflate will be more
than taken, and raise a "BUG" in balloon_page_dequeue when page list
will be empty.

So, the simplest solution would be to check that the number of releases
pages is less or equal to the number taken pages.

Cc: stable@vger.kernel.org
Signed-off-by: Konstantin Neumoin <kneumoin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-01 21:44:51 +03:00
Minchan Kim dd4123f324 mm: fix build warnings in <linux/compaction.h>
Randy reported below build error.

> In file included from ../include/linux/balloon_compaction.h:48:0,
>                  from ../mm/balloon_compaction.c:11:
> ../include/linux/compaction.h:237:51: warning: 'struct node' declared inside parameter list [enabled by default]
>  static inline int compaction_register_node(struct node *node)
> ../include/linux/compaction.h:237:51: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
> ../include/linux/compaction.h:242:54: warning: 'struct node' declared inside parameter list [enabled by default]
>  static inline void compaction_unregister_node(struct node *node)
>

It was caused by non-lru page migration which needs compaction.h but
compaction.h doesn't include any header to be standalone.

I think proper header for non-lru page migration is migrate.h rather
than compaction.h because migrate.h has already headers needed to work
non-lru page migration indirectly like isolate_mode_t, migrate_mode
MIGRATEPAGE_SUCCESS.

[akpm@linux-foundation.org: revert mm-balloon-use-general-non-lru-movable-page-feature-fix.patch temp fix]
Link: http://lkml.kernel.org/r/20160610003304.GE29779@bbox
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00