virtio,vhost,vdpa: features, fixes

device feature provisioning in ifcvf, mlx5
 new SolidNET driver
 support for zoned block device in virtio blk
 numa support in virtio pmem
 VIRTIO_F_RING_RESET support in vhost-net
 more debugfs entries in mlx5
 resume support in vdpa
 completion batching in virtio blk
 cleanup of dma api use in vdpa
 now simulating more features in vdpa-sim
 documentation, features, fixes all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmP0D98PHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpV6IH/iecRgLMWWjp3n31IFdu31f/J4HpF7dczVjK
 qtV98eJ1N2pkgeJkdCfmB5XszfvFBeAurrS7++FTHiJhrRfR3Z+2ml/Qtvh5DEyP
 qxz6wOw6VVsi/txdUxM1wsxLeEmmzkmFdAmPM+FXeIjhWj76GOgy/4A3eaj6TgzV
 W8ShsBve/UZ5qMOC3XbIscvdOrudHJ18tH90Tiz3NZfH1fAs5E4uWbU6Mrz9DJVr
 canGvf4kAI9z8qram5HSgzPIXRJEYiF4q/eiStdtiiME8gL1mHLRZDNP1I1LeCAb
 q6Q6RCRKi3Ek+LGdH6u+nR1Swu03N2b/g+vgKtv30kJo06oZVzw=
 =EasV
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - device feature provisioning in ifcvf, mlx5

 - new SolidNET driver

 - support for zoned block device in virtio blk

 - numa support in virtio pmem

 - VIRTIO_F_RING_RESET support in vhost-net

 - more debugfs entries in mlx5

 - resume support in vdpa

 - completion batching in virtio blk

 - cleanup of dma api use in vdpa

 - now simulating more features in vdpa-sim

 - documentation, features, fixes all over the place

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits)
  vdpa/mlx5: support device features provisioning
  vdpa/mlx5: make MTU/STATUS presence conditional on feature bits
  vdpa: validate device feature provisioning against supported class
  vdpa: validate provisioned device features against specified attribute
  vdpa: conditionally read STATUS in config space
  vdpa: fix improper error message when adding vdpa dev
  vdpa/mlx5: Initialize CVQ iotlb spinlock
  vdpa/mlx5: Don't clear mr struct on destroy MR
  vdpa/mlx5: Directly assign memory key
  tools/virtio: enable to build with retpoline
  vringh: fix a typo in comments for vringh_kiov
  vhost-vdpa: print warning when vhost_vdpa_alloc_domain fails
  scsi: virtio_scsi: fix handling of kmalloc failure
  vdpa: Fix a couple of spelling mistakes in some messages
  vhost-net: support VIRTIO_F_RING_RESET
  vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit
  vdpa: mlx5: support per virtqueue dma device
  vdpa: set dma mask for vDPA device
  virtio-vdpa: support per vq dma device
  vdpa: introduce get_vq_dma_device()
  ...
Linus Torvalds 2023-02-25 11:48:02 -08:00
commit 84cc6674b7
47 changed files with 3536 additions and 503 deletions

View File

@ -108,6 +108,7 @@ available subsections can be seen below.
vfio-mediated-device
vfio
vfio-pci-device-specific-driver-acceptance
virtio/index
xilinx/index
xillybus
zorro

View File

@ -0,0 +1,11 @@
.. SPDX-License-Identifier: GPL-2.0
======
Virtio
======
.. toctree::
   :maxdepth: 1

   virtio
   writing_virtio_drivers

View File

@ -0,0 +1,145 @@
.. SPDX-License-Identifier: GPL-2.0
.. _virtio:
===============
Virtio on Linux
===============
Introduction
============
Virtio is an open standard that defines a protocol for communication
between drivers and devices of different types; see Chapter 5 ("Device
Types") of the virtio spec (`[1]`_). Originally developed as a standard
for paravirtualized devices implemented by a hypervisor, it can be used
to interface any compliant device (real or emulated) with a driver.
For illustrative purposes, this document will focus on the common case
of a Linux kernel running in a virtual machine and using paravirtualized
devices provided by the hypervisor, which exposes them as virtio devices
via standard mechanisms such as PCI.
Device - Driver communication: virtqueues
=========================================
Although the virtio devices are really an abstraction layer in the
hypervisor, they're exposed to the guest as if they are physical devices
using a specific transport method -- PCI, MMIO or CCW -- that is
orthogonal to the device itself. The virtio spec defines these transport
methods in detail, including device discovery, capabilities and
interrupt handling.
The communication between the driver in the guest OS and the device in
the hypervisor is done through shared memory (that's what makes virtio
devices so efficient) using specialized data structures called
virtqueues, which are actually ring buffers [#f1]_ of buffer descriptors
similar to the ones used in a network device:
.. kernel-doc:: include/uapi/linux/virtio_ring.h
   :identifiers: struct vring_desc
All the buffers the descriptors point to are allocated by the guest and
used by the host either for reading or for writing but not for both.
Refer to Chapter 2.5 ("Virtqueues") of the virtio spec (`[1]`_) for the
reference definitions of virtqueues and "Virtqueues and virtio ring: How
the data travels" blog post (`[2]`_) for an illustrated overview of how
the host device and the guest driver communicate.
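Schematically, each virtqueue in this layout combines a descriptor table
with two rings, one per direction; in slightly simplified form (the uapi
header uses aligned typedefs), struct vring ties the three parts
together::

    struct vring {
        unsigned int num;          /* number of descriptors, a power of two */
        struct vring_desc *desc;   /* descriptor table */
        struct vring_avail *avail; /* driver -> device: descriptors to process */
        struct vring_used *used;   /* device -> driver: descriptors consumed */
    };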
The :c:type:`vring_virtqueue` struct models a virtqueue, including the
ring buffers and management data. Embedded in this struct is the
:c:type:`virtqueue` struct, which is the data structure that's
ultimately used by virtio drivers:
.. kernel-doc:: include/linux/virtio.h
   :identifiers: struct virtqueue
The callback function pointed to by this struct is triggered when the
device has consumed the buffers provided by the driver. More
specifically, the trigger will be an interrupt issued by the hypervisor
(see vring_interrupt()). Interrupt request handlers are registered for
a virtqueue during the virtqueue setup process (transport-specific).
.. kernel-doc:: drivers/virtio/virtio_ring.c
   :identifiers: vring_interrupt
Device discovery and probing
============================
In the kernel, the virtio core contains the virtio bus driver and
transport-specific drivers like `virtio-pci` and `virtio-mmio`. Then
there are individual virtio drivers for specific device types that are
registered to the virtio bus driver.
How a virtio device is found and configured by the kernel depends on how
the hypervisor defines it. Take the `QEMU virtio-console
<https://gitlab.com/qemu-project/qemu/-/blob/master/hw/char/virtio-console.c>`__
device as an example: when using PCI as a transport method, the device
presents itself on the PCI bus with vendor id 0x1af4 (Red Hat, Inc.)
and device id 0x1003 (virtio console), as defined in the spec, so the
kernel will detect it as it would any other PCI device.
During the PCI enumeration process, if a device is found to match the
virtio-pci driver (according to the virtio-pci device table, any PCI
device with vendor id = 0x1af4)::
/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
static const struct pci_device_id virtio_pci_id_table[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
{ 0 }
};
then the virtio-pci driver is probed and, if the probing goes well, the
device is registered to the virtio bus::
static int virtio_pci_probe(struct pci_dev *pci_dev,
const struct pci_device_id *id)
{
...
if (force_legacy) {
rc = virtio_pci_legacy_probe(vp_dev);
/* Also try modern mode if we can't map BAR0 (no IO space). */
if (rc == -ENODEV || rc == -ENOMEM)
rc = virtio_pci_modern_probe(vp_dev);
if (rc)
goto err_probe;
} else {
rc = virtio_pci_modern_probe(vp_dev);
if (rc == -ENODEV)
rc = virtio_pci_legacy_probe(vp_dev);
if (rc)
goto err_probe;
}
...
rc = register_virtio_device(&vp_dev->vdev);
When the device is registered to the virtio bus the kernel will look
for a driver in the bus that can handle the device and call that
driver's ``probe`` method.
At this point, the virtqueues will be allocated and configured by
calling the appropriate ``virtio_find`` helper function, such as
virtio_find_single_vq() or virtio_find_vqs(), which will end up calling
a transport-specific ``find_vqs`` method.
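For a device with more than one virtqueue, the callbacks and virtqueue
names are gathered into arrays and handed to virtio_find_vqs() in a
single call. A minimal sketch, assuming a hypothetical device with one
``rx`` and one ``tx`` queue and driver-defined ``myvirtio_rx_done()``
and ``myvirtio_tx_done()`` callbacks::

    struct virtqueue *vqs[2];
    vq_callback_t *callbacks[] = { myvirtio_rx_done, myvirtio_tx_done };
    static const char * const names[] = { "rx", "tx" };
    int err;

    /* Allocates and configures both virtqueues via the transport's find_vqs */
    err = virtio_find_vqs(vdev, 2, vqs, callbacks, names, NULL);
    if (err)
        return err;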
References
==========
_`[1]` Virtio Spec v1.2:
https://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html
.. Check for later versions of the spec as well.
_`[2]` Virtqueues and virtio ring: How the data travels
https://www.redhat.com/en/blog/virtqueues-and-virtio-ring-how-data-travels
.. rubric:: Footnotes
.. [#f1] that's why they may also be referred to as virtrings.

View File

@ -0,0 +1,197 @@
.. SPDX-License-Identifier: GPL-2.0
.. _writing_virtio_drivers:
======================
Writing Virtio Drivers
======================
Introduction
============
This document serves as a basic guideline for driver programmers that
need to hack a new virtio driver or understand the essentials of the
existing ones. See :ref:`Virtio on Linux <virtio>` for a general
overview of virtio.
Driver boilerplate
==================
As a bare minimum, a virtio driver needs to register on the virtio bus
and configure the virtqueues for the device according to its spec; the
configuration of the virtqueues on the driver side must match the
virtqueue definitions in the device. A basic driver skeleton could look
like this::
#include <linux/virtio.h>
#include <linux/virtio_ids.h>
#include <linux/virtio_config.h>
#include <linux/module.h>
/* device private data (one per device) */
struct virtio_dummy_dev {
struct virtqueue *vq;
};
static void virtio_dummy_recv_cb(struct virtqueue *vq)
{
struct virtio_dummy_dev *dev = vq->vdev->priv;
char *buf;
unsigned int len;
while ((buf = virtqueue_get_buf(dev->vq, &len)) != NULL) {
/* process the received data */
}
}
static int virtio_dummy_probe(struct virtio_device *vdev)
{
struct virtio_dummy_dev *dev = NULL;
/* initialize device data */
dev = kzalloc(sizeof(struct virtio_dummy_dev), GFP_KERNEL);
if (!dev)
return -ENOMEM;
/* the device has a single virtqueue */
dev->vq = virtio_find_single_vq(vdev, virtio_dummy_recv_cb, "input");
if (IS_ERR(dev->vq)) {
kfree(dev);
return PTR_ERR(dev->vq);
}
vdev->priv = dev;
/* from this point on, the device can notify and get callbacks */
virtio_device_ready(vdev);
return 0;
}
static void virtio_dummy_remove(struct virtio_device *vdev)
{
struct virtio_dummy_dev *dev = vdev->priv;
char *buf;
/*
* disable vq interrupts: equivalent to
* vdev->config->reset(vdev)
*/
virtio_reset_device(vdev);
/* detach unused buffers */
while ((buf = virtqueue_detach_unused_buf(dev->vq)) != NULL) {
kfree(buf);
}
/* remove virtqueues */
vdev->config->del_vqs(vdev);
kfree(dev);
}
static const struct virtio_device_id id_table[] = {
{ VIRTIO_ID_DUMMY, VIRTIO_DEV_ANY_ID },
{ 0 },
};
static struct virtio_driver virtio_dummy_driver = {
.driver.name = KBUILD_MODNAME,
.driver.owner = THIS_MODULE,
.id_table = id_table,
.probe = virtio_dummy_probe,
.remove = virtio_dummy_remove,
};
module_virtio_driver(virtio_dummy_driver);
MODULE_DEVICE_TABLE(virtio, id_table);
MODULE_DESCRIPTION("Dummy virtio driver");
MODULE_LICENSE("GPL");
The device id ``VIRTIO_ID_DUMMY`` here is a placeholder; virtio drivers
should be added only for devices that are defined in the spec, see
include/uapi/linux/virtio_ids.h. Device ids need to be at least reserved
in the virtio spec before being added to that file.
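For reference, the first few ids in that file match the reservations in
the spec::

    #define VIRTIO_ID_NET       1 /* virtio net */
    #define VIRTIO_ID_BLOCK     2 /* virtio block */
    #define VIRTIO_ID_CONSOLE   3 /* virtio console */
    #define VIRTIO_ID_RNG       4 /* virtio rng */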
If your driver doesn't have to do anything special in its ``init`` and
``exit`` methods, you can use the module_virtio_driver() helper to
reduce the amount of boilerplate code.
The ``probe`` method does the minimum driver setup in this case
(memory allocation for the device data) and initializes the
virtqueue. virtio_device_ready() is used to enable the virtqueue and to
notify the device that the driver is ready to manage the device
("DRIVER_OK"). The virtqueues are anyway enabled automatically by the
core after ``probe`` returns.
.. kernel-doc:: include/linux/virtio_config.h
   :identifiers: virtio_device_ready
In any case, the virtqueues need to be enabled before adding buffers to
them.
Sending and receiving data
==========================
The virtio_dummy_recv_cb() callback in the code above will be triggered
when the device notifies the driver after it finishes processing a
descriptor or descriptor chain, either for reading or writing. However,
that's only the second half of the virtio device-driver communication
process, as the communication is always started by the driver regardless
of the direction of the data transfer.
To configure a buffer transfer from the driver to the device, first you
have to add the buffers -- packed as `scatterlists` -- to the
appropriate virtqueue using any of the virtqueue_add_inbuf(),
virtqueue_add_outbuf() or virtqueue_add_sgs() functions, depending on whether you
need to add one input `scatterlist` (for the device to fill in), one
output `scatterlist` (for the device to consume) or multiple
`scatterlists`, respectively. Then, once the virtqueue is set up, a call
to virtqueue_kick() sends a notification that will be serviced by the
hypervisor that implements the device::
struct scatterlist sg[1];
sg_init_one(sg, buffer, BUFLEN);
virtqueue_add_inbuf(dev->vq, sg, 1, buffer, GFP_ATOMIC);
virtqueue_kick(dev->vq);
.. kernel-doc:: drivers/virtio/virtio_ring.c
   :identifiers: virtqueue_add_inbuf

.. kernel-doc:: drivers/virtio/virtio_ring.c
   :identifiers: virtqueue_add_outbuf

.. kernel-doc:: drivers/virtio/virtio_ring.c
   :identifiers: virtqueue_add_sgs
Then, after the device has read or written the buffers prepared by the
driver and notifies it back, the driver can call virtqueue_get_buf() to
read the data produced by the device (if the virtqueue was set up with
input buffers) or simply to reclaim the buffers if they were already
consumed by the device:
.. kernel-doc:: drivers/virtio/virtio_ring.c
   :identifiers: virtqueue_get_buf_ctx
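Putting the two halves together, a receive callback typically drains the
virtqueue and recycles its buffers. A minimal sketch, reusing the
hypothetical device and ``BUFLEN`` buffers from the skeleton above
(``process_data()`` is a made-up helper)::

    static void virtio_dummy_recv_cb(struct virtqueue *vq)
    {
        struct virtio_dummy_dev *dev = vq->vdev->priv;
        struct scatterlist sg[1];
        unsigned int len;
        char *buf;

        while ((buf = virtqueue_get_buf(dev->vq, &len)) != NULL) {
            /* Only the first 'len' bytes were written by the device */
            process_data(buf, len);
            /* Hand the buffer back to the device for the next transfer */
            sg_init_one(sg, buf, BUFLEN);
            virtqueue_add_inbuf(dev->vq, sg, 1, buf, GFP_ATOMIC);
        }
        virtqueue_kick(dev->vq);
    }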
The virtqueue callbacks can be disabled and re-enabled using the
virtqueue_disable_cb() and the family of virtqueue_enable_cb() functions
respectively. See drivers/virtio/virtio_ring.c for more details:
.. kernel-doc:: drivers/virtio/virtio_ring.c
   :identifiers: virtqueue_disable_cb

.. kernel-doc:: drivers/virtio/virtio_ring.c
   :identifiers: virtqueue_enable_cb
But note that some spurious callbacks can still be triggered under
certain scenarios. The way to disable callbacks reliably is to reset the
device or the virtqueue (virtio_reset_device()).
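Spurious or raced notifications are why drivers drain the queue in a
loop and re-check after re-enabling callbacks. A minimal sketch of that
pattern (the same one virtio-blk's completion path uses), with
``process_data()`` again a made-up helper::

    do {
        virtqueue_disable_cb(dev->vq);
        while ((buf = virtqueue_get_buf(dev->vq, &len)) != NULL)
            process_data(buf, len);
        /* enable_cb() returns false if buffers arrived in the meantime */
    } while (!virtqueue_enable_cb(dev->vq));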
References
==========
_`[1]` Virtio Spec v1.2:
https://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html
Check for later versions of the spec as well.

View File

@ -22057,6 +22057,7 @@ S: Maintained
F: Documentation/ABI/testing/sysfs-bus-vdpa
F: Documentation/ABI/testing/sysfs-class-vduse
F: Documentation/devicetree/bindings/virtio/
F: Documentation/driver-api/virtio/
F: drivers/block/virtio_blk.c
F: drivers/crypto/virtio/
F: drivers/net/virtio_net.c
@ -22077,6 +22078,10 @@ IFCVF VIRTIO DATA PATH ACCELERATOR
R: Zhu Lingshan <lingshan.zhu@intel.com>
F: drivers/vdpa/ifcvf/
SNET DPU VIRTIO DATA PATH ACCELERATOR
R: Alvaro Karsz <alvaro.karsz@solid-run.com>
F: drivers/vdpa/solidrun/
VIRTIO BALLOON
M: "Michael S. Tsirkin" <mst@redhat.com>
M: David Hildenbrand <david@redhat.com>

View File

@ -15,6 +15,7 @@
#include <linux/blk-mq.h>
#include <linux/blk-mq-virtio.h>
#include <linux/numa.h>
#include <linux/vmalloc.h>
#include <uapi/linux/virtio_ring.h>
#define PART_BITS 4
@ -80,22 +81,51 @@ struct virtio_blk {
int num_vqs;
int io_queues[HCTX_MAX_TYPES];
struct virtio_blk_vq *vqs;
/* For zoned device */
unsigned int zone_sectors;
};
struct virtblk_req {
/* Out header */
struct virtio_blk_outhdr out_hdr;
u8 status;
/* In header */
union {
u8 status;
/*
* The zone append command has an extended in header.
* The status field in zone_append_in_hdr must have
* the same offset in virtblk_req as the non-zoned
* status field above.
*/
struct {
u8 status;
u8 reserved[7];
__le64 append_sector;
} zone_append_in_hdr;
};
size_t in_hdr_len;
struct sg_table sg_table;
struct scatterlist sg[];
};
static inline blk_status_t virtblk_result(struct virtblk_req *vbr)
static inline blk_status_t virtblk_result(u8 status)
{
switch (vbr->status) {
switch (status) {
case VIRTIO_BLK_S_OK:
return BLK_STS_OK;
case VIRTIO_BLK_S_UNSUPP:
return BLK_STS_NOTSUPP;
case VIRTIO_BLK_S_ZONE_OPEN_RESOURCE:
return BLK_STS_ZONE_OPEN_RESOURCE;
case VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE:
return BLK_STS_ZONE_ACTIVE_RESOURCE;
case VIRTIO_BLK_S_IOERR:
case VIRTIO_BLK_S_ZONE_UNALIGNED_WP:
default:
return BLK_STS_IOERR;
}
@ -111,11 +141,11 @@ static inline struct virtio_blk_vq *get_virtio_blk_vq(struct blk_mq_hw_ctx *hctx
static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr)
{
struct scatterlist hdr, status, *sgs[3];
struct scatterlist out_hdr, in_hdr, *sgs[3];
unsigned int num_out = 0, num_in = 0;
sg_init_one(&hdr, &vbr->out_hdr, sizeof(vbr->out_hdr));
sgs[num_out++] = &hdr;
sg_init_one(&out_hdr, &vbr->out_hdr, sizeof(vbr->out_hdr));
sgs[num_out++] = &out_hdr;
if (vbr->sg_table.nents) {
if (vbr->out_hdr.type & cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_OUT))
@ -124,8 +154,8 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr)
sgs[num_out + num_in++] = vbr->sg_table.sgl;
}
sg_init_one(&status, &vbr->status, sizeof(vbr->status));
sgs[num_out + num_in++] = &status;
sg_init_one(&in_hdr, &vbr->status, vbr->in_hdr_len);
sgs[num_out + num_in++] = &in_hdr;
return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
}
@ -212,21 +242,22 @@ static blk_status_t virtblk_setup_cmd(struct virtio_device *vdev,
struct request *req,
struct virtblk_req *vbr)
{
size_t in_hdr_len = sizeof(vbr->status);
bool unmap = false;
u32 type;
u64 sector = 0;
vbr->out_hdr.sector = 0;
/* Set fields for all request types */
vbr->out_hdr.ioprio = cpu_to_virtio32(vdev, req_get_ioprio(req));
switch (req_op(req)) {
case REQ_OP_READ:
type = VIRTIO_BLK_T_IN;
vbr->out_hdr.sector = cpu_to_virtio64(vdev,
blk_rq_pos(req));
sector = blk_rq_pos(req);
break;
case REQ_OP_WRITE:
type = VIRTIO_BLK_T_OUT;
vbr->out_hdr.sector = cpu_to_virtio64(vdev,
blk_rq_pos(req));
sector = blk_rq_pos(req);
break;
case REQ_OP_FLUSH:
type = VIRTIO_BLK_T_FLUSH;
@ -241,16 +272,42 @@ static blk_status_t virtblk_setup_cmd(struct virtio_device *vdev,
case REQ_OP_SECURE_ERASE:
type = VIRTIO_BLK_T_SECURE_ERASE;
break;
case REQ_OP_DRV_IN:
type = VIRTIO_BLK_T_GET_ID;
case REQ_OP_ZONE_OPEN:
type = VIRTIO_BLK_T_ZONE_OPEN;
sector = blk_rq_pos(req);
break;
case REQ_OP_ZONE_CLOSE:
type = VIRTIO_BLK_T_ZONE_CLOSE;
sector = blk_rq_pos(req);
break;
case REQ_OP_ZONE_FINISH:
type = VIRTIO_BLK_T_ZONE_FINISH;
sector = blk_rq_pos(req);
break;
case REQ_OP_ZONE_APPEND:
type = VIRTIO_BLK_T_ZONE_APPEND;
sector = blk_rq_pos(req);
in_hdr_len = sizeof(vbr->zone_append_in_hdr);
break;
case REQ_OP_ZONE_RESET:
type = VIRTIO_BLK_T_ZONE_RESET;
sector = blk_rq_pos(req);
break;
case REQ_OP_ZONE_RESET_ALL:
type = VIRTIO_BLK_T_ZONE_RESET_ALL;
break;
case REQ_OP_DRV_IN:
/* Out header already filled in, nothing to do */
return 0;
default:
WARN_ON_ONCE(1);
return BLK_STS_IOERR;
}
/* Set fields for non-REQ_OP_DRV_IN request types */
vbr->in_hdr_len = in_hdr_len;
vbr->out_hdr.type = cpu_to_virtio32(vdev, type);
vbr->out_hdr.ioprio = cpu_to_virtio32(vdev, req_get_ioprio(req));
vbr->out_hdr.sector = cpu_to_virtio64(vdev, sector);
if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES ||
type == VIRTIO_BLK_T_SECURE_ERASE) {
@ -264,39 +321,74 @@ static blk_status_t virtblk_setup_cmd(struct virtio_device *vdev,
static inline void virtblk_request_done(struct request *req)
{
struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
blk_status_t status = virtblk_result(vbr->status);
virtblk_unmap_data(req, vbr);
virtblk_cleanup_cmd(req);
blk_mq_end_request(req, virtblk_result(vbr));
if (req_op(req) == REQ_OP_ZONE_APPEND)
req->__sector = le64_to_cpu(vbr->zone_append_in_hdr.append_sector);
blk_mq_end_request(req, status);
}
static void virtblk_complete_batch(struct io_comp_batch *iob)
{
struct request *req;
rq_list_for_each(&iob->req_list, req) {
virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
virtblk_cleanup_cmd(req);
}
blk_mq_end_request_batch(iob);
}
static int virtblk_handle_req(struct virtio_blk_vq *vq,
struct io_comp_batch *iob)
{
struct virtblk_req *vbr;
int req_done = 0;
unsigned int len;
while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
struct request *req = blk_mq_rq_from_pdu(vbr);
if (likely(!blk_should_fake_timeout(req->q)) &&
!blk_mq_complete_request_remote(req) &&
!blk_mq_add_to_batch(req, iob, vbr->status,
virtblk_complete_batch))
virtblk_request_done(req);
req_done++;
}
return req_done;
}
static void virtblk_done(struct virtqueue *vq)
{
struct virtio_blk *vblk = vq->vdev->priv;
bool req_done = false;
int qid = vq->index;
struct virtblk_req *vbr;
struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index];
int req_done = 0;
unsigned long flags;
unsigned int len;
DEFINE_IO_COMP_BATCH(iob);
spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
spin_lock_irqsave(&vblk_vq->lock, flags);
do {
virtqueue_disable_cb(vq);
while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) {
struct request *req = blk_mq_rq_from_pdu(vbr);
req_done += virtblk_handle_req(vblk_vq, &iob);
if (likely(!blk_should_fake_timeout(req->q)))
blk_mq_complete_request(req);
req_done = true;
}
if (unlikely(virtqueue_is_broken(vq)))
break;
} while (!virtqueue_enable_cb(vq));
/* In case queue is stopped waiting for more buffers. */
if (req_done)
if (req_done) {
if (!rq_list_empty(iob.req_list))
iob.complete(&iob);
/* In case queue is stopped waiting for more buffers. */
blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
}
spin_unlock_irqrestore(&vblk_vq->lock, flags);
}
static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
@ -455,6 +547,275 @@ static void virtio_queue_rqs(struct request **rqlist)
*rqlist = requeue_list;
}
#ifdef CONFIG_BLK_DEV_ZONED
static void *virtblk_alloc_report_buffer(struct virtio_blk *vblk,
unsigned int nr_zones,
unsigned int zone_sectors,
size_t *buflen)
{
struct request_queue *q = vblk->disk->queue;
size_t bufsize;
void *buf;
nr_zones = min_t(unsigned int, nr_zones,
get_capacity(vblk->disk) >> ilog2(zone_sectors));
bufsize = sizeof(struct virtio_blk_zone_report) +
nr_zones * sizeof(struct virtio_blk_zone_descriptor);
bufsize = min_t(size_t, bufsize,
queue_max_hw_sectors(q) << SECTOR_SHIFT);
bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT);
while (bufsize >= sizeof(struct virtio_blk_zone_report)) {
buf = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
if (buf) {
*buflen = bufsize;
return buf;
}
bufsize >>= 1;
}
return NULL;
}
static int virtblk_submit_zone_report(struct virtio_blk *vblk,
char *report_buf, size_t report_len,
sector_t sector)
{
struct request_queue *q = vblk->disk->queue;
struct request *req;
struct virtblk_req *vbr;
int err;
req = blk_mq_alloc_request(q, REQ_OP_DRV_IN, 0);
if (IS_ERR(req))
return PTR_ERR(req);
vbr = blk_mq_rq_to_pdu(req);
vbr->in_hdr_len = sizeof(vbr->status);
vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_ZONE_REPORT);
vbr->out_hdr.sector = cpu_to_virtio64(vblk->vdev, sector);
err = blk_rq_map_kern(q, req, report_buf, report_len, GFP_KERNEL);
if (err)
goto out;
blk_execute_rq(req, false);
err = blk_status_to_errno(virtblk_result(vbr->status));
out:
blk_mq_free_request(req);
return err;
}
static int virtblk_parse_zone(struct virtio_blk *vblk,
struct virtio_blk_zone_descriptor *entry,
unsigned int idx, unsigned int zone_sectors,
report_zones_cb cb, void *data)
{
struct blk_zone zone = { };
if (entry->z_type != VIRTIO_BLK_ZT_SWR &&
entry->z_type != VIRTIO_BLK_ZT_SWP &&
entry->z_type != VIRTIO_BLK_ZT_CONV) {
dev_err(&vblk->vdev->dev, "invalid zone type %#x\n",
entry->z_type);
return -EINVAL;
}
zone.type = entry->z_type;
zone.cond = entry->z_state;
zone.len = zone_sectors;
zone.capacity = le64_to_cpu(entry->z_cap);
zone.start = le64_to_cpu(entry->z_start);
if (zone.cond == BLK_ZONE_COND_FULL)
zone.wp = zone.start + zone.len;
else
zone.wp = le64_to_cpu(entry->z_wp);
return cb(&zone, idx, data);
}
static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
unsigned int nr_zones, report_zones_cb cb,
void *data)
{
struct virtio_blk *vblk = disk->private_data;
struct virtio_blk_zone_report *report;
unsigned int zone_sectors = vblk->zone_sectors;
unsigned int nz, i;
int ret, zone_idx = 0;
size_t buflen;
if (WARN_ON_ONCE(!vblk->zone_sectors))
return -EOPNOTSUPP;
report = virtblk_alloc_report_buffer(vblk, nr_zones,
zone_sectors, &buflen);
if (!report)
return -ENOMEM;
while (zone_idx < nr_zones && sector < get_capacity(vblk->disk)) {
memset(report, 0, buflen);
ret = virtblk_submit_zone_report(vblk, (char *)report,
buflen, sector);
if (ret) {
if (ret > 0)
ret = -EIO;
goto out_free;
}
nz = min((unsigned int)le64_to_cpu(report->nr_zones), nr_zones);
if (!nz)
break;
for (i = 0; i < nz && zone_idx < nr_zones; i++) {
ret = virtblk_parse_zone(vblk, &report->zones[i],
zone_idx, zone_sectors, cb, data);
if (ret)
goto out_free;
sector = le64_to_cpu(report->zones[i].z_start) + zone_sectors;
zone_idx++;
}
}
if (zone_idx > 0)
ret = zone_idx;
else
ret = -EINVAL;
out_free:
kvfree(report);
return ret;
}
static void virtblk_revalidate_zones(struct virtio_blk *vblk)
{
u8 model;
if (!vblk->zone_sectors)
return;
virtio_cread(vblk->vdev, struct virtio_blk_config,
zoned.model, &model);
if (!blk_revalidate_disk_zones(vblk->disk, NULL))
set_capacity_and_notify(vblk->disk, 0);
}
static int virtblk_probe_zoned_device(struct virtio_device *vdev,
struct virtio_blk *vblk,
struct request_queue *q)
{
u32 v;
u8 model;
int ret;
virtio_cread(vdev, struct virtio_blk_config,
zoned.model, &model);
switch (model) {
case VIRTIO_BLK_Z_NONE:
return 0;
case VIRTIO_BLK_Z_HM:
break;
case VIRTIO_BLK_Z_HA:
/*
* Present the host-aware device as a regular drive.
* TODO It is possible to add an option to make it appear
* in the system as a zoned drive.
*/
return 0;
default:
dev_err(&vdev->dev, "unsupported zone model %d\n", model);
return -EINVAL;
}
dev_dbg(&vdev->dev, "probing host-managed zoned device\n");
disk_set_zoned(vblk->disk, BLK_ZONED_HM);
blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
virtio_cread(vdev, struct virtio_blk_config,
zoned.max_open_zones, &v);
disk_set_max_open_zones(vblk->disk, le32_to_cpu(v));
dev_dbg(&vdev->dev, "max open zones = %u\n", le32_to_cpu(v));
virtio_cread(vdev, struct virtio_blk_config,
zoned.max_active_zones, &v);
disk_set_max_active_zones(vblk->disk, le32_to_cpu(v));
dev_dbg(&vdev->dev, "max active zones = %u\n", le32_to_cpu(v));
virtio_cread(vdev, struct virtio_blk_config,
zoned.write_granularity, &v);
if (!v) {
dev_warn(&vdev->dev, "zero write granularity reported\n");
return -ENODEV;
}
blk_queue_physical_block_size(q, le32_to_cpu(v));
blk_queue_io_min(q, le32_to_cpu(v));
dev_dbg(&vdev->dev, "write granularity = %u\n", le32_to_cpu(v));
/*
* virtio ZBD specification doesn't require zones to be a power of
* two sectors in size, but the code in this driver expects that.
*/
virtio_cread(vdev, struct virtio_blk_config, zoned.zone_sectors, &v);
vblk->zone_sectors = le32_to_cpu(v);
if (vblk->zone_sectors == 0 || !is_power_of_2(vblk->zone_sectors)) {
dev_err(&vdev->dev,
"zoned device with non power of two zone size %u\n",
vblk->zone_sectors);
return -ENODEV;
}
dev_dbg(&vdev->dev, "zone sectors = %u\n", vblk->zone_sectors);
if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
dev_warn(&vblk->vdev->dev,
"ignoring negotiated F_DISCARD for zoned device\n");
blk_queue_max_discard_sectors(q, 0);
}
ret = blk_revalidate_disk_zones(vblk->disk, NULL);
if (!ret) {
virtio_cread(vdev, struct virtio_blk_config,
zoned.max_append_sectors, &v);
if (!v) {
dev_warn(&vdev->dev, "zero max_append_sectors reported\n");
return -ENODEV;
}
blk_queue_max_zone_append_sectors(q, le32_to_cpu(v));
dev_dbg(&vdev->dev, "max append sectors = %u\n", le32_to_cpu(v));
}
return ret;
}
static inline bool virtblk_has_zoned_feature(struct virtio_device *vdev)
{
return virtio_has_feature(vdev, VIRTIO_BLK_F_ZONED);
}
#else
/*
* Zoned block device support is not configured in this kernel.
* We only need to define a few symbols to avoid compilation errors.
*/
#define virtblk_report_zones NULL
static inline void virtblk_revalidate_zones(struct virtio_blk *vblk)
{
}
static inline int virtblk_probe_zoned_device(struct virtio_device *vdev,
struct virtio_blk *vblk, struct request_queue *q)
{
return -EOPNOTSUPP;
}
static inline bool virtblk_has_zoned_feature(struct virtio_device *vdev)
{
return false;
}
#endif /* CONFIG_BLK_DEV_ZONED */
/* return id (s/n) string for *disk to *id_str
*/
static int virtblk_get_id(struct gendisk *disk, char *id_str)
@ -462,18 +823,24 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
struct virtio_blk *vblk = disk->private_data;
struct request_queue *q = vblk->disk->queue;
struct request *req;
struct virtblk_req *vbr;
int err;
req = blk_mq_alloc_request(q, REQ_OP_DRV_IN, 0);
if (IS_ERR(req))
return PTR_ERR(req);
vbr = blk_mq_rq_to_pdu(req);
vbr->in_hdr_len = sizeof(vbr->status);
vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_GET_ID);
vbr->out_hdr.sector = 0;
err = blk_rq_map_kern(q, req, id_str, VIRTIO_BLK_ID_BYTES, GFP_KERNEL);
if (err)
goto out;
blk_execute_rq(req, false);
err = blk_status_to_errno(virtblk_result(blk_mq_rq_to_pdu(req)));
err = blk_status_to_errno(virtblk_result(vbr->status));
out:
blk_mq_free_request(req);
return err;
@ -524,6 +891,7 @@ static const struct block_device_operations virtblk_fops = {
.owner = THIS_MODULE,
.getgeo = virtblk_getgeo,
.free_disk = virtblk_free_disk,
.report_zones = virtblk_report_zones,
};
static int index_to_minor(int index)
@ -594,6 +962,7 @@ static void virtblk_config_changed_work(struct work_struct *work)
struct virtio_blk *vblk =
container_of(work, struct virtio_blk, config_work);
virtblk_revalidate_zones(vblk);
virtblk_update_capacity(vblk, true);
}
@ -835,36 +1204,15 @@ static void virtblk_map_queues(struct blk_mq_tag_set *set)
}
}
static void virtblk_complete_batch(struct io_comp_batch *iob)
{
struct request *req;
rq_list_for_each(&iob->req_list, req) {
virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
virtblk_cleanup_cmd(req);
}
blk_mq_end_request_batch(iob);
}
static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
{
struct virtio_blk *vblk = hctx->queue->queuedata;
struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx);
struct virtblk_req *vbr;
unsigned long flags;
unsigned int len;
int found = 0;
spin_lock_irqsave(&vq->lock, flags);
while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
struct request *req = blk_mq_rq_from_pdu(vbr);
found++;
if (!blk_mq_add_to_batch(req, iob, vbr->status,
virtblk_complete_batch))
blk_mq_complete_request(req);
}
found = virtblk_handle_req(vq, iob);
if (found)
blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
@ -1150,6 +1498,15 @@ static int virtblk_probe(struct virtio_device *vdev)
virtblk_update_capacity(vblk, false);
virtio_device_ready(vdev);
if (virtblk_has_zoned_feature(vdev)) {
err = virtblk_probe_zoned_device(vdev, vblk, q);
if (err)
goto out_cleanup_disk;
}
dev_info(&vdev->dev, "blk config size: %zu\n",
sizeof(struct virtio_blk_config));
err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
if (err)
goto out_cleanup_disk;
@ -1251,6 +1608,9 @@ static unsigned int features[] = {
VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES,
VIRTIO_BLK_F_SECURE_ERASE,
#ifdef CONFIG_BLK_DEV_ZONED
VIRTIO_BLK_F_ZONED,
#endif /* CONFIG_BLK_DEV_ZONED */
};
static struct virtio_driver virtio_blk = {

View File

@ -32,7 +32,6 @@ static int init_vq(struct virtio_pmem *vpmem)
static int virtio_pmem_probe(struct virtio_device *vdev)
{
struct nd_region_desc ndr_desc = {};
int nid = dev_to_node(&vdev->dev);
struct nd_region *nd_region;
struct virtio_pmem *vpmem;
struct resource res;
@ -79,7 +78,15 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
ndr_desc.res = &res;
ndr_desc.numa_node = nid;
ndr_desc.numa_node = memory_add_physaddr_to_nid(res.start);
ndr_desc.target_node = phys_to_target_node(res.start);
if (ndr_desc.target_node == NUMA_NO_NODE) {
ndr_desc.target_node = ndr_desc.numa_node;
dev_dbg(&vdev->dev, "changing target node from %d to %d",
NUMA_NO_NODE, ndr_desc.target_node);
}
ndr_desc.flush = async_pmem_flush;
ndr_desc.provider_data = vdev;
set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);

View File

@ -5366,6 +5366,14 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x7901, quirk_no_flr);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
/* FLR may cause the SolidRun SNET DPU (rev 0x1) to hang */
static void quirk_no_flr_snet(struct pci_dev *dev)
{
if (dev->revision == 0x1)
quirk_no_flr(dev);
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SOLIDRUN, 0x1000, quirk_no_flr_snet);
static void quirk_no_ext_tags(struct pci_dev *pdev)
{
struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);

View File

@ -330,7 +330,7 @@ static void virtscsi_handle_param_change(struct virtio_scsi *vscsi,
scsi_device_put(sdev);
}
static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
static int virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
{
struct scsi_device *sdev;
struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
@ -338,6 +338,11 @@ static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
int result, inquiry_len, inq_result_len = 256;
char *inq_result = kmalloc(inq_result_len, GFP_KERNEL);
if (!inq_result) {
kfree(inq_result);
return -ENOMEM;
}
shost_for_each_device(sdev, shost) {
inquiry_len = sdev->inquiry_len ? sdev->inquiry_len : 36;
@ -366,6 +371,7 @@ static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
}
kfree(inq_result);
return 0;
}
static void virtscsi_handle_event(struct work_struct *work)
@ -377,9 +383,13 @@ static void virtscsi_handle_event(struct work_struct *work)
if (event->event &
cpu_to_virtio32(vscsi->vdev, VIRTIO_SCSI_T_EVENTS_MISSED)) {
int ret;
event->event &= ~cpu_to_virtio32(vscsi->vdev,
VIRTIO_SCSI_T_EVENTS_MISSED);
virtscsi_rescan_hotunplug(vscsi);
ret = virtscsi_rescan_hotunplug(vscsi);
if (ret)
return;
scsi_scan_host(virtio_scsi_host(vscsi->vdev));
}

View File

@ -71,6 +71,18 @@ config MLX5_VDPA_NET
be executed by the hardware. It also supports a variety of stateless
offloads depending on the actual device used and firmware version.
config MLX5_VDPA_STEERING_DEBUG
bool "expose steering counters on debugfs"
select MLX5_VDPA
help
Expose RX steering counters in debugfs to aid in debugging. For each VLAN
or non-VLAN interface, two hardware counters are added to the RX flow
table: one for unicast and one for multicast.
The counters count the number of packets and bytes and expose them in
debugfs. One can read the counters using, e.g.:
cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/ucast/packets
cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/mcast/bytes
config VP_VDPA
tristate "Virtio PCI bridge vDPA driver"
select VIRTIO_PCI_LIB
@ -86,4 +98,22 @@ config ALIBABA_ENI_VDPA
VDPA driver for Alibaba ENI (Elastic Network Interface) which is built upon
virtio 0.9.5 specification.
config SNET_VDPA
tristate "SolidRun's vDPA driver for SolidNET"
depends on PCI_MSI && PCI_IOV && (HWMON || HWMON=n)
# This driver MAY create a HWMON device.
# Depending on (HWMON || HWMON=n) ensures that:
# If HWMON=n the driver can be compiled either as a module or built-in.
# If HWMON=y the driver can be compiled either as a module or built-in.
# If HWMON=m the driver is forced to be compiled as a module.
# By doing so, IS_ENABLED can be used instead of IS_REACHABLE
help
vDPA driver for SolidNET DPU.
With this driver, the VirtIO dataplane can be
offloaded to a SolidNET DPU.
This driver includes a HW monitor device that
reads health values from the DPU.
endif # VDPA

View File

@ -6,3 +6,4 @@ obj-$(CONFIG_IFCVF) += ifcvf/
obj-$(CONFIG_MLX5_VDPA) += mlx5/
obj-$(CONFIG_VP_VDPA) += virtio_pci/
obj-$(CONFIG_ALIBABA_ENI_VDPA) += alibaba/
obj-$(CONFIG_SNET_VDPA) += solidrun/

View File

@ -10,11 +10,6 @@
#include "ifcvf_base.h"
struct ifcvf_adapter *vf_to_adapter(struct ifcvf_hw *hw)
{
return container_of(hw, struct ifcvf_adapter, vf);
}
u16 ifcvf_set_vq_vector(struct ifcvf_hw *hw, u16 qid, int vector)
{
struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
@ -37,8 +32,6 @@ u16 ifcvf_set_config_vector(struct ifcvf_hw *hw, int vector)
static void __iomem *get_cap_addr(struct ifcvf_hw *hw,
struct virtio_pci_cap *cap)
{
struct ifcvf_adapter *ifcvf;
struct pci_dev *pdev;
u32 length, offset;
u8 bar;
@ -46,17 +39,14 @@ static void __iomem *get_cap_addr(struct ifcvf_hw *hw,
offset = le32_to_cpu(cap->offset);
bar = cap->bar;
ifcvf = vf_to_adapter(hw);
pdev = ifcvf->pdev;
if (bar >= IFCVF_PCI_MAX_RESOURCE) {
IFCVF_DBG(pdev,
IFCVF_DBG(hw->pdev,
"Invalid bar number %u to get capabilities\n", bar);
return NULL;
}
if (offset + length > pci_resource_len(pdev, bar)) {
IFCVF_DBG(pdev,
if (offset + length > pci_resource_len(hw->pdev, bar)) {
IFCVF_DBG(hw->pdev,
"offset(%u) + len(%u) overflows bar%u's capability\n",
offset, length, bar);
return NULL;
@ -92,6 +82,7 @@ int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *pdev)
IFCVF_ERR(pdev, "Failed to read PCI capability list\n");
return -EIO;
}
hw->pdev = pdev;
while (pos) {
ret = ifcvf_read_config_range(pdev, (u32 *)&cap,
@ -215,15 +206,13 @@ u64 ifcvf_get_hw_features(struct ifcvf_hw *hw)
u64 ifcvf_get_features(struct ifcvf_hw *hw)
{
return hw->hw_features;
return hw->dev_features;
}
int ifcvf_verify_min_features(struct ifcvf_hw *hw, u64 features)
{
struct ifcvf_adapter *ifcvf = vf_to_adapter(hw);
if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)) && features) {
IFCVF_ERR(ifcvf->pdev, "VIRTIO_F_ACCESS_PLATFORM is not negotiated\n");
IFCVF_ERR(hw->pdev, "VIRTIO_F_ACCESS_PLATFORM is not negotiated\n");
return -EINVAL;
}
@ -232,13 +221,11 @@ int ifcvf_verify_min_features(struct ifcvf_hw *hw, u64 features)
u32 ifcvf_get_config_size(struct ifcvf_hw *hw)
{
struct ifcvf_adapter *adapter;
u32 net_config_size = sizeof(struct virtio_net_config);
u32 blk_config_size = sizeof(struct virtio_blk_config);
u32 cap_size = hw->cap_dev_config_size;
u32 config_size;
adapter = vf_to_adapter(hw);
/* If the onboard device config space size is greater than
* the size of struct virtio_net/blk_config, only the spec
* implementing contents size is returned, this is very
@ -253,7 +240,7 @@ u32 ifcvf_get_config_size(struct ifcvf_hw *hw)
break;
default:
config_size = 0;
IFCVF_ERR(adapter->pdev, "VIRTIO ID %u not supported\n", hw->dev_type);
IFCVF_ERR(hw->pdev, "VIRTIO ID %u not supported\n", hw->dev_type);
}
return config_size;
@ -301,14 +288,11 @@ static void ifcvf_set_features(struct ifcvf_hw *hw, u64 features)
static int ifcvf_config_features(struct ifcvf_hw *hw)
{
struct ifcvf_adapter *ifcvf;
ifcvf = vf_to_adapter(hw);
ifcvf_set_features(hw, hw->req_features);
ifcvf_add_status(hw, VIRTIO_CONFIG_S_FEATURES_OK);
if (!(ifcvf_get_status(hw) & VIRTIO_CONFIG_S_FEATURES_OK)) {
IFCVF_ERR(ifcvf->pdev, "Failed to set FEATURES_OK status\n");
IFCVF_ERR(hw->pdev, "Failed to set FEATURES_OK status\n");
return -EIO;
}

View File

@ -19,6 +19,7 @@
#include <uapi/linux/virtio_blk.h>
#include <uapi/linux/virtio_config.h>
#include <uapi/linux/virtio_pci.h>
#include <uapi/linux/vdpa.h>
#define N3000_DEVICE_ID 0x1041
#define N3000_SUBSYS_DEVICE_ID 0x001A
@ -38,9 +39,6 @@
#define IFCVF_DBG(pdev, fmt, ...) dev_dbg(&pdev->dev, fmt, ##__VA_ARGS__)
#define IFCVF_INFO(pdev, fmt, ...) dev_info(&pdev->dev, fmt, ##__VA_ARGS__)
#define ifcvf_private_to_vf(adapter) \
(&((struct ifcvf_adapter *)adapter)->vf)
/* all vqs and config interrupt has its own vector */
#define MSIX_VECTOR_PER_VQ_AND_CONFIG 1
/* all vqs share a vector, and config interrupt has a separate vector */
@ -78,6 +76,8 @@ struct ifcvf_hw {
u32 dev_type;
u64 req_features;
u64 hw_features;
/* provisioned device features */
u64 dev_features;
struct virtio_pci_common_cfg __iomem *common_cfg;
void __iomem *dev_cfg;
struct vring_info vring[IFCVF_MAX_QUEUES];
@ -89,12 +89,13 @@ struct ifcvf_hw {
u16 nr_vring;
/* VIRTIO_PCI_CAP_DEVICE_CFG size */
u32 cap_dev_config_size;
struct pci_dev *pdev;
};
struct ifcvf_adapter {
struct vdpa_device vdpa;
struct pci_dev *pdev;
struct ifcvf_hw vf;
struct ifcvf_hw *vf;
};
struct ifcvf_vring_lm_cfg {
@ -109,6 +110,7 @@ struct ifcvf_lm_cfg {
struct ifcvf_vdpa_mgmt_dev {
struct vdpa_mgmt_dev mdev;
struct ifcvf_hw vf;
struct ifcvf_adapter *adapter;
struct pci_dev *pdev;
};

View File

@ -69,10 +69,9 @@ static void ifcvf_free_irq_vectors(void *data)
pci_free_irq_vectors(data);
}
static void ifcvf_free_per_vq_irq(struct ifcvf_adapter *adapter)
static void ifcvf_free_per_vq_irq(struct ifcvf_hw *vf)
{
struct pci_dev *pdev = adapter->pdev;
struct ifcvf_hw *vf = &adapter->vf;
struct pci_dev *pdev = vf->pdev;
int i;
for (i = 0; i < vf->nr_vring; i++) {
@ -83,10 +82,9 @@ static void ifcvf_free_per_vq_irq(struct ifcvf_adapter *adapter)
}
}
static void ifcvf_free_vqs_reused_irq(struct ifcvf_adapter *adapter)
static void ifcvf_free_vqs_reused_irq(struct ifcvf_hw *vf)
{
struct pci_dev *pdev = adapter->pdev;
struct ifcvf_hw *vf = &adapter->vf;
struct pci_dev *pdev = vf->pdev;
if (vf->vqs_reused_irq != -EINVAL) {
devm_free_irq(&pdev->dev, vf->vqs_reused_irq, vf);
@ -95,20 +93,17 @@ static void ifcvf_free_vqs_reused_irq(struct ifcvf_adapter *adapter)
}
static void ifcvf_free_vq_irq(struct ifcvf_adapter *adapter)
static void ifcvf_free_vq_irq(struct ifcvf_hw *vf)
{
struct ifcvf_hw *vf = &adapter->vf;
if (vf->msix_vector_status == MSIX_VECTOR_PER_VQ_AND_CONFIG)
ifcvf_free_per_vq_irq(adapter);
ifcvf_free_per_vq_irq(vf);
else
ifcvf_free_vqs_reused_irq(adapter);
ifcvf_free_vqs_reused_irq(vf);
}
static void ifcvf_free_config_irq(struct ifcvf_adapter *adapter)
static void ifcvf_free_config_irq(struct ifcvf_hw *vf)
{
struct pci_dev *pdev = adapter->pdev;
struct ifcvf_hw *vf = &adapter->vf;
struct pci_dev *pdev = vf->pdev;
if (vf->config_irq == -EINVAL)
return;
@ -123,12 +118,12 @@ static void ifcvf_free_config_irq(struct ifcvf_adapter *adapter)
}
}
static void ifcvf_free_irq(struct ifcvf_adapter *adapter)
static void ifcvf_free_irq(struct ifcvf_hw *vf)
{
struct pci_dev *pdev = adapter->pdev;
struct pci_dev *pdev = vf->pdev;
ifcvf_free_vq_irq(adapter);
ifcvf_free_config_irq(adapter);
ifcvf_free_vq_irq(vf);
ifcvf_free_config_irq(vf);
ifcvf_free_irq_vectors(pdev);
}
@ -137,10 +132,9 @@ static void ifcvf_free_irq(struct ifcvf_adapter *adapter)
* It returns the number of allocated vectors, negative
* return value when fails.
*/
static int ifcvf_alloc_vectors(struct ifcvf_adapter *adapter)
static int ifcvf_alloc_vectors(struct ifcvf_hw *vf)
{
struct pci_dev *pdev = adapter->pdev;
struct ifcvf_hw *vf = &adapter->vf;
struct pci_dev *pdev = vf->pdev;
int max_intr, ret;
/* all queues and config interrupt */
@ -160,10 +154,9 @@ static int ifcvf_alloc_vectors(struct ifcvf_adapter *adapter)
return ret;
}
static int ifcvf_request_per_vq_irq(struct ifcvf_adapter *adapter)
static int ifcvf_request_per_vq_irq(struct ifcvf_hw *vf)
{
struct pci_dev *pdev = adapter->pdev;
struct ifcvf_hw *vf = &adapter->vf;
struct pci_dev *pdev = vf->pdev;
int i, vector, ret, irq;
vf->vqs_reused_irq = -EINVAL;
@ -190,15 +183,14 @@ static int ifcvf_request_per_vq_irq(struct ifcvf_adapter *adapter)
return 0;
err:
ifcvf_free_irq(adapter);
ifcvf_free_irq(vf);
return -EFAULT;
}
static int ifcvf_request_vqs_reused_irq(struct ifcvf_adapter *adapter)
static int ifcvf_request_vqs_reused_irq(struct ifcvf_hw *vf)
{
struct pci_dev *pdev = adapter->pdev;
struct ifcvf_hw *vf = &adapter->vf;
struct pci_dev *pdev = vf->pdev;
int i, vector, ret, irq;
vector = 0;
@ -224,15 +216,14 @@ static int ifcvf_request_vqs_reused_irq(struct ifcvf_adapter *adapter)
return 0;
err:
ifcvf_free_irq(adapter);
ifcvf_free_irq(vf);
return -EFAULT;
}
static int ifcvf_request_dev_irq(struct ifcvf_adapter *adapter)
static int ifcvf_request_dev_irq(struct ifcvf_hw *vf)
{
struct pci_dev *pdev = adapter->pdev;
struct ifcvf_hw *vf = &adapter->vf;
struct pci_dev *pdev = vf->pdev;
int i, vector, ret, irq;
vector = 0;
@ -265,29 +256,27 @@ static int ifcvf_request_dev_irq(struct ifcvf_adapter *adapter)
return 0;
err:
ifcvf_free_irq(adapter);
ifcvf_free_irq(vf);
return -EFAULT;
}
static int ifcvf_request_vq_irq(struct ifcvf_adapter *adapter)
static int ifcvf_request_vq_irq(struct ifcvf_hw *vf)
{
struct ifcvf_hw *vf = &adapter->vf;
int ret;
if (vf->msix_vector_status == MSIX_VECTOR_PER_VQ_AND_CONFIG)
ret = ifcvf_request_per_vq_irq(adapter);
ret = ifcvf_request_per_vq_irq(vf);
else
ret = ifcvf_request_vqs_reused_irq(adapter);
ret = ifcvf_request_vqs_reused_irq(vf);
return ret;
}
static int ifcvf_request_config_irq(struct ifcvf_adapter *adapter)
static int ifcvf_request_config_irq(struct ifcvf_hw *vf)
{
struct pci_dev *pdev = adapter->pdev;
struct ifcvf_hw *vf = &adapter->vf;
struct pci_dev *pdev = vf->pdev;
int config_vector, ret;
if (vf->msix_vector_status == MSIX_VECTOR_PER_VQ_AND_CONFIG)
@ -320,17 +309,16 @@ static int ifcvf_request_config_irq(struct ifcvf_adapter *adapter)
return 0;
err:
ifcvf_free_irq(adapter);
ifcvf_free_irq(vf);
return -EFAULT;
}
static int ifcvf_request_irq(struct ifcvf_adapter *adapter)
static int ifcvf_request_irq(struct ifcvf_hw *vf)
{
struct ifcvf_hw *vf = &adapter->vf;
int nvectors, ret, max_intr;
nvectors = ifcvf_alloc_vectors(adapter);
nvectors = ifcvf_alloc_vectors(vf);
if (nvectors <= 0)
return -EFAULT;
@ -341,16 +329,16 @@ static int ifcvf_request_irq(struct ifcvf_adapter *adapter)
if (nvectors == 1) {
vf->msix_vector_status = MSIX_VECTOR_DEV_SHARED;
ret = ifcvf_request_dev_irq(adapter);
ret = ifcvf_request_dev_irq(vf);
return ret;
}
ret = ifcvf_request_vq_irq(adapter);
ret = ifcvf_request_vq_irq(vf);
if (ret)
return ret;
ret = ifcvf_request_config_irq(adapter);
ret = ifcvf_request_config_irq(vf);
if (ret)
return ret;
@ -358,9 +346,9 @@ static int ifcvf_request_irq(struct ifcvf_adapter *adapter)
return 0;
}
static int ifcvf_start_datapath(void *private)
static int ifcvf_start_datapath(struct ifcvf_adapter *adapter)
{
struct ifcvf_hw *vf = ifcvf_private_to_vf(private);
struct ifcvf_hw *vf = adapter->vf;
u8 status;
int ret;
@ -374,9 +362,9 @@ static int ifcvf_start_datapath(void *private)
return ret;
}
static int ifcvf_stop_datapath(void *private)
static int ifcvf_stop_datapath(struct ifcvf_adapter *adapter)
{
struct ifcvf_hw *vf = ifcvf_private_to_vf(private);
struct ifcvf_hw *vf = adapter->vf;
int i;
for (i = 0; i < vf->nr_vring; i++)
@ -389,7 +377,7 @@ static int ifcvf_stop_datapath(void *private)
static void ifcvf_reset_vring(struct ifcvf_adapter *adapter)
{
struct ifcvf_hw *vf = ifcvf_private_to_vf(adapter);
struct ifcvf_hw *vf = adapter->vf;
int i;
for (i = 0; i < vf->nr_vring; i++) {
@ -414,7 +402,7 @@ static struct ifcvf_hw *vdpa_to_vf(struct vdpa_device *vdpa_dev)
{
struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev);
return &adapter->vf;
return adapter->vf;
}
static u64 ifcvf_vdpa_get_device_features(struct vdpa_device *vdpa_dev)
@ -479,7 +467,7 @@ static void ifcvf_vdpa_set_status(struct vdpa_device *vdpa_dev, u8 status)
if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
!(status_old & VIRTIO_CONFIG_S_DRIVER_OK)) {
ret = ifcvf_request_irq(adapter);
ret = ifcvf_request_irq(vf);
if (ret) {
status = ifcvf_get_status(vf);
status |= VIRTIO_CONFIG_S_FAILED;
@ -511,7 +499,7 @@ static int ifcvf_vdpa_reset(struct vdpa_device *vdpa_dev)
if (status_old & VIRTIO_CONFIG_S_DRIVER_OK) {
ifcvf_stop_datapath(adapter);
ifcvf_free_irq(adapter);
ifcvf_free_irq(vf);
}
ifcvf_reset_vring(adapter);
@ -755,17 +743,37 @@ static int ifcvf_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
struct vdpa_device *vdpa_dev;
struct pci_dev *pdev;
struct ifcvf_hw *vf;
u64 device_features;
int ret;
ifcvf_mgmt_dev = container_of(mdev, struct ifcvf_vdpa_mgmt_dev, mdev);
if (!ifcvf_mgmt_dev->adapter)
return -EOPNOTSUPP;
vf = &ifcvf_mgmt_dev->vf;
pdev = vf->pdev;
adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
&pdev->dev, &ifc_vdpa_ops, 1, 1, NULL, false);
if (IS_ERR(adapter)) {
IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
return PTR_ERR(adapter);
}
adapter = ifcvf_mgmt_dev->adapter;
vf = &adapter->vf;
pdev = adapter->pdev;
ifcvf_mgmt_dev->adapter = adapter;
adapter->pdev = pdev;
adapter->vdpa.dma_dev = &pdev->dev;
adapter->vdpa.mdev = mdev;
adapter->vf = vf;
vdpa_dev = &adapter->vdpa;
device_features = vf->hw_features;
if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) {
if (config->device_features & ~device_features) {
IFCVF_ERR(pdev, "The provisioned features 0x%llx are not supported by this device with features 0x%llx\n",
config->device_features, device_features);
return -EINVAL;
}
device_features &= config->device_features;
}
vf->dev_features = device_features;
if (name)
ret = dev_set_name(&vdpa_dev->dev, "%s", name);
else
@ -781,7 +789,6 @@ static int ifcvf_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
return 0;
}
static void ifcvf_vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev)
{
struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev;
@ -800,7 +807,6 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev;
struct device *dev = &pdev->dev;
struct ifcvf_adapter *adapter;
struct ifcvf_hw *vf;
u32 dev_type;
int ret, i;
@ -831,20 +837,16 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
}
pci_set_master(pdev);
adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
dev, &ifc_vdpa_ops, 1, 1, NULL, false);
if (IS_ERR(adapter)) {
IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
return PTR_ERR(adapter);
ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), GFP_KERNEL);
if (!ifcvf_mgmt_dev) {
IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management device\n");
return -ENOMEM;
}
vf = &adapter->vf;
vf = &ifcvf_mgmt_dev->vf;
vf->dev_type = get_dev_type(pdev);
vf->base = pcim_iomap_table(pdev);
adapter->pdev = pdev;
adapter->vdpa.dma_dev = &pdev->dev;
vf->pdev = pdev;
ret = ifcvf_init_hw(vf, pdev);
if (ret) {
@ -858,16 +860,6 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
vf->hw_features = ifcvf_get_hw_features(vf);
vf->config_size = ifcvf_get_config_size(vf);
ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), GFP_KERNEL);
if (!ifcvf_mgmt_dev) {
IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management device\n");
return -ENOMEM;
}
ifcvf_mgmt_dev->mdev.ops = &ifcvf_vdpa_mgmt_dev_ops;
ifcvf_mgmt_dev->mdev.device = dev;
ifcvf_mgmt_dev->adapter = adapter;
dev_type = get_dev_type(pdev);
switch (dev_type) {
case VIRTIO_ID_NET:
@ -882,11 +874,11 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto err;
}
ifcvf_mgmt_dev->mdev.ops = &ifcvf_vdpa_mgmt_dev_ops;
ifcvf_mgmt_dev->mdev.device = dev;
ifcvf_mgmt_dev->mdev.max_supported_vqs = vf->nr_vring;
ifcvf_mgmt_dev->mdev.supported_features = vf->hw_features;
adapter->vdpa.mdev = &ifcvf_mgmt_dev->mdev;
ifcvf_mgmt_dev->mdev.config_attr_mask = (1 << VDPA_ATTR_DEV_FEATURES);
ret = vdpa_mgmtdev_register(&ifcvf_mgmt_dev->mdev);
if (ret) {

View File

@ -1,4 +1,4 @@
subdir-ccflags-y += -I$(srctree)/drivers/vdpa/mlx5/core
obj-$(CONFIG_MLX5_VDPA_NET) += mlx5_vdpa.o
mlx5_vdpa-$(CONFIG_MLX5_VDPA_NET) += net/mlx5_vnet.o core/resources.o core/mr.o
mlx5_vdpa-$(CONFIG_MLX5_VDPA_NET) += net/mlx5_vnet.o core/resources.o core/mr.o net/debug.o

View File

@ -503,7 +503,6 @@ void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
else
destroy_dma_mr(mvdev, mr);
memset(mr, 0, sizeof(*mr));
mr->initialized = false;
out:
mutex_unlock(&mr->mkey_mtx);

View File

@ -213,7 +213,7 @@ int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, u32 *mkey, u32 *in,
return err;
mkey_index = MLX5_GET(create_mkey_out, lout, mkey_index);
*mkey |= mlx5_idx_to_mkey(mkey_index);
*mkey = mlx5_idx_to_mkey(mkey_index);
return 0;
}
@ -233,6 +233,7 @@ static int init_ctrl_vq(struct mlx5_vdpa_dev *mvdev)
if (!mvdev->cvq.iotlb)
return -ENOMEM;
spin_lock_init(&mvdev->cvq.iommu_lock);
vringh_set_iotlb(&mvdev->cvq.vring, mvdev->cvq.iotlb, &mvdev->cvq.iommu_lock);
return 0;

View File

@ -0,0 +1,152 @@
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
#include <linux/debugfs.h>
#include <linux/mlx5/fs.h>
#include "mlx5_vnet.h"
static int tirn_show(struct seq_file *file, void *priv)
{
struct mlx5_vdpa_net *ndev = file->private;
seq_printf(file, "0x%x\n", ndev->res.tirn);
return 0;
}
DEFINE_SHOW_ATTRIBUTE(tirn);
void mlx5_vdpa_remove_tirn(struct mlx5_vdpa_net *ndev)
{
if (ndev->debugfs)
debugfs_remove(ndev->res.tirn_dent);
}
void mlx5_vdpa_add_tirn(struct mlx5_vdpa_net *ndev)
{
ndev->res.tirn_dent = debugfs_create_file("tirn", 0444, ndev->rx_dent,
ndev, &tirn_fops);
}
static int rx_flow_table_show(struct seq_file *file, void *priv)
{
struct mlx5_vdpa_net *ndev = file->private;
seq_printf(file, "0x%x\n", mlx5_flow_table_id(ndev->rxft));
return 0;
}
DEFINE_SHOW_ATTRIBUTE(rx_flow_table);
void mlx5_vdpa_remove_rx_flow_table(struct mlx5_vdpa_net *ndev)
{
if (ndev->debugfs)
debugfs_remove(ndev->rx_table_dent);
}
void mlx5_vdpa_add_rx_flow_table(struct mlx5_vdpa_net *ndev)
{
ndev->rx_table_dent = debugfs_create_file("table_id", 0444, ndev->rx_dent,
ndev, &rx_flow_table_fops);
}
#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG)
static int packets_show(struct seq_file *file, void *priv)
{
struct mlx5_vdpa_counter *counter = file->private;
u64 packets;
u64 bytes;
int err;
err = mlx5_fc_query(counter->mdev, counter->counter, &packets, &bytes);
if (err)
return err;
seq_printf(file, "0x%llx\n", packets);
return 0;
}
static int bytes_show(struct seq_file *file, void *priv)
{
struct mlx5_vdpa_counter *counter = file->private;
u64 packets;
u64 bytes;
int err;
err = mlx5_fc_query(counter->mdev, counter->counter, &packets, &bytes);
if (err)
return err;
seq_printf(file, "0x%llx\n", bytes);
return 0;
}
DEFINE_SHOW_ATTRIBUTE(packets);
DEFINE_SHOW_ATTRIBUTE(bytes);
static void add_counter_node(struct mlx5_vdpa_counter *counter,
struct dentry *parent)
{
debugfs_create_file("packets", 0444, parent, counter,
&packets_fops);
debugfs_create_file("bytes", 0444, parent, counter,
&bytes_fops);
}
void mlx5_vdpa_add_rx_counters(struct mlx5_vdpa_net *ndev,
struct macvlan_node *node)
{
static const char *ut = "untagged";
char vidstr[9];
u16 vid;
node->ucast_counter.mdev = ndev->mvdev.mdev;
node->mcast_counter.mdev = ndev->mvdev.mdev;
if (node->tagged) {
vid = key2vid(node->macvlan);
snprintf(vidstr, sizeof(vidstr), "0x%x", vid);
} else {
strcpy(vidstr, ut);
}
node->dent = debugfs_create_dir(vidstr, ndev->rx_dent);
if (IS_ERR(node->dent)) {
node->dent = NULL;
return;
}
node->ucast_counter.dent = debugfs_create_dir("ucast", node->dent);
if (IS_ERR(node->ucast_counter.dent))
return;
add_counter_node(&node->ucast_counter, node->ucast_counter.dent);
node->mcast_counter.dent = debugfs_create_dir("mcast", node->dent);
if (IS_ERR(node->mcast_counter.dent))
return;
add_counter_node(&node->mcast_counter, node->mcast_counter.dent);
}
void mlx5_vdpa_remove_rx_counters(struct mlx5_vdpa_net *ndev,
struct macvlan_node *node)
{
if (node->dent && ndev->debugfs)
debugfs_remove_recursive(node->dent);
}
#endif
void mlx5_vdpa_add_debugfs(struct mlx5_vdpa_net *ndev)
{
struct mlx5_core_dev *mdev;
mdev = ndev->mvdev.mdev;
ndev->debugfs = debugfs_create_dir(dev_name(&ndev->mvdev.vdev.dev),
mlx5_debugfs_get_dev_root(mdev));
if (!IS_ERR(ndev->debugfs))
ndev->rx_dent = debugfs_create_dir("rx", ndev->debugfs);
}
void mlx5_vdpa_remove_debugfs(struct dentry *dbg)
{
debugfs_remove_recursive(dbg);
}

View File

@ -18,15 +18,12 @@
#include <linux/mlx5/mlx5_ifc_vdpa.h>
#include <linux/mlx5/mpfs.h>
#include "mlx5_vdpa.h"
#include "mlx5_vnet.h"
MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
MODULE_DESCRIPTION("Mellanox VDPA driver");
MODULE_LICENSE("Dual BSD/GPL");
#define to_mlx5_vdpa_ndev(__mvdev) \
container_of(__mvdev, struct mlx5_vdpa_net, mvdev)
#define to_mvdev(__vdev) container_of((__vdev), struct mlx5_vdpa_dev, vdev)
#define VALID_FEATURES_MASK \
(BIT_ULL(VIRTIO_NET_F_CSUM) | BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) | \
BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) | BIT_ULL(VIRTIO_NET_F_MTU) | BIT_ULL(VIRTIO_NET_F_MAC) | \
@ -50,14 +47,6 @@ MODULE_LICENSE("Dual BSD/GPL");
#define MLX5V_UNTAGGED 0x1000
struct mlx5_vdpa_net_resources {
u32 tisn;
u32 tdn;
u32 tirn;
u32 rqtn;
bool valid;
};
struct mlx5_vdpa_cq_buf {
struct mlx5_frag_buf_ctrl fbc;
struct mlx5_frag_buf frag_buf;
@ -146,38 +135,6 @@ static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 idx)
return idx <= mvdev->max_idx;
}
#define MLX5V_MACVLAN_SIZE 256
struct mlx5_vdpa_net {
struct mlx5_vdpa_dev mvdev;
struct mlx5_vdpa_net_resources res;
struct virtio_net_config config;
struct mlx5_vdpa_virtqueue *vqs;
struct vdpa_callback *event_cbs;
/* Serialize vq resources creation and destruction. This is required
* since the memory map might change and we need to destroy and create
* resources while the driver is operational.
*/
struct rw_semaphore reslock;
struct mlx5_flow_table *rxft;
bool setup;
u32 cur_num_vqs;
u32 rqt_size;
bool nb_registered;
struct notifier_block nb;
struct vdpa_callback config_cb;
struct mlx5_vdpa_wq_ent cvq_ent;
struct hlist_head macvlan_hash[MLX5V_MACVLAN_SIZE];
};
struct macvlan_node {
struct hlist_node hlist;
struct mlx5_flow_handle *ucast_rule;
struct mlx5_flow_handle *mcast_rule;
u64 macvlan;
};
static void free_resources(struct mlx5_vdpa_net *ndev);
static void init_mvqs(struct mlx5_vdpa_net *ndev);
static int setup_driver(struct mlx5_vdpa_dev *mvdev);
@ -1431,36 +1388,85 @@ static int create_tir(struct mlx5_vdpa_net *ndev)
err = mlx5_vdpa_create_tir(&ndev->mvdev, in, &ndev->res.tirn);
kfree(in);
if (err)
return err;
mlx5_vdpa_add_tirn(ndev);
return err;
}
static void destroy_tir(struct mlx5_vdpa_net *ndev)
{
mlx5_vdpa_remove_tirn(ndev);
mlx5_vdpa_destroy_tir(&ndev->mvdev, ndev->res.tirn);
}
#define MAX_STEERING_ENT 0x8000
#define MAX_STEERING_GROUPS 2
static int mlx5_vdpa_add_mac_vlan_rules(struct mlx5_vdpa_net *ndev, u8 *mac,
u16 vid, bool tagged,
struct mlx5_flow_handle **ucast,
struct mlx5_flow_handle **mcast)
#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG)
#define NUM_DESTS 2
#else
#define NUM_DESTS 1
#endif
static int add_steering_counters(struct mlx5_vdpa_net *ndev,
struct macvlan_node *node,
struct mlx5_flow_act *flow_act,
struct mlx5_flow_destination *dests)
{
struct mlx5_flow_destination dest = {};
#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG)
int err;
node->ucast_counter.counter = mlx5_fc_create(ndev->mvdev.mdev, false);
if (IS_ERR(node->ucast_counter.counter))
return PTR_ERR(node->ucast_counter.counter);
node->mcast_counter.counter = mlx5_fc_create(ndev->mvdev.mdev, false);
if (IS_ERR(node->mcast_counter.counter)) {
err = PTR_ERR(node->mcast_counter.counter);
goto err_mcast_counter;
}
dests[1].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
flow_act->action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
return 0;
err_mcast_counter:
mlx5_fc_destroy(ndev->mvdev.mdev, node->ucast_counter.counter);
return err;
#else
return 0;
#endif
}
static void remove_steering_counters(struct mlx5_vdpa_net *ndev,
struct macvlan_node *node)
{
#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG)
mlx5_fc_destroy(ndev->mvdev.mdev, node->mcast_counter.counter);
mlx5_fc_destroy(ndev->mvdev.mdev, node->ucast_counter.counter);
#endif
}
static int mlx5_vdpa_add_mac_vlan_rules(struct mlx5_vdpa_net *ndev, u8 *mac,
struct macvlan_node *node)
{
struct mlx5_flow_destination dests[NUM_DESTS] = {};
struct mlx5_flow_act flow_act = {};
struct mlx5_flow_handle *rule;
struct mlx5_flow_spec *spec;
void *headers_c;
void *headers_v;
u8 *dmac_c;
u8 *dmac_v;
int err;
u16 vid;
spec = kvzalloc(sizeof(*spec), GFP_KERNEL);
if (!spec)
return -ENOMEM;
vid = key2vid(node->macvlan);
spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, outer_headers);
headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, outer_headers);
@ -1472,44 +1478,58 @@ static int mlx5_vdpa_add_mac_vlan_rules(struct mlx5_vdpa_net *ndev, u8 *mac,
MLX5_SET(fte_match_set_lyr_2_4, headers_c, cvlan_tag, 1);
MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, first_vid);
}
if (tagged) {
if (node->tagged) {
MLX5_SET(fte_match_set_lyr_2_4, headers_v, cvlan_tag, 1);
MLX5_SET(fte_match_set_lyr_2_4, headers_v, first_vid, vid);
}
flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
dest.type = MLX5_FLOW_DESTINATION_TYPE_TIR;
dest.tir_num = ndev->res.tirn;
rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, &dest, 1);
if (IS_ERR(rule))
return PTR_ERR(rule);
dests[0].type = MLX5_FLOW_DESTINATION_TYPE_TIR;
dests[0].tir_num = ndev->res.tirn;
err = add_steering_counters(ndev, node, &flow_act, dests);
if (err)
goto out_free;
*ucast = rule;
#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG)
dests[1].counter_id = mlx5_fc_id(node->ucast_counter.counter);
#endif
node->ucast_rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, dests, NUM_DESTS);
if (IS_ERR(node->ucast_rule)) {
err = PTR_ERR(node->ucast_rule);
goto err_ucast;
}
#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG)
dests[1].counter_id = mlx5_fc_id(node->mcast_counter.counter);
#endif
memset(dmac_c, 0, ETH_ALEN);
memset(dmac_v, 0, ETH_ALEN);
dmac_c[0] = 1;
dmac_v[0] = 1;
rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, &dest, 1);
kvfree(spec);
if (IS_ERR(rule)) {
err = PTR_ERR(rule);
node->mcast_rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, dests, NUM_DESTS);
if (IS_ERR(node->mcast_rule)) {
err = PTR_ERR(node->mcast_rule);
goto err_mcast;
}
*mcast = rule;
kvfree(spec);
mlx5_vdpa_add_rx_counters(ndev, node);
return 0;
err_mcast:
mlx5_del_flow_rules(*ucast);
mlx5_del_flow_rules(node->ucast_rule);
err_ucast:
remove_steering_counters(ndev, node);
out_free:
kvfree(spec);
return err;
}
static void mlx5_vdpa_del_mac_vlan_rules(struct mlx5_vdpa_net *ndev,
struct mlx5_flow_handle *ucast,
struct mlx5_flow_handle *mcast)
struct macvlan_node *node)
{
mlx5_del_flow_rules(ucast);
mlx5_del_flow_rules(mcast);
mlx5_vdpa_remove_rx_counters(ndev, node);
mlx5_del_flow_rules(node->ucast_rule);
mlx5_del_flow_rules(node->mcast_rule);
}
static u64 search_val(u8 *mac, u16 vlan, bool tagged)
@ -1543,14 +1563,14 @@ static struct macvlan_node *mac_vlan_lookup(struct mlx5_vdpa_net *ndev, u64 valu
return NULL;
}
static int mac_vlan_add(struct mlx5_vdpa_net *ndev, u8 *mac, u16 vlan, bool tagged)
static int mac_vlan_add(struct mlx5_vdpa_net *ndev, u8 *mac, u16 vid, bool tagged)
{
struct macvlan_node *ptr;
u64 val;
u32 idx;
int err;
val = search_val(mac, vlan, tagged);
val = search_val(mac, vid, tagged);
if (mac_vlan_lookup(ndev, val))
return -EEXIST;
@ -1558,12 +1578,13 @@ static int mac_vlan_add(struct mlx5_vdpa_net *ndev, u8 *mac, u16 vlan, bool tagg
if (!ptr)
return -ENOMEM;
err = mlx5_vdpa_add_mac_vlan_rules(ndev, ndev->config.mac, vlan, tagged,
&ptr->ucast_rule, &ptr->mcast_rule);
ptr->tagged = tagged;
ptr->macvlan = val;
ptr->ndev = ndev;
err = mlx5_vdpa_add_mac_vlan_rules(ndev, ndev->config.mac, ptr);
if (err)
goto err_add;
ptr->macvlan = val;
idx = hash_64(val, 8);
hlist_add_head(&ptr->hlist, &ndev->macvlan_hash[idx]);
return 0;
@ -1582,7 +1603,8 @@ static void mac_vlan_del(struct mlx5_vdpa_net *ndev, u8 *mac, u16 vlan, bool tag
return;
hlist_del(&ptr->hlist);
mlx5_vdpa_del_mac_vlan_rules(ndev, ptr->ucast_rule, ptr->mcast_rule);
mlx5_vdpa_del_mac_vlan_rules(ndev, ptr);
remove_steering_counters(ndev, ptr);
kfree(ptr);
}
@ -1595,7 +1617,8 @@ static void clear_mac_vlan_table(struct mlx5_vdpa_net *ndev)
for (i = 0; i < MLX5V_MACVLAN_SIZE; i++) {
hlist_for_each_entry_safe(pos, n, &ndev->macvlan_hash[i], hlist) {
hlist_del(&pos->hlist);
mlx5_vdpa_del_mac_vlan_rules(ndev, pos->ucast_rule, pos->mcast_rule);
mlx5_vdpa_del_mac_vlan_rules(ndev, pos);
remove_steering_counters(ndev, pos);
kfree(pos);
}
}
@ -1621,6 +1644,7 @@ static int setup_steering(struct mlx5_vdpa_net *ndev)
mlx5_vdpa_warn(&ndev->mvdev, "failed to create flow table\n");
return PTR_ERR(ndev->rxft);
}
mlx5_vdpa_add_rx_flow_table(ndev);
err = mac_vlan_add(ndev, ndev->config.mac, 0, false);
if (err)
@ -1629,6 +1653,7 @@ static int setup_steering(struct mlx5_vdpa_net *ndev)
return 0;
err_add:
mlx5_vdpa_remove_rx_flow_table(ndev);
mlx5_destroy_flow_table(ndev->rxft);
return err;
}
@ -1636,6 +1661,7 @@ err_add:
static void teardown_steering(struct mlx5_vdpa_net *ndev)
{
clear_mac_vlan_table(ndev);
mlx5_vdpa_remove_rx_flow_table(ndev);
mlx5_destroy_flow_table(ndev->rxft);
}
@ -2183,6 +2209,7 @@ static u64 get_supported_features(struct mlx5_core_dev *mdev)
mlx_vdpa_features |= BIT_ULL(VIRTIO_NET_F_STATUS);
mlx_vdpa_features |= BIT_ULL(VIRTIO_NET_F_MTU);
mlx_vdpa_features |= BIT_ULL(VIRTIO_NET_F_CTRL_VLAN);
mlx_vdpa_features |= BIT_ULL(VIRTIO_NET_F_MAC);
return mlx_vdpa_features;
}
@ -2655,6 +2682,16 @@ static int mlx5_vdpa_set_map(struct vdpa_device *vdev, unsigned int asid,
return err;
}
static struct device *mlx5_get_vq_dma_dev(struct vdpa_device *vdev, u16 idx)
{
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
if (is_ctrl_vq_idx(mvdev, idx))
return &vdev->dev;
return mvdev->vdev.dma_dev;
}
static void mlx5_vdpa_free(struct vdpa_device *vdev)
{
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
@ -2870,6 +2907,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = {
.get_generation = mlx5_vdpa_get_generation,
.set_map = mlx5_vdpa_set_map,
.set_group_asid = mlx5_set_group_asid,
.get_vq_dma_dev = mlx5_get_vq_dma_dev,
.free = mlx5_vdpa_free,
.suspend = mlx5_vdpa_suspend,
};
@ -3009,6 +3047,8 @@ static int event_handler(struct notifier_block *nb, unsigned long event, void *p
struct mlx5_vdpa_wq_ent *wqent;
if (event == MLX5_EVENT_TYPE_PORT_CHANGE) {
if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_STATUS)))
return NOTIFY_DONE;
switch (eqe->sub_type) {
case MLX5_PORT_CHANGE_SUBTYPE_DOWN:
case MLX5_PORT_CHANGE_SUBTYPE_ACTIVE:
@ -3060,6 +3100,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
struct mlx5_vdpa_dev *mvdev;
struct mlx5_vdpa_net *ndev;
struct mlx5_core_dev *mdev;
u64 device_features;
u32 max_vqs;
u16 mtu;
int err;
@ -3068,6 +3109,24 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
return -ENOSPC;
mdev = mgtdev->madev->mdev;
device_features = mgtdev->mgtdev.supported_features;
if (add_config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) {
if (add_config->device_features & ~device_features) {
dev_warn(mdev->device,
"The provisioned features 0x%llx are not supported by this device with features 0x%llx\n",
add_config->device_features, device_features);
return -EINVAL;
}
device_features &= add_config->device_features;
}
if (!(device_features & BIT_ULL(VIRTIO_F_VERSION_1) &&
device_features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) {
dev_warn(mdev->device,
"Must provision minimum features 0x%llx for this device",
BIT_ULL(VIRTIO_F_VERSION_1) | BIT_ULL(VIRTIO_F_ACCESS_PLATFORM));
return -EOPNOTSUPP;
}
if (!(MLX5_CAP_DEV_VDPA_EMULATION(mdev, virtio_queue_type) &
MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_SPLIT)) {
dev_warn(mdev->device, "missing support for split virtqueues\n");
@ -3096,7 +3155,6 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
if (IS_ERR(ndev))
return PTR_ERR(ndev);
ndev->mvdev.mlx_features = mgtdev->mgtdev.supported_features;
ndev->mvdev.max_vqs = max_vqs;
mvdev = &ndev->mvdev;
mvdev->mdev = mdev;
@ -3118,20 +3176,26 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
goto err_alloc;
}
err = query_mtu(mdev, &mtu);
if (err)
goto err_alloc;
if (device_features & BIT_ULL(VIRTIO_NET_F_MTU)) {
err = query_mtu(mdev, &mtu);
if (err)
goto err_alloc;
ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, mtu);
ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, mtu);
}
if (get_link_state(mvdev))
ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
else
ndev->config.status &= cpu_to_mlx5vdpa16(mvdev, ~VIRTIO_NET_S_LINK_UP);
if (device_features & BIT_ULL(VIRTIO_NET_F_STATUS)) {
if (get_link_state(mvdev))
ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
else
ndev->config.status &= cpu_to_mlx5vdpa16(mvdev, ~VIRTIO_NET_S_LINK_UP);
}
if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR)) {
memcpy(ndev->config.mac, add_config->net.mac, ETH_ALEN);
} else {
/* Don't bother setting the mac address in config if not going to provision _F_MAC */
} else if ((add_config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) == 0 ||
device_features & BIT_ULL(VIRTIO_NET_F_MAC)) {
err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac);
if (err)
goto err_alloc;
@ -3142,11 +3206,26 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
err = mlx5_mpfs_add_mac(pfmdev, config->mac);
if (err)
goto err_alloc;
ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_MAC);
} else if ((add_config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) == 0) {
/*
 * We used to clear the _F_MAC feature bit when a zero mac
 * address was seen and device features were not explicitly
 * provisioned. Keep that behaviour so old scripts do not
 * break.
 */
device_features &= ~BIT_ULL(VIRTIO_NET_F_MAC);
} else if (device_features & BIT_ULL(VIRTIO_NET_F_MAC)) {
/* Don't provision zero mac address for _F_MAC */
mlx5_vdpa_warn(&ndev->mvdev,
"No mac address provisioned?\n");
err = -EINVAL;
goto err_alloc;
}
config->max_virtqueue_pairs = cpu_to_mlx5vdpa16(mvdev, max_vqs / 2);
if (device_features & BIT_ULL(VIRTIO_NET_F_MQ))
config->max_virtqueue_pairs = cpu_to_mlx5vdpa16(mvdev, max_vqs / 2);
ndev->mvdev.mlx_features = device_features;
mvdev->vdev.dma_dev = &mdev->pdev->dev;
err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
if (err)
@ -3178,6 +3257,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
if (err)
goto err_reg;
mlx5_vdpa_add_debugfs(ndev);
mgtdev->ndev = ndev;
return 0;
@ -3204,6 +3284,8 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
struct workqueue_struct *wq;
mlx5_vdpa_remove_debugfs(ndev->debugfs);
ndev->debugfs = NULL;
if (ndev->nb_registered) {
ndev->nb_registered = false;
mlx5_notifier_unregister(mvdev->mdev, &ndev->nb);
@ -3243,7 +3325,8 @@ static int mlx5v_probe(struct auxiliary_device *adev,
mgtdev->mgtdev.id_table = id_table;
mgtdev->mgtdev.config_attr_mask = BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR) |
BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP) |
BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU);
BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU) |
BIT_ULL(VDPA_ATTR_DEV_FEATURES);
mgtdev->mgtdev.max_supported_vqs =
MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues) + 1;
mgtdev->mgtdev.supported_features = get_supported_features(mdev);


@ -0,0 +1,94 @@
/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
#ifndef __MLX5_VNET_H__
#define __MLX5_VNET_H__
#include "mlx5_vdpa.h"
#define to_mlx5_vdpa_ndev(__mvdev) \
container_of(__mvdev, struct mlx5_vdpa_net, mvdev)
#define to_mvdev(__vdev) container_of((__vdev), struct mlx5_vdpa_dev, vdev)
struct mlx5_vdpa_net_resources {
u32 tisn;
u32 tdn;
u32 tirn;
u32 rqtn;
bool valid;
struct dentry *tirn_dent;
};
#define MLX5V_MACVLAN_SIZE 256
static inline u16 key2vid(u64 key)
{
return (u16)(key >> 48) & 0xfff;
}
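key2vid() undoes the packing performed by search_val() in mlx5_vnet.c: the MAC fills the low 48 bits and the VLAN field sits above it, with MLX5V_UNTAGGED (0x1000, outside the 12-bit vid range) marking untagged entries. A sketch of the packing side under that assumption; the helper name is hypothetical:
/* Build the 64-bit macvlan hash key: bits 0..47 carry the MAC, bits 48+
 * the VLAN field. key2vid() masks that field with 0xfff, so an untagged
 * key (VLAN field == MLX5V_UNTAGGED) yields vid 0.
 */
static u64 vid2key(const u8 *mac, u16 vid, bool tagged)
{
	u64 key = tagged ? vid : MLX5V_UNTAGGED;
	int i;

	key <<= 48;
	for (i = 0; i < ETH_ALEN; i++)
		key |= (u64)mac[i] << (8 * (ETH_ALEN - 1 - i));
	return key;
}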
struct mlx5_vdpa_net {
struct mlx5_vdpa_dev mvdev;
struct mlx5_vdpa_net_resources res;
struct virtio_net_config config;
struct mlx5_vdpa_virtqueue *vqs;
struct vdpa_callback *event_cbs;
/* Serialize vq resources creation and destruction. This is required
* since the memory map might change and we need to destroy and create
* resources while the driver is operational.
*/
struct rw_semaphore reslock;
struct mlx5_flow_table *rxft;
struct dentry *rx_dent;
struct dentry *rx_table_dent;
bool setup;
u32 cur_num_vqs;
u32 rqt_size;
bool nb_registered;
struct notifier_block nb;
struct vdpa_callback config_cb;
struct mlx5_vdpa_wq_ent cvq_ent;
struct hlist_head macvlan_hash[MLX5V_MACVLAN_SIZE];
struct dentry *debugfs;
};
struct mlx5_vdpa_counter {
struct mlx5_fc *counter;
struct dentry *dent;
struct mlx5_core_dev *mdev;
};
struct macvlan_node {
struct hlist_node hlist;
struct mlx5_flow_handle *ucast_rule;
struct mlx5_flow_handle *mcast_rule;
u64 macvlan;
struct mlx5_vdpa_net *ndev;
bool tagged;
#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG)
struct dentry *dent;
struct mlx5_vdpa_counter ucast_counter;
struct mlx5_vdpa_counter mcast_counter;
#endif
};
void mlx5_vdpa_add_debugfs(struct mlx5_vdpa_net *ndev);
void mlx5_vdpa_remove_debugfs(struct dentry *dbg);
void mlx5_vdpa_add_rx_flow_table(struct mlx5_vdpa_net *ndev);
void mlx5_vdpa_remove_rx_flow_table(struct mlx5_vdpa_net *ndev);
void mlx5_vdpa_add_tirn(struct mlx5_vdpa_net *ndev);
void mlx5_vdpa_remove_tirn(struct mlx5_vdpa_net *ndev);
#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG)
void mlx5_vdpa_add_rx_counters(struct mlx5_vdpa_net *ndev,
struct macvlan_node *node);
void mlx5_vdpa_remove_rx_counters(struct mlx5_vdpa_net *ndev,
struct macvlan_node *node);
#else
static inline void mlx5_vdpa_add_rx_counters(struct mlx5_vdpa_net *ndev,
struct macvlan_node *node) {}
static inline void mlx5_vdpa_remove_rx_counters(struct mlx5_vdpa_net *ndev,
struct macvlan_node *node) {}
#endif
#endif /* __MLX5_VNET_H__ */


@ -0,0 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_SNET_VDPA) += snet_vdpa.o
snet_vdpa-$(CONFIG_SNET_VDPA) += snet_main.o
ifdef CONFIG_HWMON
snet_vdpa-$(CONFIG_SNET_VDPA) += snet_hwmon.o
endif


@ -0,0 +1,188 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* SolidRun DPU driver for control plane
*
* Copyright (C) 2022 SolidRun
*
* Author: Alvaro Karsz <alvaro.karsz@solid-run.com>
*
*/
#include <linux/hwmon.h>
#include "snet_vdpa.h"
/* Monitor offsets */
#define SNET_MON_TMP0_IN_OFF 0x00
#define SNET_MON_TMP0_MAX_OFF 0x08
#define SNET_MON_TMP0_CRIT_OFF 0x10
#define SNET_MON_TMP1_IN_OFF 0x18
#define SNET_MON_TMP1_CRIT_OFF 0x20
#define SNET_MON_CURR_IN_OFF 0x28
#define SNET_MON_CURR_MAX_OFF 0x30
#define SNET_MON_CURR_CRIT_OFF 0x38
#define SNET_MON_PWR_IN_OFF 0x40
#define SNET_MON_VOLT_IN_OFF 0x48
#define SNET_MON_VOLT_CRIT_OFF 0x50
#define SNET_MON_VOLT_LCRIT_OFF 0x58
static void snet_hwmon_read_reg(struct psnet *psnet, u32 reg, long *out)
{
*out = psnet_read64(psnet, psnet->cfg.hwmon_off + reg);
}
static umode_t snet_hwmon_is_visible(const void *data,
enum hwmon_sensor_types type,
u32 attr, int channel)
{
return 0444;
}
static int snet_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
u32 attr, int channel, long *val)
{
struct psnet *psnet = dev_get_drvdata(dev);
int ret = 0;
switch (type) {
case hwmon_in:
switch (attr) {
case hwmon_in_lcrit:
snet_hwmon_read_reg(psnet, SNET_MON_VOLT_LCRIT_OFF, val);
break;
case hwmon_in_crit:
snet_hwmon_read_reg(psnet, SNET_MON_VOLT_CRIT_OFF, val);
break;
case hwmon_in_input:
snet_hwmon_read_reg(psnet, SNET_MON_VOLT_IN_OFF, val);
break;
default:
ret = -EOPNOTSUPP;
break;
}
break;
case hwmon_power:
switch (attr) {
case hwmon_power_input:
snet_hwmon_read_reg(psnet, SNET_MON_PWR_IN_OFF, val);
break;
default:
ret = -EOPNOTSUPP;
break;
}
break;
case hwmon_curr:
switch (attr) {
case hwmon_curr_input:
snet_hwmon_read_reg(psnet, SNET_MON_CURR_IN_OFF, val);
break;
case hwmon_curr_max:
snet_hwmon_read_reg(psnet, SNET_MON_CURR_MAX_OFF, val);
break;
case hwmon_curr_crit:
snet_hwmon_read_reg(psnet, SNET_MON_CURR_CRIT_OFF, val);
break;
default:
ret = -EOPNOTSUPP;
break;
}
break;
case hwmon_temp:
switch (attr) {
case hwmon_temp_input:
if (channel == 0)
snet_hwmon_read_reg(psnet, SNET_MON_TMP0_IN_OFF, val);
else
snet_hwmon_read_reg(psnet, SNET_MON_TMP1_IN_OFF, val);
break;
case hwmon_temp_max:
if (channel == 0)
snet_hwmon_read_reg(psnet, SNET_MON_TMP0_MAX_OFF, val);
else
ret = -EOPNOTSUPP;
break;
case hwmon_temp_crit:
if (channel == 0)
snet_hwmon_read_reg(psnet, SNET_MON_TMP0_CRIT_OFF, val);
else
snet_hwmon_read_reg(psnet, SNET_MON_TMP1_CRIT_OFF, val);
break;
default:
ret = -EOPNOTSUPP;
break;
}
break;
default:
ret = -EOPNOTSUPP;
break;
}
return ret;
}
static int snet_hwmon_read_string(struct device *dev,
enum hwmon_sensor_types type, u32 attr,
int channel, const char **str)
{
int ret = 0;
switch (type) {
case hwmon_in:
*str = "main_vin";
break;
case hwmon_power:
*str = "soc_pin";
break;
case hwmon_curr:
*str = "soc_iin";
break;
case hwmon_temp:
if (channel == 0)
*str = "power_stage_temp";
else
*str = "ic_junction_temp";
break;
default:
ret = -EOPNOTSUPP;
break;
}
return ret;
}
static const struct hwmon_ops snet_hwmon_ops = {
.is_visible = snet_hwmon_is_visible,
.read = snet_hwmon_read,
.read_string = snet_hwmon_read_string
};
static const struct hwmon_channel_info *snet_hwmon_info[] = {
HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_LABEL,
HWMON_T_INPUT | HWMON_T_CRIT | HWMON_T_LABEL),
HWMON_CHANNEL_INFO(power, HWMON_P_INPUT | HWMON_P_LABEL),
HWMON_CHANNEL_INFO(curr, HWMON_C_INPUT | HWMON_C_MAX | HWMON_C_CRIT | HWMON_C_LABEL),
HWMON_CHANNEL_INFO(in, HWMON_I_INPUT | HWMON_I_CRIT | HWMON_I_LCRIT | HWMON_I_LABEL),
NULL
};
static const struct hwmon_chip_info snet_hwmon_chip_info = {
.ops = &snet_hwmon_ops,
.info = snet_hwmon_info,
};
/* Create a HW monitor device */
void psnet_create_hwmon(struct pci_dev *pdev)
{
struct device *hwmon;
struct psnet *psnet = pci_get_drvdata(pdev);
snprintf(psnet->hwmon_name, SNET_NAME_SIZE, "snet_%s", pci_name(pdev));
hwmon = devm_hwmon_device_register_with_info(&pdev->dev, psnet->hwmon_name, psnet,
&snet_hwmono_info, NULL);
/* The monitor is not mandatory, just alert the user in case of an error */
if (IS_ERR(hwmon))
SNET_WARN(pdev, "Failed to create SNET hwmon, error %ld\n", PTR_ERR(hwmon));
}
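The hwmon core drives the callbacks registered above: a userspace read of an attribute file is translated into a (type, attr, channel) triple and passed to ->read(). A minimal sketch of the resulting call for the first temperature channel; the wrapper is hypothetical, and hwmon_dev is the device returned by the registration call, whose drvdata is the psnet:
/* Roughly what the hwmon core does when userspace reads temp1_input for
 * this device; the value comes straight from BAR offset
 * SNET_MON_TMP0_IN_OFF.
 */
static long read_power_stage_temp(struct device *hwmon_dev)
{
	long val = 0;

	if (snet_hwmon_read(hwmon_dev, hwmon_temp, hwmon_temp_input, 0, &val))
		return -1;
	return val;
}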

File diff suppressed because it is too large


@ -0,0 +1,194 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* SolidRun DPU driver for control plane
*
* Copyright (C) 2022 SolidRun
*
* Author: Alvaro Karsz <alvaro.karsz@solid-run.com>
*
*/
#ifndef _SNET_VDPA_H_
#define _SNET_VDPA_H_
#include <linux/vdpa.h>
#include <linux/pci.h>
#define SNET_NAME_SIZE 256
#define SNET_ERR(pdev, fmt, ...) dev_err(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__)
#define SNET_WARN(pdev, fmt, ...) dev_warn(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__)
#define SNET_INFO(pdev, fmt, ...) dev_info(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__)
#define SNET_DBG(pdev, fmt, ...) dev_dbg(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__)
#define SNET_HAS_FEATURE(s, f) ((s)->negotiated_features & BIT_ULL(f))
/* VQ struct */
struct snet_vq {
/* VQ callback */
struct vdpa_callback cb;
/* desc base address */
u64 desc_area;
/* device base address */
u64 device_area;
/* driver base address */
u64 driver_area;
/* Queue size */
u32 num;
/* Serial ID for VQ */
u32 sid;
/* is ready flag */
bool ready;
/* IRQ number */
u32 irq;
/* IRQ index, DPU uses this to parse data from MSI-X table */
u32 irq_idx;
/* IRQ name */
char irq_name[SNET_NAME_SIZE];
/* pointer to mapped PCI BAR register used by this VQ to kick */
void __iomem *kick_ptr;
};
struct snet {
/* vdpa device */
struct vdpa_device vdpa;
/* Config callback */
struct vdpa_callback cb;
/* array of virtqueues */
struct snet_vq **vqs;
/* Used features */
u64 negotiated_features;
/* Device serial ID */
u32 sid;
/* device status */
u8 status;
/* boolean indicating if snet config was passed to the device */
bool dpu_ready;
/* IRQ number */
u32 cfg_irq;
/* IRQ index, DPU uses this to parse data from MSI-X table */
u32 cfg_irq_idx;
/* IRQ name */
char cfg_irq_name[SNET_NAME_SIZE];
/* BAR to access the VF */
void __iomem *bar;
/* PCI device */
struct pci_dev *pdev;
/* Pointer to snet pdev parent device */
struct psnet *psnet;
/* Pointer to snet config device */
struct snet_dev_cfg *cfg;
};
struct snet_dev_cfg {
/* Device ID following VirtIO spec. */
u32 virtio_id;
/* Number of VQs for this device */
u32 vq_num;
/* Size of every VQ */
u32 vq_size;
/* Virtual Function id */
u32 vfid;
/* Device features, following VirtIO spec */
u64 features;
/* Reserved for future usage */
u32 rsvd[6];
/* VirtIO device specific config size */
u32 cfg_size;
/* VirtIO device specific config address */
void __iomem *virtio_cfg;
} __packed;
struct snet_cfg {
/* Magic key */
u32 key;
/* Size of total config in bytes */
u32 cfg_size;
/* Config version */
u32 cfg_ver;
/* Number of Virtual Functions to create */
u32 vf_num;
/* BAR to use for the VFs */
u32 vf_bar;
/* Where should we write the SNET's config */
u32 host_cfg_off;
/* Max. allowed size of a SNET config */
u32 max_size_host_cfg;
/* VirtIO config offset in BAR */
u32 virtio_cfg_off;
/* Offset in PCI BAR for VQ kicks */
u32 kick_off;
/* Offset in PCI BAR for HW monitoring */
u32 hwmon_off;
/* Offset in PCI BAR for SNET messages */
u32 msg_off;
/* Config general flags - enum snet_cfg_flags */
u32 flags;
/* Reserved for future usage */
u32 rsvd[6];
/* Number of snet devices */
u32 devices_num;
/* The actual devices */
struct snet_dev_cfg **devs;
} __packed;
/* SolidNET PCIe device, one device per PCIe physical function */
struct psnet {
/* PCI BARs */
void __iomem *bars[PCI_STD_NUM_BARS];
/* Negotiated config version */
u32 negotiated_cfg_ver;
/* Next IRQ index to use in case when the IRQs are allocated from this device */
u32 next_irq;
/* BAR number used to communicate with the device */
u8 barno;
/* spinlock to protect data that can be changed by SNET devices */
spinlock_t lock;
/* Pointer to the device's config read from BAR */
struct snet_cfg cfg;
/* Name of monitor device */
char hwmon_name[SNET_NAME_SIZE];
};
enum snet_cfg_flags {
/* Create a HWMON device */
SNET_CFG_FLAG_HWMON = BIT(0),
/* Use IRQs from the physical function */
SNET_CFG_FLAG_IRQ_PF = BIT(1),
};
#define PSNET_FLAG_ON(p, f) ((p)->cfg.flags & (f))
static inline u32 psnet_read32(struct psnet *psnet, u32 off)
{
return ioread32(psnet->bars[psnet->barno] + off);
}
static inline u32 snet_read32(struct snet *snet, u32 off)
{
return ioread32(snet->bar + off);
}
static inline void snet_write32(struct snet *snet, u32 off, u32 val)
{
iowrite32(val, snet->bar + off);
}
static inline u64 psnet_read64(struct psnet *psnet, u32 off)
{
u64 val;
/* The DPU writes 64-bit values in 2 halves, low part first */
val = (u64)psnet_read32(psnet, off);
val |= ((u64)psnet_read32(psnet, off + 4) << 32);
return val;
}
static inline void snet_write64(struct snet *snet, u32 off, u64 val)
{
/* The DPU expects a 64bit integer in 2 halves, the low part first */
snet_write32(snet, off, (u32)val);
snet_write32(snet, off + 4, (u32)(val >> 32));
}
#if IS_ENABLED(CONFIG_HWMON)
void psnet_create_hwmon(struct pci_dev *pdev);
#endif
#endif //_SNET_VDPA_H_


@ -39,6 +39,11 @@ static int vdpa_dev_probe(struct device *d)
u32 max_num, min_num = 1;
int ret = 0;
d->dma_mask = &d->coherent_dma_mask;
ret = dma_set_mask_and_coherent(d, DMA_BIT_MASK(64));
if (ret)
return ret;
max_num = ops->get_vq_num_max(vdev);
if (ops->get_vq_num_min)
min_num = ops->get_vq_num_min(vdev);
@ -460,12 +465,28 @@ static int vdpa_nl_mgmtdev_handle_fill(struct sk_buff *msg, const struct vdpa_mg
return 0;
}
static u64 vdpa_mgmtdev_get_classes(const struct vdpa_mgmt_dev *mdev,
unsigned int *nclasses)
{
u64 supported_classes = 0;
unsigned int n = 0;
for (int i = 0; mdev->id_table[i].device; i++) {
if (mdev->id_table[i].device > 63)
continue;
supported_classes |= BIT_ULL(mdev->id_table[i].device);
n++;
}
if (nclasses)
*nclasses = n;
return supported_classes;
}
static int vdpa_mgmtdev_fill(const struct vdpa_mgmt_dev *mdev, struct sk_buff *msg,
u32 portid, u32 seq, int flags)
{
u64 supported_classes = 0;
void *hdr;
int i = 0;
int err;
hdr = genlmsg_put(msg, portid, seq, &vdpa_nl_family, flags, VDPA_CMD_MGMTDEV_NEW);
@ -475,14 +496,9 @@ static int vdpa_mgmtdev_fill(const struct vdpa_mgmt_dev *mdev, struct sk_buff *m
if (err)
goto msg_err;
while (mdev->id_table[i].device) {
if (mdev->id_table[i].device <= 63)
supported_classes |= BIT_ULL(mdev->id_table[i].device);
i++;
}
if (nla_put_u64_64bit(msg, VDPA_ATTR_MGMTDEV_SUPPORTED_CLASSES,
supported_classes, VDPA_ATTR_UNSPEC)) {
vdpa_mgmtdev_get_classes(mdev, NULL),
VDPA_ATTR_UNSPEC)) {
err = -EMSGSIZE;
goto msg_err;
}
@ -566,13 +582,25 @@ out:
BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU) | \
BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP))
/*
* Bitmask for all per-device features: feature bits VIRTIO_TRANSPORT_F_START
* through VIRTIO_TRANSPORT_F_END are unset, i.e. 0xfffffc000fffffff for
* all 64bit features. If the features are extended beyond 64 bits, or new
* "holes" are reserved for other type of features than per-device, this
* macro would have to be updated.
*/
#define VIRTIO_DEVICE_F_MASK (~0ULL << (VIRTIO_TRANSPORT_F_END + 1) | \
((1ULL << VIRTIO_TRANSPORT_F_START) - 1))
static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *info)
{
struct vdpa_dev_set_config config = {};
struct nlattr **nl_attrs = info->attrs;
struct vdpa_mgmt_dev *mdev;
unsigned int ncls = 0;
const u8 *macaddr;
const char *name;
u64 classes;
int err = 0;
if (!info->attrs[VDPA_ATTR_DEV_NAME])
@ -601,8 +629,26 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *i
config.mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
}
if (nl_attrs[VDPA_ATTR_DEV_FEATURES]) {
u64 missing = 0x0ULL;
config.device_features =
nla_get_u64(nl_attrs[VDPA_ATTR_DEV_FEATURES]);
if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MACADDR] &&
!(config.device_features & BIT_ULL(VIRTIO_NET_F_MAC)))
missing |= BIT_ULL(VIRTIO_NET_F_MAC);
if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MTU] &&
!(config.device_features & BIT_ULL(VIRTIO_NET_F_MTU)))
missing |= BIT_ULL(VIRTIO_NET_F_MTU);
if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP] &&
config.net.max_vq_pairs > 1 &&
!(config.device_features & BIT_ULL(VIRTIO_NET_F_MQ)))
missing |= BIT_ULL(VIRTIO_NET_F_MQ);
if (missing) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"Missing features 0x%llx for provided attributes",
missing);
return -EINVAL;
}
config.mask |= BIT_ULL(VDPA_ATTR_DEV_FEATURES);
}
@ -622,13 +668,33 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *i
err = PTR_ERR(mdev);
goto err;
}
if ((config.mask & mdev->config_attr_mask) != config.mask) {
NL_SET_ERR_MSG_MOD(info->extack,
"All provided attributes are not supported");
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"Some provided attributes are not supported: 0x%llx",
config.mask & ~mdev->config_attr_mask);
err = -EOPNOTSUPP;
goto err;
}
classes = vdpa_mgmtdev_get_classes(mdev, &ncls);
if (config.mask & VDPA_DEV_NET_ATTRS_MASK &&
!(classes & BIT_ULL(VIRTIO_ID_NET))) {
NL_SET_ERR_MSG_MOD(info->extack,
"Network class attributes provided on unsupported management device");
err = -EINVAL;
goto err;
}
if (!(config.mask & VDPA_DEV_NET_ATTRS_MASK) &&
config.mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES) &&
classes & BIT_ULL(VIRTIO_ID_NET) && ncls > 1 &&
config.device_features & VIRTIO_DEVICE_F_MASK) {
NL_SET_ERR_MSG_MOD(info->extack,
"Management device supports multi-class while device features specified are ambiguous");
err = -EINVAL;
goto err;
}
err = mdev->ops->dev_add(mdev, name, &config);
err:
up_write(&vdpa_dev_lock);
@ -841,18 +907,25 @@ static int vdpa_dev_net_mac_config_fill(struct sk_buff *msg, u64 features,
sizeof(config->mac), config->mac);
}
static int vdpa_dev_net_status_config_fill(struct sk_buff *msg, u64 features,
const struct virtio_net_config *config)
{
u16 val_u16;
if ((features & BIT_ULL(VIRTIO_NET_F_STATUS)) == 0)
return 0;
val_u16 = __virtio16_to_cpu(true, config->status);
return nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16);
}
static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *msg)
{
struct virtio_net_config config = {};
u64 features_device;
u16 val_u16;
vdev->config->get_config(vdev, 0, &config, sizeof(config));
val_u16 = __virtio16_to_cpu(true, config.status);
if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16))
return -EMSGSIZE;
features_device = vdev->config->get_device_features(vdev);
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES, features_device,
@ -865,6 +938,9 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms
if (vdpa_dev_net_mac_config_fill(msg, features_device, &config))
return -EMSGSIZE;
if (vdpa_dev_net_status_config_fill(msg, features_device, &config))
return -EMSGSIZE;
return vdpa_dev_net_mq_config_fill(msg, features_device, &config);
}
@ -1011,7 +1087,7 @@ static int vdpa_dev_vendor_stats_fill(struct vdpa_device *vdev,
switch (device_id) {
case VIRTIO_ID_NET:
if (index > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX) {
NL_SET_ERR_MSG_MOD(info->extack, "queue index excceeds max value");
NL_SET_ERR_MSG_MOD(info->extack, "queue index exceeds max value");
err = -ERANGE;
break;
}


@ -17,7 +17,6 @@
#include <linux/vringh.h>
#include <linux/vdpa.h>
#include <linux/vhost_iotlb.h>
#include <linux/iova.h>
#include <uapi/linux/vdpa.h>
#include "vdpa_sim.h"
@ -45,13 +44,6 @@ static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
return container_of(vdpa, struct vdpasim, vdpa);
}
static struct vdpasim *dev_to_sim(struct device *dev)
{
struct vdpa_device *vdpa = dev_to_vdpa(dev);
return vdpa_to_sim(vdpa);
}
static void vdpasim_vq_notify(struct vringh *vring)
{
struct vdpasim_virtqueue *vq =
@ -66,14 +58,16 @@ static void vdpasim_vq_notify(struct vringh *vring)
static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
{
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
uint16_t last_avail_idx = vq->vring.last_avail_idx;
vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, false,
vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
(struct vring_desc *)(uintptr_t)vq->desc_addr,
(struct vring_avail *)
(uintptr_t)vq->driver_addr,
(struct vring_used *)
(uintptr_t)vq->device_addr);
vq->vring.last_avail_idx = last_avail_idx;
vq->vring.notify = vdpasim_vq_notify;
}
@ -104,8 +98,12 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
&vdpasim->iommu_lock);
}
for (i = 0; i < vdpasim->dev_attr.nas; i++)
for (i = 0; i < vdpasim->dev_attr.nas; i++) {
vhost_iotlb_reset(&vdpasim->iommu[i]);
vhost_iotlb_add_range(&vdpasim->iommu[i], 0, ULONG_MAX,
0, VHOST_MAP_RW);
vdpasim->iommu_pt[i] = true;
}
vdpasim->running = true;
spin_unlock(&vdpasim->iommu_lock);
@ -115,133 +113,6 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
++vdpasim->generation;
}
static int dir_to_perm(enum dma_data_direction dir)
{
int perm = -EFAULT;
switch (dir) {
case DMA_FROM_DEVICE:
perm = VHOST_MAP_WO;
break;
case DMA_TO_DEVICE:
perm = VHOST_MAP_RO;
break;
case DMA_BIDIRECTIONAL:
perm = VHOST_MAP_RW;
break;
default:
break;
}
return perm;
}
static dma_addr_t vdpasim_map_range(struct vdpasim *vdpasim, phys_addr_t paddr,
size_t size, unsigned int perm)
{
struct iova *iova;
dma_addr_t dma_addr;
int ret;
/* We set the limit_pfn to the maximum (ULONG_MAX - 1) */
iova = alloc_iova(&vdpasim->iova, size >> iova_shift(&vdpasim->iova),
ULONG_MAX - 1, true);
if (!iova)
return DMA_MAPPING_ERROR;
dma_addr = iova_dma_addr(&vdpasim->iova, iova);
spin_lock(&vdpasim->iommu_lock);
ret = vhost_iotlb_add_range(&vdpasim->iommu[0], (u64)dma_addr,
(u64)dma_addr + size - 1, (u64)paddr, perm);
spin_unlock(&vdpasim->iommu_lock);
if (ret) {
__free_iova(&vdpasim->iova, iova);
return DMA_MAPPING_ERROR;
}
return dma_addr;
}
static void vdpasim_unmap_range(struct vdpasim *vdpasim, dma_addr_t dma_addr,
size_t size)
{
spin_lock(&vdpasim->iommu_lock);
vhost_iotlb_del_range(&vdpasim->iommu[0], (u64)dma_addr,
(u64)dma_addr + size - 1);
spin_unlock(&vdpasim->iommu_lock);
free_iova(&vdpasim->iova, iova_pfn(&vdpasim->iova, dma_addr));
}
static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size,
enum dma_data_direction dir,
unsigned long attrs)
{
struct vdpasim *vdpasim = dev_to_sim(dev);
phys_addr_t paddr = page_to_phys(page) + offset;
int perm = dir_to_perm(dir);
if (perm < 0)
return DMA_MAPPING_ERROR;
return vdpasim_map_range(vdpasim, paddr, size, perm);
}
static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
size_t size, enum dma_data_direction dir,
unsigned long attrs)
{
struct vdpasim *vdpasim = dev_to_sim(dev);
vdpasim_unmap_range(vdpasim, dma_addr, size);
}
static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_addr, gfp_t flag,
unsigned long attrs)
{
struct vdpasim *vdpasim = dev_to_sim(dev);
phys_addr_t paddr;
void *addr;
addr = kmalloc(size, flag);
if (!addr) {
*dma_addr = DMA_MAPPING_ERROR;
return NULL;
}
paddr = virt_to_phys(addr);
*dma_addr = vdpasim_map_range(vdpasim, paddr, size, VHOST_MAP_RW);
if (*dma_addr == DMA_MAPPING_ERROR) {
kfree(addr);
return NULL;
}
return addr;
}
static void vdpasim_free_coherent(struct device *dev, size_t size,
void *vaddr, dma_addr_t dma_addr,
unsigned long attrs)
{
struct vdpasim *vdpasim = dev_to_sim(dev);
vdpasim_unmap_range(vdpasim, dma_addr, size);
kfree(vaddr);
}
static const struct dma_map_ops vdpasim_dma_ops = {
.map_page = vdpasim_map_page,
.unmap_page = vdpasim_unmap_page,
.alloc = vdpasim_alloc_coherent,
.free = vdpasim_free_coherent,
};
static const struct vdpa_config_ops vdpasim_config_ops;
static const struct vdpa_config_ops vdpasim_batch_config_ops;
@ -249,10 +120,14 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
const struct vdpa_dev_set_config *config)
{
const struct vdpa_config_ops *ops;
struct vdpa_device *vdpa;
struct vdpasim *vdpasim;
struct device *dev;
int i, ret = -ENOMEM;
if (!dev_attr->alloc_size)
return ERR_PTR(-EINVAL);
if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) {
if (config->device_features &
~dev_attr->supported_features)
@ -266,14 +141,16 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
else
ops = &vdpasim_config_ops;
vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops,
dev_attr->ngroups, dev_attr->nas,
dev_attr->name, false);
if (IS_ERR(vdpasim)) {
ret = PTR_ERR(vdpasim);
vdpa = __vdpa_alloc_device(NULL, ops,
dev_attr->ngroups, dev_attr->nas,
dev_attr->alloc_size,
dev_attr->name, false);
if (IS_ERR(vdpa)) {
ret = PTR_ERR(vdpa);
goto err_alloc;
}
vdpasim = vdpa_to_sim(vdpa);
vdpasim->dev_attr = *dev_attr;
INIT_WORK(&vdpasim->work, dev_attr->work_fn);
spin_lock_init(&vdpasim->lock);
@ -283,7 +160,6 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
dev->dma_mask = &dev->coherent_dma_mask;
if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
goto err_iommu;
set_dma_ops(dev, &vdpasim_dma_ops);
vdpasim->vdpa.mdev = dev_attr->mgmt_dev;
vdpasim->config = kzalloc(dev_attr->config_size, GFP_KERNEL);
@ -300,6 +176,11 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
if (!vdpasim->iommu)
goto err_iommu;
vdpasim->iommu_pt = kmalloc_array(vdpasim->dev_attr.nas,
sizeof(*vdpasim->iommu_pt), GFP_KERNEL);
if (!vdpasim->iommu_pt)
goto err_iommu;
for (i = 0; i < vdpasim->dev_attr.nas; i++)
vhost_iotlb_init(&vdpasim->iommu[i], max_iotlb_entries, 0);
@ -311,13 +192,6 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
vringh_set_iotlb(&vdpasim->vqs[i].vring, &vdpasim->iommu[0],
&vdpasim->iommu_lock);
ret = iova_cache_get();
if (ret)
goto err_iommu;
/* For simplicity we use an IOVA allocator with byte granularity */
init_iova_domain(&vdpasim->iova, 1, 0);
vdpasim->vdpa.dma_dev = dev;
return vdpasim;
@ -356,6 +230,12 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
if (!vdpasim->running &&
(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
vdpasim->pending_kick = true;
return;
}
if (vq->ready)
schedule_work(&vdpasim->work);
}
@ -418,6 +298,18 @@ static int vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
return 0;
}
static int vdpasim_get_vq_stats(struct vdpa_device *vdpa, u16 idx,
struct sk_buff *msg,
struct netlink_ext_ack *extack)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
if (vdpasim->dev_attr.get_stats)
return vdpasim->dev_attr.get_stats(vdpasim, idx,
msg, extack);
return -EOPNOTSUPP;
}
static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa)
{
return VDPASIM_QUEUE_ALIGN;
@ -526,6 +418,27 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
return 0;
}
static int vdpasim_resume(struct vdpa_device *vdpa)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int i;
spin_lock(&vdpasim->lock);
vdpasim->running = true;
if (vdpasim->pending_kick) {
/* Process pending descriptors */
for (i = 0; i < vdpasim->dev_attr.nvqs; ++i)
vdpasim_kick_vq(vdpa, i);
vdpasim->pending_kick = false;
}
spin_unlock(&vdpasim->lock);
return 0;
}
static size_t vdpasim_get_config_size(struct vdpa_device *vdpa)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
@ -621,6 +534,7 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
iommu = &vdpasim->iommu[asid];
vhost_iotlb_reset(iommu);
vdpasim->iommu_pt[asid] = false;
for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
map = vhost_iotlb_itree_next(map, start, last)) {
@ -649,6 +563,10 @@ static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
return -EINVAL;
spin_lock(&vdpasim->iommu_lock);
if (vdpasim->iommu_pt[asid]) {
vhost_iotlb_reset(&vdpasim->iommu[asid]);
vdpasim->iommu_pt[asid] = false;
}
ret = vhost_iotlb_add_range_ctx(&vdpasim->iommu[asid], iova,
iova + size - 1, pa, perm, opaque);
spin_unlock(&vdpasim->iommu_lock);
@ -664,6 +582,11 @@ static int vdpasim_dma_unmap(struct vdpa_device *vdpa, unsigned int asid,
if (asid >= vdpasim->dev_attr.nas)
return -EINVAL;
if (vdpasim->iommu_pt[asid]) {
vhost_iotlb_reset(&vdpasim->iommu[asid]);
vdpasim->iommu_pt[asid] = false;
}
spin_lock(&vdpasim->iommu_lock);
vhost_iotlb_del_range(&vdpasim->iommu[asid], iova, iova + size - 1);
spin_unlock(&vdpasim->iommu_lock);
@ -683,15 +606,11 @@ static void vdpasim_free(struct vdpa_device *vdpa)
vringh_kiov_cleanup(&vdpasim->vqs[i].in_iov);
}
if (vdpa_get_dma_dev(vdpa)) {
put_iova_domain(&vdpasim->iova);
iova_cache_put();
}
kvfree(vdpasim->buffer);
for (i = 0; i < vdpasim->dev_attr.nas; i++)
vhost_iotlb_reset(&vdpasim->iommu[i]);
kfree(vdpasim->iommu);
kfree(vdpasim->iommu_pt);
kfree(vdpasim->vqs);
kfree(vdpasim->config);
}
@ -704,6 +623,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
.set_vq_ready = vdpasim_set_vq_ready,
.get_vq_ready = vdpasim_get_vq_ready,
.set_vq_state = vdpasim_set_vq_state,
.get_vendor_vq_stats = vdpasim_get_vq_stats,
.get_vq_state = vdpasim_get_vq_state,
.get_vq_align = vdpasim_get_vq_align,
.get_vq_group = vdpasim_get_vq_group,
@ -718,6 +638,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
.set_status = vdpasim_set_status,
.reset = vdpasim_reset,
.suspend = vdpasim_suspend,
.resume = vdpasim_resume,
.get_config_size = vdpasim_get_config_size,
.get_config = vdpasim_get_config,
.set_config = vdpasim_set_config,
@ -737,6 +658,7 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
.set_vq_ready = vdpasim_set_vq_ready,
.get_vq_ready = vdpasim_get_vq_ready,
.set_vq_state = vdpasim_set_vq_state,
.get_vendor_vq_stats = vdpasim_get_vq_stats,
.get_vq_state = vdpasim_get_vq_state,
.get_vq_align = vdpasim_get_vq_align,
.get_vq_group = vdpasim_get_vq_group,
@ -751,6 +673,7 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
.set_status = vdpasim_set_status,
.reset = vdpasim_reset,
.suspend = vdpasim_suspend,
.resume = vdpasim_resume,
.get_config_size = vdpasim_get_config_size,
.get_config = vdpasim_get_config,
.set_config = vdpasim_set_config,


@ -37,6 +37,7 @@ struct vdpasim_dev_attr {
struct vdpa_mgmt_dev *mgmt_dev;
const char *name;
u64 supported_features;
size_t alloc_size;
size_t config_size;
size_t buffer_size;
int nvqs;
@ -47,6 +48,9 @@ struct vdpasim_dev_attr {
work_func_t work_fn;
void (*get_config)(struct vdpasim *vdpasim, void *config);
void (*set_config)(struct vdpasim *vdpasim, const void *config);
int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
struct sk_buff *msg,
struct netlink_ext_ack *extack);
};
/* State of each vdpasim device */
@ -60,13 +64,14 @@ struct vdpasim {
/* virtio config according to device type */
void *config;
struct vhost_iotlb *iommu;
struct iova_domain iova;
bool *iommu_pt;
void *buffer;
u32 status;
u32 generation;
u64 features;
u32 groups;
bool running;
bool pending_kick;
/* spinlock to synchronize iommu table */
spinlock_t iommu_lock;
};


@ -378,6 +378,7 @@ static int vdpasim_blk_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
dev_attr.nvqs = VDPASIM_BLK_VQ_NUM;
dev_attr.ngroups = VDPASIM_BLK_GROUP_NUM;
dev_attr.nas = VDPASIM_BLK_AS_NUM;
dev_attr.alloc_size = sizeof(struct vdpasim);
dev_attr.config_size = sizeof(struct virtio_blk_config);
dev_attr.get_config = vdpasim_blk_get_config;
dev_attr.work_fn = vdpasim_blk_work;


@ -15,6 +15,7 @@
#include <linux/etherdevice.h>
#include <linux/vringh.h>
#include <linux/vdpa.h>
#include <net/netlink.h>
#include <uapi/linux/virtio_net.h>
#include <uapi/linux/vdpa.h>
@ -27,6 +28,7 @@
#define VDPASIM_NET_FEATURES (VDPASIM_FEATURES | \
(1ULL << VIRTIO_NET_F_MAC) | \
(1ULL << VIRTIO_NET_F_STATUS) | \
(1ULL << VIRTIO_NET_F_MTU) | \
(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
(1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR))
@ -36,6 +38,34 @@
#define VDPASIM_NET_AS_NUM 2
#define VDPASIM_NET_GROUP_NUM 2
struct vdpasim_dataq_stats {
struct u64_stats_sync syncp;
u64 pkts;
u64 bytes;
u64 drops;
u64 errors;
u64 overruns;
};
struct vdpasim_cq_stats {
struct u64_stats_sync syncp;
u64 requests;
u64 successes;
u64 errors;
};
struct vdpasim_net {
struct vdpasim vdpasim;
struct vdpasim_dataq_stats tx_stats;
struct vdpasim_dataq_stats rx_stats;
struct vdpasim_cq_stats cq_stats;
};
static struct vdpasim_net *sim_to_net(struct vdpasim *vdpasim)
{
return container_of(vdpasim, struct vdpasim_net, vdpasim);
}
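sim_to_net() is safe only because the whole object comes from the single dev_attr.alloc_size allocation made in vdpasim_create(), with the generic part first. A sketch of the invariant it relies on; the static_assert is illustrative, not part of the original:
/* struct vdpasim places struct vdpa_device at offset 0 (the
 * vdpa_alloc_device() wrapper enforces this), and struct vdpasim_net
 * must in turn place the vdpasim member first so that the
 * alloc_size-sized buffer returned by __vdpa_alloc_device() lines up
 * with both views of the object.
 */
static_assert(offsetof(struct vdpasim_net, vdpasim) == 0);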
static void vdpasim_net_complete(struct vdpasim_virtqueue *vq, size_t len)
{
/* Make sure data is written before advancing index */
@ -96,9 +126,11 @@ static virtio_net_ctrl_ack vdpasim_handle_ctrl_mac(struct vdpasim *vdpasim,
static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
{
struct vdpasim_virtqueue *cvq = &vdpasim->vqs[2];
struct vdpasim_net *net = sim_to_net(vdpasim);
virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
struct virtio_net_ctrl_hdr ctrl;
size_t read, write;
u64 requests = 0, errors = 0, successes = 0;
int err;
if (!(vdpasim->features & (1ULL << VIRTIO_NET_F_CTRL_VQ)))
@ -114,10 +146,13 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
if (err <= 0)
break;
++requests;
read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->in_iov, &ctrl,
sizeof(ctrl));
if (read != sizeof(ctrl))
if (read != sizeof(ctrl)) {
++errors;
break;
}
switch (ctrl.class) {
case VIRTIO_NET_CTRL_MAC:
@ -127,6 +162,11 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
break;
}
if (status == VIRTIO_NET_OK)
++successes;
else
++errors;
/* Make sure data is written before advancing index */
smp_wmb();
@ -144,6 +184,12 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
cvq->cb(cvq->private);
local_bh_enable();
}
u64_stats_update_begin(&net->cq_stats.syncp);
net->cq_stats.requests += requests;
net->cq_stats.errors += errors;
net->cq_stats.successes += successes;
u64_stats_update_end(&net->cq_stats.syncp);
}
static void vdpasim_net_work(struct work_struct *work)
@ -151,8 +197,10 @@ static void vdpasim_net_work(struct work_struct *work)
struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
struct vdpasim_net *net = sim_to_net(vdpasim);
ssize_t read, write;
int pkts = 0;
u64 tx_pkts = 0, rx_pkts = 0, tx_bytes = 0, rx_bytes = 0;
u64 rx_drops = 0, rx_overruns = 0, rx_errors = 0, tx_errors = 0;
int err;
spin_lock(&vdpasim->lock);
@ -171,14 +219,21 @@ static void vdpasim_net_work(struct work_struct *work)
while (true) {
err = vringh_getdesc_iotlb(&txq->vring, &txq->out_iov, NULL,
&txq->head, GFP_ATOMIC);
if (err <= 0)
if (err <= 0) {
if (err)
++tx_errors;
break;
}
++tx_pkts;
read = vringh_iov_pull_iotlb(&txq->vring, &txq->out_iov,
vdpasim->buffer,
PAGE_SIZE);
tx_bytes += read;
if (!receive_filter(vdpasim, read)) {
++rx_drops;
vdpasim_net_complete(txq, 0);
continue;
}
@ -186,19 +241,25 @@ static void vdpasim_net_work(struct work_struct *work)
err = vringh_getdesc_iotlb(&rxq->vring, NULL, &rxq->in_iov,
&rxq->head, GFP_ATOMIC);
if (err <= 0) {
++rx_overruns;
vdpasim_net_complete(txq, 0);
break;
}
write = vringh_iov_push_iotlb(&rxq->vring, &rxq->in_iov,
vdpasim->buffer, read);
if (write <= 0)
if (write <= 0) {
++rx_errors;
break;
}
++rx_pkts;
rx_bytes += write;
vdpasim_net_complete(txq, 0);
vdpasim_net_complete(rxq, write);
if (++pkts > 4) {
if (tx_pkts > 4) {
schedule_work(&vdpasim->work);
goto out;
}
@ -206,6 +267,145 @@ static void vdpasim_net_work(struct work_struct *work)
out:
spin_unlock(&vdpasim->lock);
u64_stats_update_begin(&net->tx_stats.syncp);
net->tx_stats.pkts += tx_pkts;
net->tx_stats.bytes += tx_bytes;
net->tx_stats.errors += tx_errors;
u64_stats_update_end(&net->tx_stats.syncp);
u64_stats_update_begin(&net->rx_stats.syncp);
net->rx_stats.pkts += rx_pkts;
net->rx_stats.bytes += rx_bytes;
net->rx_stats.drops += rx_drops;
net->rx_stats.errors += rx_errors;
net->rx_stats.overruns += rx_overruns;
u64_stats_update_end(&net->rx_stats.syncp);
}
static int vdpasim_net_get_stats(struct vdpasim *vdpasim, u16 idx,
struct sk_buff *msg,
struct netlink_ext_ack *extack)
{
struct vdpasim_net *net = sim_to_net(vdpasim);
u64 rx_pkts, rx_bytes, rx_errors, rx_overruns, rx_drops;
u64 tx_pkts, tx_bytes, tx_errors, tx_drops;
u64 cq_requests, cq_successes, cq_errors;
unsigned int start;
int err = -EMSGSIZE;
switch (idx) {
case 0:
do {
start = u64_stats_fetch_begin(&net->rx_stats.syncp);
rx_pkts = net->rx_stats.pkts;
rx_bytes = net->rx_stats.bytes;
rx_errors = net->rx_stats.errors;
rx_overruns = net->rx_stats.overruns;
rx_drops = net->rx_stats.drops;
} while (u64_stats_fetch_retry(&net->rx_stats.syncp, start));
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"rx packets"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
rx_pkts, VDPA_ATTR_PAD))
break;
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"rx bytes"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
rx_bytes, VDPA_ATTR_PAD))
break;
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"rx errors"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
rx_errors, VDPA_ATTR_PAD))
break;
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"rx overruns"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
rx_overruns, VDPA_ATTR_PAD))
break;
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"rx drops"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
rx_drops, VDPA_ATTR_PAD))
break;
err = 0;
break;
case 1:
do {
start = u64_stats_fetch_begin(&net->tx_stats.syncp);
tx_pkts = net->tx_stats.pkts;
tx_bytes = net->tx_stats.bytes;
tx_errors = net->tx_stats.errors;
tx_drops = net->tx_stats.drops;
} while (u64_stats_fetch_retry(&net->tx_stats.syncp, start));
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"tx packets"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
tx_pkts, VDPA_ATTR_PAD))
break;
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"tx bytes"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
tx_bytes, VDPA_ATTR_PAD))
break;
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"tx errors"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
tx_errors, VDPA_ATTR_PAD))
break;
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"tx drops"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
tx_drops, VDPA_ATTR_PAD))
break;
err = 0;
break;
case 2:
do {
start = u64_stats_fetch_begin(&net->cq_stats.syncp);
cq_requests = net->cq_stats.requests;
cq_successes = net->cq_stats.successes;
cq_errors = net->cq_stats.errors;
} while (u64_stats_fetch_retry(&net->cq_stats.syncp, start));
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"cvq requests"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
cq_requests, VDPA_ATTR_PAD))
break;
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"cvq successes"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
cq_successes, VDPA_ATTR_PAD))
break;
if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME,
"cvq errors"))
break;
if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,
cq_errors, VDPA_ATTR_PAD))
break;
err = 0;
break;
default:
err = -EINVAL;
break;
}
return err;
}
static void vdpasim_net_get_config(struct vdpasim *vdpasim, void *config)
@ -242,6 +442,7 @@ static int vdpasim_net_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
const struct vdpa_dev_set_config *config)
{
struct vdpasim_dev_attr dev_attr = {};
struct vdpasim_net *net;
struct vdpasim *simdev;
int ret;
@ -252,9 +453,11 @@ static int vdpasim_net_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
dev_attr.nvqs = VDPASIM_NET_VQ_NUM;
dev_attr.ngroups = VDPASIM_NET_GROUP_NUM;
dev_attr.nas = VDPASIM_NET_AS_NUM;
dev_attr.alloc_size = sizeof(struct vdpasim_net);
dev_attr.config_size = sizeof(struct virtio_net_config);
dev_attr.get_config = vdpasim_net_get_config;
dev_attr.work_fn = vdpasim_net_work;
dev_attr.get_stats = vdpasim_net_get_stats;
dev_attr.buffer_size = PAGE_SIZE;
simdev = vdpasim_create(&dev_attr, config);
@ -267,6 +470,12 @@ static int vdpasim_net_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
if (ret)
goto reg_err;
net = sim_to_net(simdev);
u64_stats_init(&net->tx_stats.syncp);
u64_stats_init(&net->rx_stats.syncp);
u64_stats_init(&net->cq_stats.syncp);
return 0;
reg_err:


@ -73,7 +73,8 @@ enum {
VHOST_NET_FEATURES = VHOST_FEATURES |
(1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
(1ULL << VIRTIO_NET_F_MRG_RXBUF) |
(1ULL << VIRTIO_F_ACCESS_PLATFORM)
(1ULL << VIRTIO_F_ACCESS_PLATFORM) |
(1ULL << VIRTIO_F_RING_RESET)
};
enum {
@ -1645,7 +1646,7 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features)
goto out_unlock;
if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) {
if (vhost_init_device_iotlb(&n->dev, true))
if (vhost_init_device_iotlb(&n->dev))
goto out_unlock;
}


@ -2105,7 +2105,7 @@ static ssize_t vhost_scsi_tpg_attrib_fabric_prot_type_show(
struct vhost_scsi_tpg *tpg = container_of(se_tpg,
struct vhost_scsi_tpg, se_tpg);
return sprintf(page, "%d\n", tpg->tv_fabric_prot_type);
return sysfs_emit(page, "%d\n", tpg->tv_fabric_prot_type);
}
CONFIGFS_ATTR(vhost_scsi_tpg_attrib_, fabric_prot_type);
@ -2215,7 +2215,7 @@ static ssize_t vhost_scsi_tpg_nexus_show(struct config_item *item, char *page)
mutex_unlock(&tpg->tv_tpg_mutex);
return -ENODEV;
}
ret = snprintf(page, PAGE_SIZE, "%s\n",
ret = sysfs_emit(page, "%s\n",
tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
mutex_unlock(&tpg->tv_tpg_mutex);
@ -2440,7 +2440,7 @@ static void vhost_scsi_drop_tport(struct se_wwn *wwn)
static ssize_t
vhost_scsi_wwn_version_show(struct config_item *item, char *page)
{
return sprintf(page, "TCM_VHOST fabric module %s on %s/%s"
return sysfs_emit(page, "TCM_VHOST fabric module %s on %s/%s"
"on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname,
utsname()->machine);
}


@ -333,13 +333,10 @@ static long vhost_test_ioctl(struct file *f, unsigned int ioctl,
return -EFAULT;
return 0;
case VHOST_SET_FEATURES:
printk(KERN_ERR "1\n");
if (copy_from_user(&features, featurep, sizeof features))
return -EFAULT;
printk(KERN_ERR "2\n");
if (features & ~VHOST_FEATURES)
return -EOPNOTSUPP;
printk(KERN_ERR "3\n");
return vhost_test_set_features(n, features);
case VHOST_RESET_OWNER:
return vhost_test_reset_owner(n);


@ -359,6 +359,14 @@ static bool vhost_vdpa_can_suspend(const struct vhost_vdpa *v)
return ops->suspend;
}
static bool vhost_vdpa_can_resume(const struct vhost_vdpa *v)
{
struct vdpa_device *vdpa = v->vdpa;
const struct vdpa_config_ops *ops = vdpa->config;
return ops->resume;
}
static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
{
struct vdpa_device *vdpa = v->vdpa;
@ -498,6 +506,21 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
return ops->suspend(vdpa);
}
/* After a successful return of this ioctl the device resumes processing
* virtqueue descriptors. The device becomes fully operational the same way it
* was before it was suspended.
*/
static long vhost_vdpa_resume(struct vhost_vdpa *v)
{
struct vdpa_device *vdpa = v->vdpa;
const struct vdpa_config_ops *ops = vdpa->config;
if (!ops->resume)
return -EOPNOTSUPP;
return ops->resume(vdpa);
}
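A minimal userspace sketch of exercising the new ioctl pair; this is an assumed flow, with fd an open /dev/vhost-vdpa-N and error handling reduced to early returns:
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Probe for resume support through the backend-features handshake, then
 * quiesce and resume the device. VHOST_BACKEND_F_RESUME is reported only
 * when the parent vdpa driver implements ->resume(), as probed above.
 */
static int suspend_then_resume(int fd)
{
	uint64_t features;

	if (ioctl(fd, VHOST_GET_BACKEND_FEATURES, &features))
		return -1;
	if (!(features & (1ULL << VHOST_BACKEND_F_RESUME)))
		return -1;	/* not supported by this device */
	if (ioctl(fd, VHOST_SET_BACKEND_FEATURES, &features))
		return -1;
	if (ioctl(fd, VHOST_VDPA_SUSPEND))
		return -1;
	/* ... save or migrate state while the device is quiesced ... */
	return ioctl(fd, VHOST_VDPA_RESUME);
}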
static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
void __user *argp)
{
@ -606,11 +629,15 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
if (copy_from_user(&features, featurep, sizeof(features)))
return -EFAULT;
if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
BIT_ULL(VHOST_BACKEND_F_SUSPEND)))
BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
BIT_ULL(VHOST_BACKEND_F_RESUME)))
return -EOPNOTSUPP;
if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
!vhost_vdpa_can_suspend(v))
return -EOPNOTSUPP;
if ((features & BIT_ULL(VHOST_BACKEND_F_RESUME)) &&
!vhost_vdpa_can_resume(v))
return -EOPNOTSUPP;
vhost_set_backend_features(&v->vdev, features);
return 0;
}
@ -662,6 +689,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
features = VHOST_VDPA_BACKEND_FEATURES;
if (vhost_vdpa_can_suspend(v))
features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND);
if (vhost_vdpa_can_resume(v))
features |= BIT_ULL(VHOST_BACKEND_F_RESUME);
if (copy_to_user(featurep, &features, sizeof(features)))
r = -EFAULT;
break;
@ -677,6 +706,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_SUSPEND:
r = vhost_vdpa_suspend(v);
break;
case VHOST_VDPA_RESUME:
r = vhost_vdpa_resume(v);
break;
default:
r = vhost_dev_ioctl(&v->vdev, cmd, argp);
if (r == -ENOIOCTLCMD)
@ -1119,8 +1151,11 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
if (!bus)
return -EFAULT;
if (!device_iommu_capable(dma_dev, IOMMU_CAP_CACHE_COHERENCY))
if (!device_iommu_capable(dma_dev, IOMMU_CAP_CACHE_COHERENCY)) {
dev_warn_once(&v->dev,
"Failed to allocate domain, device is not IOMMU cache coherent capable\n");
return -ENOTSUPP;
}
v->domain = iommu_domain_alloc(bus);
if (!v->domain)


@@ -1730,7 +1730,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
}
EXPORT_SYMBOL_GPL(vhost_vring_ioctl);
int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled)
int vhost_init_device_iotlb(struct vhost_dev *d)
{
struct vhost_iotlb *niotlb, *oiotlb;
int i;


@@ -222,7 +222,7 @@ ssize_t vhost_chr_read_iter(struct vhost_dev *dev, struct iov_iter *to,
int noblock);
ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
struct iov_iter *from);
int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled);
int vhost_init_device_iotlb(struct vhost_dev *d);
void vhost_iotlb_map_free(struct vhost_iotlb *iotlb,
struct vhost_iotlb_map *map);


@@ -793,7 +793,7 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
}
if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) {
if (vhost_init_device_iotlb(&vsock->dev, true))
if (vhost_init_device_iotlb(&vsock->dev))
goto err;
}


@@ -202,6 +202,9 @@ struct vring_virtqueue {
/* DMA, allocation, and size information */
bool we_own_ring;
/* Device used for doing DMA */
struct device *dma_dev;
#ifdef DEBUG
/* They're supposed to lock for us. */
unsigned int in_use;
@@ -219,7 +222,8 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
bool context,
bool (*notify)(struct virtqueue *),
void (*callback)(struct virtqueue *),
const char *name);
const char *name,
struct device *dma_dev);
static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num);
static void vring_free(struct virtqueue *_vq);
@@ -297,10 +301,11 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
EXPORT_SYMBOL_GPL(virtio_max_dma_size);
static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
dma_addr_t *dma_handle, gfp_t flag)
dma_addr_t *dma_handle, gfp_t flag,
struct device *dma_dev)
{
if (vring_use_dma_api(vdev)) {
return dma_alloc_coherent(vdev->dev.parent, size,
return dma_alloc_coherent(dma_dev, size,
dma_handle, flag);
} else {
void *queue = alloc_pages_exact(PAGE_ALIGN(size), flag);
@@ -330,10 +335,11 @@ static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
}
static void vring_free_queue(struct virtio_device *vdev, size_t size,
void *queue, dma_addr_t dma_handle)
void *queue, dma_addr_t dma_handle,
struct device *dma_dev)
{
if (vring_use_dma_api(vdev))
dma_free_coherent(vdev->dev.parent, size, queue, dma_handle);
dma_free_coherent(dma_dev, size, queue, dma_handle);
else
free_pages_exact(queue, PAGE_ALIGN(size));
}
@@ -341,11 +347,11 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
/*
* The DMA ops on various arches are rather gnarly right now, and
* making all of the arch DMA ops work on the vring device itself
* is a mess. For now, we use the parent device for DMA ops.
* is a mess.
*/
static inline struct device *vring_dma_dev(const struct vring_virtqueue *vq)
{
return vq->vq.vdev->dev.parent;
return vq->dma_dev;
}
/* Map one sg entry. */
@@ -1032,11 +1038,12 @@ err_state:
}
static void vring_free_split(struct vring_virtqueue_split *vring_split,
struct virtio_device *vdev)
struct virtio_device *vdev, struct device *dma_dev)
{
vring_free_queue(vdev, vring_split->queue_size_in_bytes,
vring_split->vring.desc,
vring_split->queue_dma_addr);
vring_split->queue_dma_addr,
dma_dev);
kfree(vring_split->desc_state);
kfree(vring_split->desc_extra);
@@ -1046,7 +1053,8 @@ static int vring_alloc_queue_split(struct vring_virtqueue_split *vring_split,
struct virtio_device *vdev,
u32 num,
unsigned int vring_align,
bool may_reduce_num)
bool may_reduce_num,
struct device *dma_dev)
{
void *queue = NULL;
dma_addr_t dma_addr;
@@ -1061,7 +1069,8 @@ static int vring_alloc_queue_split(struct vring_virtqueue_split *vring_split,
for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
&dma_addr,
GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO);
GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO,
dma_dev);
if (queue)
break;
if (!may_reduce_num)
@@ -1074,7 +1083,8 @@ static int vring_alloc_queue_split(struct vring_virtqueue_split *vring_split,
if (!queue) {
/* Try to get a single page. You are my only hope! */
queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
&dma_addr, GFP_KERNEL | __GFP_ZERO);
&dma_addr, GFP_KERNEL | __GFP_ZERO,
dma_dev);
}
if (!queue)
return -ENOMEM;
@@ -1100,21 +1110,22 @@ static struct virtqueue *vring_create_virtqueue_split(
bool context,
bool (*notify)(struct virtqueue *),
void (*callback)(struct virtqueue *),
const char *name)
const char *name,
struct device *dma_dev)
{
struct vring_virtqueue_split vring_split = {};
struct virtqueue *vq;
int err;
err = vring_alloc_queue_split(&vring_split, vdev, num, vring_align,
may_reduce_num);
may_reduce_num, dma_dev);
if (err)
return NULL;
vq = __vring_new_virtqueue(index, &vring_split, vdev, weak_barriers,
context, notify, callback, name);
context, notify, callback, name, dma_dev);
if (!vq) {
vring_free_split(&vring_split, vdev);
vring_free_split(&vring_split, vdev, dma_dev);
return NULL;
}
@@ -1132,7 +1143,8 @@ static int virtqueue_resize_split(struct virtqueue *_vq, u32 num)
err = vring_alloc_queue_split(&vring_split, vdev, num,
vq->split.vring_align,
vq->split.may_reduce_num);
vq->split.may_reduce_num,
vring_dma_dev(vq));
if (err)
goto err;
@@ -1150,7 +1162,7 @@ static int virtqueue_resize_split(struct virtqueue *_vq, u32 num)
return 0;
err_state_extra:
vring_free_split(&vring_split, vdev);
vring_free_split(&vring_split, vdev, vring_dma_dev(vq));
err:
virtqueue_reinit_split(vq);
return -ENOMEM;
@@ -1841,22 +1853,26 @@ static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num)
}
static void vring_free_packed(struct vring_virtqueue_packed *vring_packed,
struct virtio_device *vdev)
struct virtio_device *vdev,
struct device *dma_dev)
{
if (vring_packed->vring.desc)
vring_free_queue(vdev, vring_packed->ring_size_in_bytes,
vring_packed->vring.desc,
vring_packed->ring_dma_addr);
vring_packed->ring_dma_addr,
dma_dev);
if (vring_packed->vring.driver)
vring_free_queue(vdev, vring_packed->event_size_in_bytes,
vring_packed->vring.driver,
vring_packed->driver_event_dma_addr);
vring_packed->driver_event_dma_addr,
dma_dev);
if (vring_packed->vring.device)
vring_free_queue(vdev, vring_packed->event_size_in_bytes,
vring_packed->vring.device,
vring_packed->device_event_dma_addr);
vring_packed->device_event_dma_addr,
dma_dev);
kfree(vring_packed->desc_state);
kfree(vring_packed->desc_extra);
@@ -1864,7 +1880,7 @@ static void vring_free_packed(struct vring_virtqueue_packed *vring_packed,
static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed,
struct virtio_device *vdev,
u32 num)
u32 num, struct device *dma_dev)
{
struct vring_packed_desc *ring;
struct vring_packed_desc_event *driver, *device;
@@ -1875,7 +1891,8 @@ static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed,
ring = vring_alloc_queue(vdev, ring_size_in_bytes,
&ring_dma_addr,
GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO);
GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO,
dma_dev);
if (!ring)
goto err;
@@ -1887,7 +1904,8 @@ static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed,
driver = vring_alloc_queue(vdev, event_size_in_bytes,
&driver_event_dma_addr,
GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO);
GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO,
dma_dev);
if (!driver)
goto err;
@@ -1897,7 +1915,8 @@ static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed,
device = vring_alloc_queue(vdev, event_size_in_bytes,
&device_event_dma_addr,
GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO);
GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO,
dma_dev);
if (!device)
goto err;
@@ -1909,7 +1928,7 @@ static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed,
return 0;
err:
vring_free_packed(vring_packed, vdev);
vring_free_packed(vring_packed, vdev, dma_dev);
return -ENOMEM;
}
@@ -1987,13 +2006,14 @@ static struct virtqueue *vring_create_virtqueue_packed(
bool context,
bool (*notify)(struct virtqueue *),
void (*callback)(struct virtqueue *),
const char *name)
const char *name,
struct device *dma_dev)
{
struct vring_virtqueue_packed vring_packed = {};
struct vring_virtqueue *vq;
int err;
if (vring_alloc_queue_packed(&vring_packed, vdev, num))
if (vring_alloc_queue_packed(&vring_packed, vdev, num, dma_dev))
goto err_ring;
vq = kmalloc(sizeof(*vq), GFP_KERNEL);
@@ -2014,6 +2034,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->broken = false;
#endif
vq->packed_ring = true;
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
@@ -2040,7 +2061,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
err_state_extra:
kfree(vq);
err_vq:
vring_free_packed(&vring_packed, vdev);
vring_free_packed(&vring_packed, vdev, dma_dev);
err_ring:
return NULL;
}
@@ -2052,7 +2073,7 @@ static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
struct virtio_device *vdev = _vq->vdev;
int err;
if (vring_alloc_queue_packed(&vring_packed, vdev, num))
if (vring_alloc_queue_packed(&vring_packed, vdev, num, vring_dma_dev(vq)))
goto err_ring;
err = vring_alloc_state_extra_packed(&vring_packed);
@@ -2069,7 +2090,7 @@ static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
return 0;
err_state_extra:
vring_free_packed(&vring_packed, vdev);
vring_free_packed(&vring_packed, vdev, vring_dma_dev(vq));
err_ring:
virtqueue_reinit_packed(vq);
return -ENOMEM;
@@ -2481,7 +2502,8 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
bool context,
bool (*notify)(struct virtqueue *),
void (*callback)(struct virtqueue *),
const char *name)
const char *name,
struct device *dma_dev)
{
struct vring_virtqueue *vq;
int err;
@@ -2507,6 +2529,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
#else
vq->broken = false;
#endif
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
@@ -2549,14 +2572,39 @@ struct virtqueue *vring_create_virtqueue(
if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED))
return vring_create_virtqueue_packed(index, num, vring_align,
vdev, weak_barriers, may_reduce_num,
context, notify, callback, name);
context, notify, callback, name, vdev->dev.parent);
return vring_create_virtqueue_split(index, num, vring_align,
vdev, weak_barriers, may_reduce_num,
context, notify, callback, name);
context, notify, callback, name, vdev->dev.parent);
}
EXPORT_SYMBOL_GPL(vring_create_virtqueue);
struct virtqueue *vring_create_virtqueue_dma(
unsigned int index,
unsigned int num,
unsigned int vring_align,
struct virtio_device *vdev,
bool weak_barriers,
bool may_reduce_num,
bool context,
bool (*notify)(struct virtqueue *),
void (*callback)(struct virtqueue *),
const char *name,
struct device *dma_dev)
{
if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED))
return vring_create_virtqueue_packed(index, num, vring_align,
vdev, weak_barriers, may_reduce_num,
context, notify, callback, name, dma_dev);
return vring_create_virtqueue_split(index, num, vring_align,
vdev, weak_barriers, may_reduce_num,
context, notify, callback, name, dma_dev);
}
EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
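vring_create_virtqueue() keeps its old behavior by forwarding vdev->dev.parent; transports that map rings through another device (for example a vDPA parent with per-virtqueue DMA domains) call the _dma variant directly. A minimal sketch of such a caller, with hypothetical names and queue parameters:

/* Hypothetical transport hook: back the ring allocation and all DMA
 * mappings for this vq with a caller-chosen dma_dev. */
static struct virtqueue *my_setup_vq(struct virtio_device *vdev,
                                     unsigned int index,
                                     struct device *dma_dev,
                                     bool (*notify)(struct virtqueue *),
                                     void (*callback)(struct virtqueue *),
                                     const char *name)
{
        /* weak_barriers, may_reduce_num and ctx keep their usual
         * meanings; only the trailing dma_dev argument is new. */
        return vring_create_virtqueue_dma(index, 256, PAGE_SIZE, vdev,
                                          true, true, false,
                                          notify, callback, name, dma_dev);
}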
/**
* virtqueue_resize - resize the vring of vq
* @_vq: the struct virtqueue we're talking about.
@@ -2645,7 +2693,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
vring_init(&vring_split.vring, num, pages, vring_align);
return __vring_new_virtqueue(index, &vring_split, vdev, weak_barriers,
context, notify, callback, name);
context, notify, callback, name,
vdev->dev.parent);
}
EXPORT_SYMBOL_GPL(vring_new_virtqueue);
@@ -2658,17 +2707,20 @@ static void vring_free(struct virtqueue *_vq)
vring_free_queue(vq->vq.vdev,
vq->packed.ring_size_in_bytes,
vq->packed.vring.desc,
vq->packed.ring_dma_addr);
vq->packed.ring_dma_addr,
vring_dma_dev(vq));
vring_free_queue(vq->vq.vdev,
vq->packed.event_size_in_bytes,
vq->packed.vring.driver,
vq->packed.driver_event_dma_addr);
vq->packed.driver_event_dma_addr,
vring_dma_dev(vq));
vring_free_queue(vq->vq.vdev,
vq->packed.event_size_in_bytes,
vq->packed.vring.device,
vq->packed.device_event_dma_addr);
vq->packed.device_event_dma_addr,
vring_dma_dev(vq));
kfree(vq->packed.desc_state);
kfree(vq->packed.desc_extra);
@@ -2676,7 +2728,8 @@ static void vring_free(struct virtqueue *_vq)
vring_free_queue(vq->vq.vdev,
vq->split.queue_size_in_bytes,
vq->split.vring.desc,
vq->split.queue_dma_addr);
vq->split.queue_dma_addr,
vring_dma_dev(vq));
}
}
if (!vq->packed_ring) {


@@ -135,6 +135,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
{
struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
struct vdpa_device *vdpa = vd_get_vdpa(vdev);
struct device *dma_dev;
const struct vdpa_config_ops *ops = vdpa->config;
struct virtio_vdpa_vq_info *info;
struct vdpa_callback cb;
@@ -175,9 +176,15 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
/* Create the vring */
align = ops->get_vq_align(vdpa);
vq = vring_create_virtqueue(index, max_num, align, vdev,
true, may_reduce_num, ctx,
virtio_vdpa_notify, callback, name);
if (ops->get_vq_dma_dev)
dma_dev = ops->get_vq_dma_dev(vdpa, index);
else
dma_dev = vdpa_get_dma_dev(vdpa);
vq = vring_create_virtqueue_dma(index, max_num, align, vdev,
true, may_reduce_num, ctx,
virtio_vdpa_notify, callback,
name, dma_dev);
if (!vq) {
err = -ENOMEM;
goto error_new_virtqueue;


@@ -3094,6 +3094,8 @@
#define PCI_VENDOR_ID_3COM_2 0xa727
#define PCI_VENDOR_ID_SOLIDRUN 0xd063
#define PCI_VENDOR_ID_DIGIUM 0xd161
#define PCI_DEVICE_ID_DIGIUM_HFC4S 0xb410


@@ -219,7 +219,10 @@ struct vdpa_map_file {
* @reset: Reset device
* @vdev: vdpa device
* Returns integer: success (0) or error (< 0)
* @suspend: Suspend or resume the device (optional)
* @suspend: Suspend the device (optional)
* @vdev: vdpa device
* Returns integer: success (0) or error (< 0)
* @resume: Resume the device (optional)
* @vdev: vdpa device
* Returns integer: success (0) or error (< 0)
* @get_config_size: Get the size of the configuration space includes
@@ -282,6 +285,11 @@ struct vdpa_map_file {
* @iova: iova to be unmapped
* @size: size of the area
* Returns integer: success (0) or error (< 0)
* @get_vq_dma_dev: Get the dma device for a specific
* virtqueue (optional)
* @vdev: vdpa device
* @idx: virtqueue index
* Returns pointer to structure device or error (NULL)
* @free: Free resources that belongs to vDPA (optional)
* @vdev: vdpa device
*/
@@ -324,6 +332,7 @@ struct vdpa_config_ops {
void (*set_status)(struct vdpa_device *vdev, u8 status);
int (*reset)(struct vdpa_device *vdev);
int (*suspend)(struct vdpa_device *vdev);
int (*resume)(struct vdpa_device *vdev);
size_t (*get_config_size)(struct vdpa_device *vdev);
void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
void *buf, unsigned int len);
@@ -341,6 +350,7 @@ struct vdpa_config_ops {
u64 iova, u64 size);
int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
unsigned int asid);
struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
/* Free device resources */
void (*free)(struct vdpa_device *vdev);
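Both additions are optional: vhost-vdpa advertises VHOST_BACKEND_F_SUSPEND/VHOST_BACKEND_F_RESUME only when the matching op is non-NULL, and virtio-vdpa falls back to vdpa_get_dma_dev() when get_vq_dma_dev is absent. A sketch of a parent driver wiring them up (all my_* names hypothetical):

static int my_vdpa_resume(struct vdpa_device *vdev)
{
        /* Restart descriptor processing from the state preserved by the
         * preceding suspend; return 0 or a negative errno. */
        return my_hw_start(to_my_vdpa(vdev));
}

static struct device *my_vdpa_get_vq_dma_dev(struct vdpa_device *vdev, u16 idx)
{
        /* Each virtqueue may sit behind its own DMA/IOMMU device. */
        return to_my_vdpa(vdev)->vq_dma_dev[idx];
}

static const struct vdpa_config_ops my_vdpa_ops = {
        /* ... mandatory ops elided ... */
        .suspend        = my_vdpa_suspend,
        .resume         = my_vdpa_resume,
        .get_vq_dma_dev = my_vdpa_get_vq_dma_dev,
};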


@@ -16,8 +16,10 @@ struct virtio_shm_region {
u64 len;
};
typedef void vq_callback_t(struct virtqueue *);
/**
* virtio_config_ops - operations for configuring a virtio device
* struct virtio_config_ops - operations for configuring a virtio device
* Note: Do not assume that a transport implements all of the operations
* getting/setting a value as a simple read/write! Generally speaking,
* any of @get/@set, @get_status/@set_status, or @get_features/
@@ -69,7 +71,8 @@ struct virtio_shm_region {
* vdev: the virtio_device
* This sends the driver feature bits to the device: it can change
* the dev->feature bits if it wants.
* Note: despite the name this can be called any number of times.
* Note that despite the name this can be called any number of
* times.
* Returns 0 on success or error status
* @bus_name: return the bus name associated with the device (optional)
* vdev: the virtio_device
@@ -91,7 +94,6 @@ struct virtio_shm_region {
* If disable_vq_and_reset is set, then enable_vq_after_reset must also be
* set.
*/
typedef void vq_callback_t(struct virtqueue *);
struct virtio_config_ops {
void (*get)(struct virtio_device *vdev, unsigned offset,
void *buf, unsigned len);


@@ -76,6 +76,22 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
void (*callback)(struct virtqueue *vq),
const char *name);
/*
* Creates a virtqueue and allocates the descriptor ring with per
* virtqueue DMA device.
*/
struct virtqueue *vring_create_virtqueue_dma(unsigned int index,
unsigned int num,
unsigned int vring_align,
struct virtio_device *vdev,
bool weak_barriers,
bool may_reduce_num,
bool ctx,
bool (*notify)(struct virtqueue *vq),
void (*callback)(struct virtqueue *vq),
const char *name,
struct device *dma_dev);
/*
* Creates a virtqueue with a standard layout but a caller-allocated
* ring.


@@ -92,7 +92,7 @@ struct vringh_iov {
};
/**
* struct vringh_iov - kvec mangler.
* struct vringh_kiov - kvec mangler.
*
* Mangles kvec in place, and restores it.
* Remaining data is iov + i, of used - i elements.


@@ -180,4 +180,12 @@
*/
#define VHOST_VDPA_SUSPEND _IO(VHOST_VIRTIO, 0x7D)
/* Resume a device so it can resume processing virtqueue requests
*
* After the return of this ioctl the device will have restored all the
* necessary states and it is fully operational to continue processing the
* virtqueue descriptors.
*/
#define VHOST_VDPA_RESUME _IO(VHOST_VIRTIO, 0x7E)
#endif


@@ -163,5 +163,7 @@ struct vhost_vdpa_iova_range {
#define VHOST_BACKEND_F_IOTLB_ASID 0x3
/* Device can be suspended */
#define VHOST_BACKEND_F_SUSPEND 0x4
/* Device can be resumed */
#define VHOST_BACKEND_F_RESUME 0x5
#endif


@@ -41,6 +41,7 @@
#define VIRTIO_BLK_F_DISCARD 13 /* DISCARD is supported */
#define VIRTIO_BLK_F_WRITE_ZEROES 14 /* WRITE ZEROES is supported */
#define VIRTIO_BLK_F_SECURE_ERASE 16 /* Secure Erase is supported */
#define VIRTIO_BLK_F_ZONED 17 /* Zoned block device */
/* Legacy feature bits */
#ifndef VIRTIO_BLK_NO_LEGACY
@@ -137,6 +138,16 @@ struct virtio_blk_config {
/* Secure erase commands must be aligned to this number of sectors. */
__virtio32 secure_erase_sector_alignment;
/* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
struct virtio_blk_zoned_characteristics {
__le32 zone_sectors;
__le32 max_open_zones;
__le32 max_active_zones;
__le32 max_append_sectors;
__le32 write_granularity;
__u8 model;
__u8 unused2[3];
} zoned;
} __attribute__((packed));
/*
@@ -174,6 +185,27 @@ struct virtio_blk_config {
/* Secure erase command */
#define VIRTIO_BLK_T_SECURE_ERASE 14
/* Zone append command */
#define VIRTIO_BLK_T_ZONE_APPEND 15
/* Report zones command */
#define VIRTIO_BLK_T_ZONE_REPORT 16
/* Open zone command */
#define VIRTIO_BLK_T_ZONE_OPEN 18
/* Close zone command */
#define VIRTIO_BLK_T_ZONE_CLOSE 20
/* Finish zone command */
#define VIRTIO_BLK_T_ZONE_FINISH 22
/* Reset zone command */
#define VIRTIO_BLK_T_ZONE_RESET 24
/* Reset All zones command */
#define VIRTIO_BLK_T_ZONE_RESET_ALL 26
#ifndef VIRTIO_BLK_NO_LEGACY
/* Barrier before this op. */
#define VIRTIO_BLK_T_BARRIER 0x80000000
@@ -193,6 +225,72 @@ struct virtio_blk_outhdr {
__virtio64 sector;
};
/*
* Supported zoned device models.
*/
/* Regular block device */
#define VIRTIO_BLK_Z_NONE 0
/* Host-managed zoned device */
#define VIRTIO_BLK_Z_HM 1
/* Host-aware zoned device */
#define VIRTIO_BLK_Z_HA 2
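Driver side, the zoned fields are meaningful only after VIRTIO_BLK_F_ZONED has been negotiated. A probe-time sketch (helper name hypothetical; virtio_cread() is the standard config accessor):

static void my_probe_zoned(struct virtio_device *vdev)
{
        u8 model;
        u32 zone_sectors;

        if (!virtio_has_feature(vdev, VIRTIO_BLK_F_ZONED))
                return;         /* regular device, zoned fields undefined */

        virtio_cread(vdev, struct virtio_blk_config, zoned.model, &model);
        virtio_cread(vdev, struct virtio_blk_config, zoned.zone_sectors,
                     &zone_sectors);

        /* Host-managed (HM) zones require sequential writes; host-aware
         * (HA) devices merely prefer them. */
        if (model == VIRTIO_BLK_Z_HM || model == VIRTIO_BLK_Z_HA)
                pr_info("zoned model %u, %u sectors per zone\n",
                        model, zone_sectors);
}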
/*
* Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply.
*/
struct virtio_blk_zone_descriptor {
/* Zone capacity */
__le64 z_cap;
/* The starting sector of the zone */
__le64 z_start;
/* Zone write pointer position in sectors */
__le64 z_wp;
/* Zone type */
__u8 z_type;
/* Zone state */
__u8 z_state;
__u8 reserved[38];
};
struct virtio_blk_zone_report {
__le64 nr_zones;
__u8 reserved[56];
struct virtio_blk_zone_descriptor zones[];
};
/*
* Supported zone types.
*/
/* Conventional zone */
#define VIRTIO_BLK_ZT_CONV 1
/* Sequential Write Required zone */
#define VIRTIO_BLK_ZT_SWR 2
/* Sequential Write Preferred zone */
#define VIRTIO_BLK_ZT_SWP 3
/*
* Zone states that are available for zones of all types.
*/
/* Not a write pointer (conventional zones only) */
#define VIRTIO_BLK_ZS_NOT_WP 0
/* Empty */
#define VIRTIO_BLK_ZS_EMPTY 1
/* Implicitly Open */
#define VIRTIO_BLK_ZS_IOPEN 2
/* Explicitly Open */
#define VIRTIO_BLK_ZS_EOPEN 3
/* Closed */
#define VIRTIO_BLK_ZS_CLOSED 4
/* Read-Only */
#define VIRTIO_BLK_ZS_RDONLY 13
/* Full */
#define VIRTIO_BLK_ZS_FULL 14
/* Offline */
#define VIRTIO_BLK_ZS_OFFLINE 15
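A VIRTIO_BLK_T_ZONE_REPORT reply is the report header above followed by nr_zones descriptors. A hedged sketch of walking one, assuming the hypothetical caller sized the buffer for max_zones entries:

static void my_walk_zone_report(struct virtio_blk_zone_report *report,
                                u64 max_zones)
{
        u64 i, nr = min(le64_to_cpu(report->nr_zones), max_zones);

        for (i = 0; i < nr; i++) {
                struct virtio_blk_zone_descriptor *z = &report->zones[i];

                if (z->z_state == VIRTIO_BLK_ZS_OFFLINE)
                        continue;
                /* Start, write pointer and capacity are LE sector counts. */
                pr_debug("zone %llu: start %llu wp %llu cap %llu\n",
                         i, le64_to_cpu(z->z_start), le64_to_cpu(z->z_wp),
                         le64_to_cpu(z->z_cap));
        }
}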
/* Unmap this range (only valid for write zeroes command) */
#define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP 0x00000001
@@ -219,4 +317,11 @@ struct virtio_scsi_inhdr {
#define VIRTIO_BLK_S_OK 0
#define VIRTIO_BLK_S_IOERR 1
#define VIRTIO_BLK_S_UNSUPP 2
/* Error codes that are specific to zoned block devices */
#define VIRTIO_BLK_S_ZONE_INVALID_CMD 3
#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP 4
#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE 5
#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6
#endif /* _LINUX_VIRTIO_BLK_H */


@@ -4,7 +4,7 @@ test: virtio_test vringh_test
virtio_test: virtio_ring.o virtio_test.o
vringh_test: vringh_test.o vringh.o virtio_ring.o
CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h
CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h -mfunction-return=thunk -fcf-protection=none -mindirect-branch-register
CFLAGS += -pthread
LDFLAGS += -pthread
vpath %.c ../../drivers/virtio ../../drivers/vhost