Commit Graph

38 Commits

Author SHA1 Message Date
Christian Brauner 3652117f85 eventfd: simplify eventfd_signal()
Ever since the eventfd type was introduced back in 2007 in commit
e1ad7468c7 ("signal/timer/event: eventfd core") the eventfd_signal()
function only ever passed 1 as a value for @n. There's no point in
keeping that additional argument.

Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org
Acked-by: Xu Yilun <yilun.xu@intel.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl
Acked-by: Eric Farman <farman@linux.ibm.com>  # s390
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-28 14:08:38 +01:00
Greg Kroah-Hartman 484f0a071f vduse: make vduse_class constant
Now that the driver core allows for struct class to be in read-only
memory, we should make all 'class' structures declared at build time
placing them into read-only memory, instead of having to be dynamically
allocated at runtime.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <2023100643-tricolor-citizen-6c2d@gregkh>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-11-01 09:19:59 -04:00
Maxime Coquelin 7ca26efb09 vduse: Use proper spinlock for IRQ injection
The IRQ injection work used spin_lock_irq() to protect the
scheduling of the softirq, but spin_lock_bh() should be
used.

With spin_lock_irq(), we noticed delay of more than 6
seconds between the time a NAPI polling work is scheduled
and the time it is executed.

Fixes: c8a6153b6c ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Cc: xieyongji@bytedance.com

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-Id: <20230705114505.63274-1-maxime.coquelin@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2023-08-10 15:24:28 -04:00
Linus Torvalds a8d70602b1 virtio: features, fixes, cleanups
resume support in vdpa/solidrun
 structure size optimizations in virtio_pci
 new pds_vdpa driver
 immediate initialization mechanism for vdpa/ifcvf
 interrupt bypass for vdpa/mlx5
 multiple worker support for vhost
 viirtio net in Intel F2000X-PL support for vdpa/ifcvf
 
 fixes, cleanups all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmSi95cPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpJ58H/A++mwjLsoiJ/3xgf2S2Fn0p0lkgxg61LrjD
 Q9p8zstG/BndRvv018XcJKoklaHpb4WcmNctKRNZJ6BjPj+ZwdADuSzoaUNurPD4
 M3s/nXVWvg7/2FifGDpCzQpyw0MR0Ip6/yqtdTl4vuLZGxI1nNGDfYbDgc3vZeCp
 pYTLsl/XUPzix4iJouEQqy5rmlUia3IE751aECrdk58z17lDo/Znar4hsXF0acQY
 +G6C/MNZQgf1rtJIl901h54W8rAoubxeda80uXSQk9fDyPG/wZYh2wFBGPi9qYa+
 9E+tOgHprMZ7G5GEopDdd19UctuW7M+YXPWZa1ijp8EZXmMpzuo=
 =5tdF
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - resume support in vdpa/solidrun

 - structure size optimizations in virtio_pci

 - new pds_vdpa driver

 - immediate initialization mechanism for vdpa/ifcvf

 - interrupt bypass for vdpa/mlx5

 - multiple worker support for vhost

 - viirtio net in Intel F2000X-PL support for vdpa/ifcvf

 - fixes, cleanups all over the place

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
  vhost: Make parameter name match of vhost_get_vq_desc()
  vduse: fix NULL pointer dereference
  vhost: Allow worker switching while work is queueing
  vhost_scsi: add support for worker ioctls
  vhost: allow userspace to create workers
  vhost: replace single worker pointer with xarray
  vhost: add helper to parse userspace vring state/file
  vhost: remove vhost_work_queue
  vhost_scsi: flush IO vqs then send TMF rsp
  vhost_scsi: convert to vhost_vq_work_queue
  vhost_scsi: make SCSI cmd completion per vq
  vhost_sock: convert to vhost_vq_work_queue
  vhost: convert poll work to be vq based
  vhost: take worker or vq for flushing
  vhost: take worker or vq instead of dev for queueing
  vhost, vhost_net: add helper to check if vq has work
  vhost: add vhost_worker pointer to vhost_virtqueue
  vhost: dynamically allocate vhost_worker
  vhost: create worker at end of vhost_dev_set_owner
  virtio_bt: call scheduler when we free unused buffs
  ...
2023-07-03 15:38:26 -07:00
Maxime Coquelin f06cf1e1a5 vduse: fix NULL pointer dereference
vduse_vdpa_set_vq_affinity callback can be called
with NULL value as cpu_mask when deleting the vduse
device.

This patch resets virtqueue's IRQ affinity mask value
to set all CPUs instead of dereferencing NULL cpu_mask.

[ 4760.952149] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 4760.959110] #PF: supervisor read access in kernel mode
[ 4760.964247] #PF: error_code(0x0000) - not-present page
[ 4760.969385] PGD 0 P4D 0
[ 4760.971927] Oops: 0000 [#1] PREEMPT SMP PTI
[ 4760.976112] CPU: 13 PID: 2346 Comm: vdpa Not tainted 6.4.0-rc6+ #4
[ 4760.982291] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.8.1 06/26/2020
[ 4760.989769] RIP: 0010:memcpy_orig+0xc5/0x130
[ 4760.994049] Code: 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 fa 08 72 1b <4c> 8b 06 4c 8b 4c 16 f8 4c 89 07 4c 89 4c 17 f8 c3 cc cc cc cc 66
[ 4761.012793] RSP: 0018:ffffb1d565abb830 EFLAGS: 00010246
[ 4761.018020] RAX: ffff9f4bf6b27898 RBX: ffff9f4be23969c0 RCX: ffff9f4bcadf6400
[ 4761.025152] RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffff9f4bf6b27898
[ 4761.032286] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000
[ 4761.039416] R10: 0000000000000000 R11: 0000000000000600 R12: 0000000000000000
[ 4761.046549] R13: 0000000000000000 R14: 0000000000000080 R15: ffffb1d565abbb10
[ 4761.053680] FS:  00007f64c2ec2740(0000) GS:ffff9f635f980000(0000) knlGS:0000000000000000
[ 4761.061765] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4761.067513] CR2: 0000000000000000 CR3: 0000001875270006 CR4: 00000000007706e0
[ 4761.074645] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4761.081775] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4761.088909] PKRU: 55555554
[ 4761.091620] Call Trace:
[ 4761.094074]  <TASK>
[ 4761.096180]  ? __die+0x1f/0x70
[ 4761.099238]  ? page_fault_oops+0x171/0x4f0
[ 4761.103340]  ? exc_page_fault+0x7b/0x180
[ 4761.107265]  ? asm_exc_page_fault+0x22/0x30
[ 4761.111460]  ? memcpy_orig+0xc5/0x130
[ 4761.115126]  vduse_vdpa_set_vq_affinity+0x3e/0x50 [vduse]
[ 4761.120533]  virtnet_clean_affinity.part.0+0x3d/0x90 [virtio_net]
[ 4761.126635]  remove_vq_common+0x1a4/0x250 [virtio_net]
[ 4761.131781]  virtnet_remove+0x5d/0x70 [virtio_net]
[ 4761.136580]  virtio_dev_remove+0x3a/0x90
[ 4761.140509]  device_release_driver_internal+0x19b/0x200
[ 4761.145742]  bus_remove_device+0xc2/0x130
[ 4761.149755]  device_del+0x158/0x3e0
[ 4761.153245]  ? kernfs_find_ns+0x35/0xc0
[ 4761.157086]  device_unregister+0x13/0x60
[ 4761.161010]  unregister_virtio_device+0x11/0x20
[ 4761.165543]  device_release_driver_internal+0x19b/0x200
[ 4761.170770]  bus_remove_device+0xc2/0x130
[ 4761.174782]  device_del+0x158/0x3e0
[ 4761.178276]  ? __pfx_vdpa_name_match+0x10/0x10 [vdpa]
[ 4761.183336]  device_unregister+0x13/0x60
[ 4761.187260]  vdpa_nl_cmd_dev_del_set_doit+0x63/0xe0 [vdpa]

Fixes: 28f6288eb6 ("vduse: Support set_vq_affinity callback")
Cc: xieyongji@bytedance.com
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-Id: <20230622204851.318125-1-maxime.coquelin@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2023-07-03 12:15:14 -04:00
Linus Torvalds 6e17c6de3d - Yosry Ahmed brought back some cgroup v1 stats in OOM logs.
- Yosry has also eliminated cgroup's atomic rstat flushing.
 
 - Nhat Pham adds the new cachestat() syscall.  It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability.
 
 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning.
 
 - Lorenzo Stoakes has done some maintanance work on the get_user_pages()
   interface.
 
 - Liam Howlett continues with cleanups and maintenance work to the maple
   tree code.  Peng Zhang also does some work on maple tree.
 
 - Johannes Weiner has done some cleanup work on the compaction code.
 
 - David Hildenbrand has contributed additional selftests for
   get_user_pages().
 
 - Thomas Gleixner has contributed some maintenance and optimization work
   for the vmalloc code.
 
 - Baolin Wang has provided some compaction cleanups,
 
 - SeongJae Park continues maintenance work on the DAMON code.
 
 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting.
 
 - Christoph Hellwig has some cleanups for the filemap/directio code.
 
 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the provided
   APIs rather than open-coding accesses.
 
 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings.
 
 - John Hubbard has a series of fixes to the MM selftesting code.
 
 - ZhangPeng continues the folio conversion campaign.
 
 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock.
 
 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from
   128 to 8.
 
 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management.
 
 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code.
 
 - Vishal Moola also has done some folio conversion work.
 
 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJejewAKCRDdBJ7gKXxA
 joggAPwKMfT9lvDBEUnJagY7dbDPky1cSYZdJKxxM2cApGa42gEA6Cl8HRAWqSOh
 J0qXCzqaaN8+BuEyLGDVPaXur9KirwY=
 =B7yQ
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:

 - Yosry Ahmed brought back some cgroup v1 stats in OOM logs

 - Yosry has also eliminated cgroup's atomic rstat flushing

 - Nhat Pham adds the new cachestat() syscall. It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability

 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning

 - Lorenzo Stoakes has done some maintanance work on the
   get_user_pages() interface

 - Liam Howlett continues with cleanups and maintenance work to the
   maple tree code. Peng Zhang also does some work on maple tree

 - Johannes Weiner has done some cleanup work on the compaction code

 - David Hildenbrand has contributed additional selftests for
   get_user_pages()

 - Thomas Gleixner has contributed some maintenance and optimization
   work for the vmalloc code

 - Baolin Wang has provided some compaction cleanups,

 - SeongJae Park continues maintenance work on the DAMON code

 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting

 - Christoph Hellwig has some cleanups for the filemap/directio code

 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the
   provided APIs rather than open-coding accesses

 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings

 - John Hubbard has a series of fixes to the MM selftesting code

 - ZhangPeng continues the folio conversion campaign

 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock

 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
   from 128 to 8

 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management

 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code

 - Vishal Moola also has done some folio conversion work

 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch

* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
  mm/hugetlb: remove hugetlb_set_page_subpool()
  mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
  hugetlb: revert use of page_cache_next_miss()
  Revert "page cache: fix page_cache_next/prev_miss off by one"
  mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
  mm: memcg: rename and document global_reclaim()
  mm: kill [add|del]_page_to_lru_list()
  mm: compaction: convert to use a folio in isolate_migratepages_block()
  mm: zswap: fix double invalidate with exclusive loads
  mm: remove unnecessary pagevec includes
  mm: remove references to pagevec
  mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
  mm: remove struct pagevec
  net: convert sunrpc from pagevec to folio_batch
  i915: convert i915_gpu_error to use a folio_batch
  pagevec: rename fbatch_count()
  mm: remove check_move_unevictable_pages()
  drm: convert drm_gem_put_pages() to use a folio_batch
  i915: convert shmem_sg_free_table() to use a folio_batch
  scatterlist: add sg_set_folio()
  ...
2023-06-28 10:28:11 -07:00
Lorenzo Stoakes 4c630f3074 mm/gup: remove vmas parameter from pin_user_pages()
We are now in a position where no caller of pin_user_pages() requires the
vmas parameter at all, so eliminate this parameter from the function and
all callers.

This clears the way to removing the vmas parameter from GUP altogether.

Link: https://lkml.kernel.org/r/195a99ae949c9f5cb589d2222b736ced96ec199a.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>	[qib]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>	[drivers/media]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:26 -07:00
Sheng Zhao a90e8608eb vduse: avoid empty string for dev name
Syzkaller hits a kernel WARN when the first character of the dev name
provided is NULL. Solution is to add a NULL check before calling
cdev_device_add() in vduse_create_dev().

kobject: (0000000072042169): attempted to be registered with empty name!
WARNING: CPU: 0 PID: 112695 at lib/kobject.c:236
Call Trace:
 kobject_add_varg linux/src/lib/kobject.c:390 [inline]
 kobject_add+0xf6/0x150 linux/src/lib/kobject.c:442
 device_add+0x28f/0xc20 linux/src/drivers/base/core.c:2167
 cdev_device_add+0x83/0xc0 linux/src/fs/char_dev.c:546
 vduse_create_dev linux/src/drivers/vdpa/vdpa_user/vduse_dev.c:2254 [inline]
 vduse_ioctl+0x7b5/0xf30 linux/src/drivers/vdpa/vdpa_user/vduse_dev.c:2316
 vfs_ioctl linux/src/fs/ioctl.c:47 [inline]
 file_ioctl linux/src/fs/ioctl.c:510 [inline]
 do_vfs_ioctl+0x14b/0xa80 linux/src/fs/ioctl.c:697
 ksys_ioctl+0x7c/0xa0 linux/src/fs/ioctl.c:714
 __do_sys_ioctl linux/src/fs/ioctl.c:721 [inline]
 __se_sys_ioctl linux/src/fs/ioctl.c:719 [inline]
 __x64_sys_ioctl+0x42/0x50 linux/src/fs/ioctl.c:719
 do_syscall_64+0x94/0x330 linux/src/arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: c8a6153b6c ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Cc: "Xie Yongji" <xieyongji@bytedance.com>
Reported-by: Xianjun Zeng <zengxianjun@bytedance.com>
Signed-off-by: Sheng Zhao <sheng.zhao@bytedance.com>
Message-Id: <20230530033626.1266794-1-sheng.zhao@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Cc: "Michael S. Tsirkin"<mst@redhat.com>, "Jason Wang"<jasowang@redhat.com>,
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2023-06-08 15:43:08 -04:00
Linus Torvalds 8ccd54fe45 virtio,vhost,vdpa: features, fixes, cleanups
reduction in interrupt rate in virtio
 perf improvement for VDUSE
 scalability for vhost-scsi
 non power of 2 ring support for packed rings
 better management for mlx5 vdpa
 suspend for snet
 VIRTIO_F_NOTIFICATION_DATA
 shared backend with vdpa-sim-blk
 user VA support in vdpa-sim
 better struct packing for virtio
 
 fixes, cleanups all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmRG+QcPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpMyAIALpq8Z9ljl7ADGLuvt/xeCnIdifo7NXam71s
 +algalRplF3QplnMxZ0vH19Z8Gvyl18fkk/l0tHoCrZZgyseYR6DbyZXPv8YIfFh
 NSBokhil+ZURH6eNJc2PLcBUF3QIL3rSv7tBq7/++PN3KIqdHIePbyUFLlwqb272
 NLkOkHT30QBtncRWJORj/GqDxi/4H1zHDmfMd6xD/1B6IrC3gin205RnLuCa2H65
 bP0IE025VrmrRqNGX7nhi7dIFo6SmMPwG5O0YWeEhFHaSOL9PJM/Z9EN4tLhC1v1
 Y34fryH9e+MMSgBnCK2ExxTq/pGWsbhPbvisDfDf3M1m1HHfhYI=
 =N1SV
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "virtio,vhost,vdpa: features, fixes, and cleanups:

   - reduction in interrupt rate in virtio

   - perf improvement for VDUSE

   - scalability for vhost-scsi

   - non power of 2 ring support for packed rings

   - better management for mlx5 vdpa

   - suspend for snet

   - VIRTIO_F_NOTIFICATION_DATA

   - shared backend with vdpa-sim-blk

   - user VA support in vdpa-sim

   - better struct packing for virtio

  and fixes, cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (52 commits)
  vhost_vdpa: fix unmap process in no-batch mode
  MAINTAINERS: make me a reviewer of VIRTIO CORE AND NET DRIVERS
  tools/virtio: fix build caused by virtio_ring changes
  virtio_ring: add a struct device forward declaration
  vdpa_sim_blk: support shared backend
  vdpa_sim: move buffer allocation in the devices
  vdpa/snet: use likely/unlikely macros in hot functions
  vdpa/snet: implement kick_vq_with_data callback
  virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support
  virtio: add VIRTIO_F_NOTIFICATION_DATA feature support
  vdpa/snet: support the suspend vDPA callback
  vdpa/snet: support getting and setting VQ state
  MAINTAINERS: add vringh.h to Virtio Core and Net Drivers
  vringh: address kdoc warnings
  vdpa: address kdoc warnings
  virtio_ring: don't update event idx on get_buf
  vdpa_sim: add support for user VA
  vdpa_sim: replace the spinlock with a mutex to protect the state
  vdpa_sim: use kthread worker
  vdpa_sim: make devices agnostic for work management
  ...
2023-04-27 17:05:34 -07:00
Xie Yongji b774f93d87 vduse: Support specifying bounce buffer size via sysfs
As discussed in [1], this adds sysfs interface to support
specifying bounce buffer size in virtio-vdpa case. It would
be a performance tuning parameter for high throughput workloads.

[1] https://lore.kernel.org/netdev/e8f25a35-9d45-69f9-795d-bdbbb90337a3@redhat.com/

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-12-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21 03:02:32 -04:00
Xie Yongji d4438d23ee vduse: Delay iova domain creation
Delay creating iova domain until the vduse device is
registered to vdpa bus.

This is a preparation for adding sysfs interface to
support specifying bounce buffer size for the iova
domain.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-11-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21 03:02:32 -04:00
Xie Yongji e38632dd71 vduse: Signal vq trigger eventfd directly if possible
Now the vdpa callback will associate an trigger
eventfd in some cases. For performance reasons,
VDUSE can signal it directly during irq injection.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-10-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21 03:02:32 -04:00
Xie Yongji 66640f4a6f vduse: Add sysfs interface for irq callback affinity
Add sysfs interface for each vduse virtqueue to
get/set the affinity for irq callback. This might
be useful for performance tuning when the irq callback
affinity mask contains more than one CPU.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-8-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21 03:02:32 -04:00
Xie Yongji bfae1648ec vduse: Support get_vq_affinity callback
This implements get_vq_affinity callback so that
the virtio-blk driver can build the blk-mq queues
based on the irq callback affinity.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-7-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21 03:02:32 -04:00
Xie Yongji 28f6288eb6 vduse: Support set_vq_affinity callback
Since virtio-vdpa bus driver already support interrupt
affinity spreading mechanism, let's implement the
set_vq_affinity callback to bring it to vduse device.
After we get the virtqueue's affinity, we can spread
IRQs between CPUs in the affinity mask, in a round-robin
manner, to run the irq callback.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-6-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21 03:02:32 -04:00
Xie Yongji 78885597b9 vduse: Refactor allocation for vduse virtqueues
Allocate memory for vduse virtqueues one by one instead of
doing one allocation for all of them.

This is a preparation for adding sysfs interface for virtqueues.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-5-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21 03:02:31 -04:00
Greg Kroah-Hartman 1aaba11da9 driver core: class: remove module * from class_create()
The module pointer in class_create() never actually did anything, and it
shouldn't have been requred to be set as a parameter even if it did
something.  So just remove it and fix up all callers of the function in
the kernel tree at the same time.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:16:33 +01:00
Harshit Mogalapalli 937c783aa3 vduse: Validate vq_num in vduse_validate_config()
Add a limit to 'config->vq_num' which is user controlled data which
comes from an vduse_ioctl to prevent large memory allocations.

Micheal says  - This limit is somewhat arbitrary.
However, currently virtio pci and ccw are limited to a 16 bit vq number.
While MMIO isn't it is also isn't used with lots of VQs due to
current lack of support for per-vq interrupts.
Thus, the 0xffff limit on number of VQs corresponding
to a 16-bit VQ number seems sufficient for now.

This is found using static analysis with smatch.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Message-Id: <20221128155717.2579992-1-harshit.m.mogalapalli@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-12-28 05:28:11 -05:00
Greg Kroah-Hartman ff62b8e658 driver core: make struct class.devnode() take a const *
The devnode() in struct class should not be modifying the device that is
passed into it, so mark it as a const * and propagate the function
signature changes out into all relevant subsystems that use this
callback.

Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Justin Sanders <justin@coraid.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: John Stultz <jstultz@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Sean Young <sean@mess.org>
Cc: Frank Haverkamp <haver@linux.ibm.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Xie Yongji <xieyongji@bytedance.com>
Cc: Gautam Dawar <gautam.dawar@xilinx.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Eli Cohen <elic@nvidia.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: alsa-devel@alsa-project.org
Cc: dri-devel@lists.freedesktop.org
Cc: kvm@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-block@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20221123122523.1332370-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-24 17:12:27 +01:00
Maxime Coquelin 46f8a29272 vduse: prevent uninitialized memory accesses
If the VDUSE application provides a smaller config space
than the driver expects, the driver may use uninitialized
memory from the stack.

This patch prevents it by initializing the buffer passed by
the driver to store the config value.

This fix addresses CVE-2022-2308.

Cc: stable@vger.kernel.org # v5.15+
Fixes: c8a6153b6c ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-Id: <20220831154923.97809-1-maxime.coquelin@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-09-27 18:32:45 -04:00
Xie Yongji ad146355bf vduse: Support querying information of IOVA regions
This introduces a new ioctl: VDUSE_IOTLB_GET_INFO to
support querying some information of IOVA regions.

Now it can be used to query whether the IOVA region
supports userspace memory registration.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20220803045523.23851-6-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11 04:26:08 -04:00
Xie Yongji 79a463be9e vduse: Support registering userspace memory for IOVA regions
Introduce two ioctls: VDUSE_IOTLB_REG_UMEM and
VDUSE_IOTLB_DEREG_UMEM to support registering
and de-registering userspace memory for IOVA
regions.

Now it only supports registering userspace memory
for bounce buffer region in virtio-vdpa case.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220803045523.23851-5-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:26:08 -04:00
Parav Pandit 0e0348ac3f vduse: Tie vduse mgmtdev and its device
vduse devices are not backed by any real devices such as PCI. Hence it
doesn't have any parent device linked to it.

Kernel driver model in [1] suggests to avoid an empty device
release callback.

Hence tie the mgmtdevice object's life cycle to an allocate dummy struct
device instead of static one.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/core-api/kobject.rst?h=v5.18-rc7#n284

Signed-off-by: Parav Pandit <parav@nvidia.com>
Message-Id: <20220613195223.473966-1-parav@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-06-24 02:49:48 -04:00
Xie Yongji b27ee76c74 vduse: Fix NULL pointer dereference on sysfs access
The control device has no drvdata. So we will get a
NULL pointer dereference when accessing control
device's msg_timeout attribute via sysfs:

[ 132.841881][ T3644] BUG: kernel NULL pointer dereference, address: 00000000000000f8
[ 132.850619][ T3644] RIP: 0010:msg_timeout_show (drivers/vdpa/vdpa_user/vduse_dev.c:1271)
[ 132.869447][ T3644] dev_attr_show (drivers/base/core.c:2094)
[ 132.870215][ T3644] sysfs_kf_seq_show (fs/sysfs/file.c:59)
[ 132.871164][ T3644] ? device_remove_bin_file (drivers/base/core.c:2088)
[ 132.872082][ T3644] kernfs_seq_show (fs/kernfs/file.c:164)
[ 132.872838][ T3644] seq_read_iter (fs/seq_file.c:230)
[ 132.873578][ T3644] ? __vmalloc_area_node (mm/vmalloc.c:3041)
[ 132.874532][ T3644] kernfs_fop_read_iter (fs/kernfs/file.c:238)
[ 132.875513][ T3644] __kernel_read (fs/read_write.c:440 (discriminator 1))
[ 132.876319][ T3644] kernel_read (fs/read_write.c:459)
[ 132.877129][ T3644] kernel_read_file (fs/kernel_read_file.c:94)
[ 132.877978][ T3644] kernel_read_file_from_fd (include/linux/file.h:45 fs/kernel_read_file.c:186)
[ 132.879019][ T3644] __do_sys_finit_module (kernel/module.c:4207)
[ 132.879930][ T3644] __ia32_sys_finit_module (kernel/module.c:4189)
[ 132.880930][ T3644] do_int80_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:132)
[ 132.881847][ T3644] entry_INT80_compat (arch/x86/entry/entry_64_compat.S:419)

To fix it, don't create the unneeded attribute for
control device anymore.

Fixes: c8a6153b6c ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20220426073656.229-1-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-06-08 08:56:03 -04:00
Gautam Dawar db9adcbf42 vdpa: multiple address spaces support
This patches introduces the multiple address spaces support for vDPA
device. This idea is to identify a specific address space via an
dedicated identifier - ASID.

During vDPA device allocation, vDPA device driver needs to report the
number of address spaces supported by the device then the DMA mapping
ops of the vDPA device needs to be extended to support ASID.

This helps to isolate the environments for the virtqueue that will not
be assigned directly. E.g in the case of virtio-net, the control
virtqueue will not be assigned directly to guest.

As a start, simply claim 1 virtqueue groups and 1 address spaces for
all vDPA devices. And vhost-vDPA will simply reject the device with
more than 1 virtqueue groups or address spaces.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Gautam Dawar <gdawar@xilinx.com>
Message-Id: <20220330180436.24644-7-gdawar@xilinx.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31 12:44:27 -04:00
Gautam Dawar d4821902e4 vdpa: introduce virtqueue groups
This patch introduces virtqueue groups to vDPA device. The virtqueue
group is the minimal set of virtqueues that must share an address
space. And the address space identifier could only be attached to
a specific virtqueue group.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Gautam Dawar <gdawar@xilinx.com>
Message-Id: <20220330180436.24644-6-gdawar@xilinx.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31 12:44:27 -04:00
Eli Cohen a64917bc2e vdpa: Provide interface to read driver features
Provide an interface to read the negotiated features. This is needed
when building the netlink message in vdpa_dev_net_config_fill().

Also fix the implementation of vdpa_dev_net_config_fill() to use the
negotiated features instead of the device features.

To make APIs clearer, make the following name changes to struct
vdpa_config_ops so they better describe their operations:

get_features -> get_device_features
set_features -> set_driver_features

Finally, add get_driver_features to return the negotiated features and
add implementation to all the upstream drivers.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220105114646.577224-2-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:53 -05:00
Guanjun b4d80c8dda vduse: moving kvfree into caller
This free action should be moved into caller 'vduse_ioctl' in
concert with the allocation.

No functional change.

Signed-off-by: Guanjun <guanjun@linux.alibaba.com>
Link: https://lore.kernel.org/r/1638780498-55571-1-git-send-email-guanjun@linux.alibaba.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:52 -05:00
Dan Carpenter dc1db0060c vduse: check that offset is within bounds in get_config()
This condition checks "len" but it does not check "offset" and that
could result in an out of bounds read if "offset > dev->config_size".
The problem is that since both variables are unsigned the
"dev->config_size - offset" subtraction would result in a very high
unsigned value.

I think these checks might not be necessary because "len" and "offset"
are supposed to already have been validated using the
vhost_vdpa_config_validate() function.  But I do not know the code
perfectly, and I like to be safe.

Fixes: c8a6153b6c ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20211208150956.GA29160@kili
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
2021-12-08 14:53:15 -05:00
Dan Carpenter ff9f9c6e74 vduse: fix memory corruption in vduse_dev_ioctl()
The "config.offset" comes from the user.  There needs to a check to
prevent it being out of bounds.  The "config.offset" and
"dev->config_size" variables are both type u32.  So if the offset if
out of bounds then the "dev->config_size - config.offset" subtraction
results in a very high u32 value.  The out of bounds offset can result
in memory corruption.

Fixes: c8a6153b6c ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20211208103307.GA3778@kili
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
2021-12-08 14:53:15 -05:00
Parav Pandit d8ca2fa5be vdpa: Enable user to set mac and mtu of vdpa device
$ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu 9000

$ vdpa dev config show
bar: mac 00:11:22:33:44:55 link up link_announce false mtu 9000

$ vdpa dev config show -jp
{
    "config": {
        "bar": {
            "mac": "00:11:22:33:44:55",
            "link ": "up",
            "link_announce ": false,
            "mtu": 9000,
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>

Link: https://lore.kernel.org/r/20211026175519.87795-5-parav@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-11-01 05:26:49 -04:00
Xie Yongji 0943aacf5a vduse: Fix race condition between resetting and irq injecting
The interrupt might be triggered after a reset since there is
no synchronization between resetting and irq injecting. And it
might break something if the interrupt is delayed until a new
round of device initialization.

Fixes: c8a6153b6c ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210929083050.88-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-10-22 06:49:14 -04:00
Xie Yongji 1394103fd7 vduse: Disallow injecting interrupt before DRIVER_OK is set
The interrupt callback should not be triggered before DRIVER_OK
is set. Otherwise, it might break the virtio device driver.
So let's add a check to avoid the unexpected behavior.

Fixes: c8a6153b6c ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210923075722.98-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-10-22 06:49:14 -04:00
Xie Yongji 7bb5fb2073 vduse: Cleanup the old kernel states after reset failure
We should cleanup the old kernel states e.g. interrupt callback
no matter whether the userspace handle the reset correctly or not
since virtio-vdpa can't handle the reset failure now.

Otherwise, the old state might be used after reset which might
break something, e.g. the old interrupt callback might be triggered
by userspace after reset, which can break the virtio device driver.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210906142158.181-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-09-14 18:10:43 -04:00
Dan Carpenter 6243e3c78a vduse: missing error code in vduse_init()
This should return -ENOMEM if alloc_workqueue() fails.  Currently it
returns success.

Fixes: b66219796563 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20210907073223.GA18254@kili
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-09-14 18:10:42 -04:00
Linus Torvalds 78e709522d virtio,vdpa,vhost: features, fixes
vduse driver supporting blk
 virtio-vsock support for end of record with SEQPACKET
 vdpa: mac and mq support for ifcvf and mlx5
 vdpa: management netlink for ifcvf
 virtio-i2c, gpio dt bindings
 
 misc fixes, cleanups
 
 NB: when merging this with
 b542e383d8 ("eventfd: Make signal recursion protection a task bit")
 from Linus' tree, replace eventfd_signal_count with
 eventfd_signal_allowed, and drop the export of eventfd_wake_count from
 ("eventfd: Export eventfd_wake_count to modules").
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmE1+awPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpt6EIAJy0qrc62lktNA0IiIVJSLbUbTMmFj8MzkGR
 8UxZdhpjWqBPJPyaOuNeksAqTGm/UAPEYx3C2c95Jhej7anFpy7dbCtIXcPHLJME
 DjcJg+EDrlNCj8m0FcsHpHWsFzPMERJpyEZNxgB5WazQbv+yWhGrg2FN5DCnF0Ro
 ZFYeKSVty148pQ0nHl8X0JM2XMtqit+O+LvKN2HQZ+fubh7BCzMxzkHY0QLHIzUS
 UeZqd3Qm8YcbqnlX38P5D6k+NPiTEgknmxaBLkPxg6H3XxDAmaIRFb8Ldd1rsgy1
 zTLGDiSGpVDIpawRnuEAzqJThV3Y5/MVJ1WD+mDYQ96tmhfp+KY=
 =DBH/
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - vduse driver ("vDPA Device in Userspace") supporting emulated virtio
   block devices

 - virtio-vsock support for end of record with SEQPACKET

 - vdpa: mac and mq support for ifcvf and mlx5

 - vdpa: management netlink for ifcvf

 - virtio-i2c, gpio dt bindings

 - misc fixes and cleanups

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits)
  Documentation: Add documentation for VDUSE
  vduse: Introduce VDUSE - vDPA Device in Userspace
  vduse: Implement an MMU-based software IOTLB
  vdpa: Support transferring virtual addressing during DMA mapping
  vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()
  vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()
  vhost-iotlb: Add an opaque pointer for vhost IOTLB
  vhost-vdpa: Handle the failure of vdpa_reset()
  vdpa: Add reset callback in vdpa_config_ops
  vdpa: Fix some coding style issues
  file: Export receive_fd() to modules
  eventfd: Export eventfd_wake_count to modules
  iova: Export alloc_iova_fast() and free_iova_fast()
  virtio-blk: remove unneeded "likely" statements
  virtio-balloon: Use virtio_find_vqs() helper
  vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro
  vsock_test: update message bounds test for MSG_EOR
  af_vsock: rename variables in receive loop
  virtio/vsock: support MSG_EOR bit processing
  vhost/vsock: support MSG_EOR bit processing
  ...
2021-09-11 14:48:42 -07:00
Xie Yongji 7bc7f61897 Documentation: Add documentation for VDUSE
VDUSE (vDPA Device in Userspace) is a framework to support
implementing software-emulated vDPA devices in userspace. This
document is intended to clarify the VDUSE design and usage.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-14-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06 07:20:58 -04:00
Xie Yongji c8a6153b6c vduse: Introduce VDUSE - vDPA Device in Userspace
This VDUSE driver enables implementing software-emulated vDPA
devices in userspace. The vDPA device is created by
ioctl(VDUSE_CREATE_DEV) on /dev/vduse/control. Then a char device
interface (/dev/vduse/$NAME) is exported to userspace for device
emulation.

In order to make the device emulation more secure, the device's
control path is handled in kernel. A message mechnism is introduced
to forward some dataplane related control messages to userspace.

And in the data path, the DMA buffer will be mapped into userspace
address space through different ways depending on the vDPA bus to
which the vDPA device is attached. In virtio-vdpa case, the MMU-based
software IOTLB is used to achieve that. And in vhost-vdpa case, the
DMA buffer is reside in a userspace memory region which can be shared
to the VDUSE userspace processs via transferring the shmfd.

For more details on VDUSE design and usage, please see the follow-on
Documentation commit.

NB(mst): when merging this with
b542e383d8 ("eventfd: Make signal recursion protection a task bit")
replace eventfd_signal_count with eventfd_signal_allowed,
and drop the previous
("eventfd: Export eventfd_wake_count to modules").

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-13-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06 07:20:58 -04:00