Commit graph

1264510 commits

Conrad Kostecki
6cd8adc3e1 ahci: asm1064: asm1166: don't limit reported ports
Previously, patches were added to limit the reported count of SATA
ports for asm1064 and asm1166 SATA controllers, as those controllers
report more ports than they physically have.

While it is allowed to report more ports than physically present in
CAP.NP, it is not allowed to do so in the PI (Ports Implemented)
register, which is what these HBAs do.
(This is an AHCI spec violation.)

Unfortunately, it seems that the PMP implementation in these ASMedia HBAs
is also violating the AHCI and SATA-IO PMP specification.

These HBAs do not report that they support PMP (CAP.SPM (Supports
Port Multiplier) is not set).

Instead, they have decided to add extra "virtual" ports in the PI
register that are used if a port multiplier is connected to any of the
physical ports of the HBA.

Enumerating the devices behind the PMP as specified in the AHCI and
SATA-IO specifications, by using PMP READ and PMP WRITE commands on
the physical ports of the HBA, is not possible; you have to use the
"virtual" ports.

This is of course bad, because it gives us no way to detect the device
and vendor ID of the PMP actually connected to the HBA, which means
that we cannot apply the proper PMP quirks for the connected PMP.

Limiting the port map will thus stop these controllers from working with
SATA Port Multipliers.
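
The reverted limit was of the kind ahci applies via hpriv->mask_port_map,
which ahci_save_initial_config() folds into the PI-derived port map; a
hedged sketch of that general pattern only, with an illustrative mask:

    /* General pattern only (mask value illustrative, not the reverted code):
     * ahci_save_initial_config() applies this mask on top of the PI value. */
    hpriv->mask_port_map = GENMASK(3, 0);   /* pretend only ports 0-3 exist */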

This patch reverts both patches for asm1064 and asm1166, so the old
behavior is restored and SATA PMP will work again, but it also
reintroduces the (minutes-long) extra boot time for ASMedia controllers
that do not have a PMP connected (either on the PCIe card itself, or an
external PMP).

However, a longer boot time for some is the lesser evil compared to
other users not being able to detect their drives at all.

Fixes: 0077a504e1 ("ahci: asm1166: correct count of reported ports")
Fixes: 9815e39617 ("ahci: asm1064: correct count of reported ports")
Cc: stable@vger.kernel.org
Reported-by: Matt <cryptearth@googlemail.com>
Signed-off-by: Conrad Kostecki <conikost@gentoo.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
[cassel: rewrote commit message]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-03-19 12:06:54 +01:00
Jakub Kicinski
9966e329d6 tools: ynl: add header guards for nlctrl
I "extracted" YNL C into a GitHub repo to make it easier
to use in other projects: https://github.com/linux-netdev/ynl-c

GitHub actions use Ubuntu by default, and the kernel headers
there are missing f329a0ebea ("genetlink: correct uAPI defines").
Add the direct include workaround for nlctrl.

Fixes: 768e044a5f ("doc/netlink/specs: Add spec for nlctrl netlink family")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240315002108.523232-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19 11:33:02 +01:00
Paolo Abeni
710fe438e3 Merge branch 'wireguard-fixes-for-6-9-rc1'
Jason A. Donenfeld says:

====================
wireguard fixes for 6.9-rc1

This series has four WireGuard fixes:

1) Annotate a data race that KCSAN found by using READ_ONCE/WRITE_ONCE,
   which has been causing syzkaller noise.

2) Use the generic netdev tstats allocation and stats getters instead of
   doing this within the driver.

3) Explicitly check a flag variable instead of an empty list in the
   netlink code, to prevent a UaF situation when paging through GET
   results during a remove-all SET operation.

4) Set a flag in the RISC-V CI config so the selftests continue to boot.
====================

Link: https://lore.kernel.org/r/20240314224911.6653-1-Jason@zx2c4.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19 11:22:54 +01:00
Jason A. Donenfeld
e995f5dd9a wireguard: selftests: set RISCV_ISA_FALLBACK on riscv{32,64}
This option is needed to continue booting with QEMU. Recent changes that
made this optional meant that it gets unset in the test harness, and so
WireGuard CI has been broken. Fix this by simply setting this option.

Cc: stable@vger.kernel.org
Fixes: 496ea826d1 ("RISC-V: provide Kconfig & commandline options to control parsing "riscv,isa"")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19 11:22:50 +01:00
Jason A. Donenfeld
71cbd32e3d wireguard: netlink: access device through ctx instead of peer
The previous commit fixed a bug that led to a NULL peer->device being
dereferenced. It's actually easier and faster performance-wise to
instead get the device from ctx->wg. This semantically makes more sense
too, since ctx->wg->peer_allowedips.seq is compared with
ctx->allowedips_seq, basing them both in ctx. This also acts as a
defence-in-depth provision against freed peers.
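
A minimal sketch of the shape of the change, using only the fields named
above (the surrounding dump code and the "stale" local are illustrative):

    struct wg_device *wg = ctx->wg;     /* was: peer->device */
    bool stale = wg->peer_allowedips.seq != ctx->allowedips_seq;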

Cc: stable@vger.kernel.org
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19 11:22:50 +01:00
Jason A. Donenfeld
55b6c73867 wireguard: netlink: check for dangling peer via is_dead instead of empty list
If all peers are removed via wg_peer_remove_all(), rather than setting
peer_list to empty, the peer is added to a temporary list with a head on
the stack of wg_peer_remove_all(). If a netlink dump is resumed and the
cursored peer is one that has been removed via wg_peer_remove_all(), it
will iterate from that peer and then attempt to dump freed peers.

Fix this by instead checking peer->is_dead, which was explicitly
created for this purpose. Also move up the device_update_lock lockdep
assertion, since reading is_dead relies on it.
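
A hedged sketch of the check described above (the exact context in
netlink.c differs):

    lockdep_assert_held(&wg->device_update_lock);  /* reading is_dead relies on this */
    if (!peer || peer->is_dead)    /* instead of: list_empty(&peer->peer_list) */
        peer = NULL;               /* drop the dangling dump cursor */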

It can be reproduced by a small script like:

    echo "Setting config..."
    ip link add dev wg0 type wireguard
    wg setconf wg0 /big-config
    (
            while true; do
                    echo "Showing config..."
                    wg showconf wg0 > /dev/null
            done
    ) &
    sleep 4
    wg setconf wg0 <(printf "[Peer]\nPublicKey=$(wg genkey)\n")

Resulting in:

    BUG: KASAN: slab-use-after-free in __lock_acquire+0x182a/0x1b20
    Read of size 8 at addr ffff88811956ec70 by task wg/59
    CPU: 2 PID: 59 Comm: wg Not tainted 6.8.0-rc2-debug+ #5
    Call Trace:
     <TASK>
     dump_stack_lvl+0x47/0x70
     print_address_description.constprop.0+0x2c/0x380
     print_report+0xab/0x250
     kasan_report+0xba/0xf0
     __lock_acquire+0x182a/0x1b20
     lock_acquire+0x191/0x4b0
     down_read+0x80/0x440
     get_peer+0x140/0xcb0
     wg_get_device_dump+0x471/0x1130

Cc: stable@vger.kernel.org
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Reported-by: Lillian Berry <lillian@star-ark.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19 11:22:50 +01:00
Breno Leitao
df9bbb5e77 wireguard: device: remove generic .ndo_get_stats64
Commit 3e2f544dd8 ("net: get stats64 if device if driver is
configured") moved the dev_get_tstats64() callback into the net core,
so, unless the driver does some custom stats collection, it does not
need to set .ndo_get_stats64.

Since this driver now relies on NETDEV_PCPU_STAT_TSTATS, it doesn't
need to set the generic dev_get_tstats64() as its .ndo_get_stats64
function pointer.
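
A hedged sketch of what this amounts to in the driver (op names are
illustrative, not the verbatim diff): with the pcpu stat type declared in
setup, the core falls back to dev_get_tstats64() on its own:

    /* Sketch: with dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS set in
     * setup, the generic getter is implied and can be dropped. */
    static const struct net_device_ops netdev_ops = {
        .ndo_open = wg_open,        /* other ops unchanged (names per driver) */
        .ndo_stop = wg_stop,
        /* .ndo_get_stats64 = dev_get_tstats64,   no longer needed */
    };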

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19 11:22:49 +01:00
Breno Leitao
db2952dfbd wireguard: device: leverage core stats allocator
With commit 34d21de99c ("net: Move {l,t,d}stats allocation to core
and convert veth & vrf"), stats allocation can be done by the net core
instead of in this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure the free happens in
the right spot, etc). This is now the core's responsibility.

Remove the allocation in this driver and leverage the network core
allocation instead.
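
A hedged sketch of the resulting setup-path shape (abridged and
illustrative; the actual wg_setup() does much more):

    static void wg_setup(struct net_device *dev)
    {
        /* Ask the core to allocate (and later free) per-CPU tstats. */
        dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;

        /* Gone from the driver:
         *   dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
         *   ...error handling and free_percpu(dev->tstats) on teardown...
         */
    }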

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19 11:22:49 +01:00
Nikita Zhandarovich
bba045dc4d wireguard: receive: annotate data-race around receiving_counter.counter
Syzkaller with KCSAN identified a data-race issue when accessing
keypair->receiving_counter.counter. Use READ_ONCE() and WRITE_ONCE()
annotations to mark the data race as intentional.
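
The annotation pattern looks roughly like this (a sketch; the exact
expressions in receive.c differ):

    /* writer side (NAPI poll, softirq on one CPU) */
    WRITE_ONCE(keypair->receiving_counter.counter, counter);

    /* reader side (decrypt worker on another CPU) */
    u64 seen = READ_ONCE(keypair->receiving_counter.counter);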

    BUG: KCSAN: data-race in wg_packet_decrypt_worker / wg_packet_rx_poll

    write to 0xffff888107765888 of 8 bytes by interrupt on cpu 0:
     counter_validate drivers/net/wireguard/receive.c:321 [inline]
     wg_packet_rx_poll+0x3ac/0xf00 drivers/net/wireguard/receive.c:461
     __napi_poll+0x60/0x3b0 net/core/dev.c:6536
     napi_poll net/core/dev.c:6605 [inline]
     net_rx_action+0x32b/0x750 net/core/dev.c:6738
     __do_softirq+0xc4/0x279 kernel/softirq.c:553
     do_softirq+0x5e/0x90 kernel/softirq.c:454
     __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381
     __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
     _raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210
     spin_unlock_bh include/linux/spinlock.h:396 [inline]
     ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline]
     wg_packet_decrypt_worker+0x6c5/0x700 drivers/net/wireguard/receive.c:499
     process_one_work kernel/workqueue.c:2633 [inline]
     ...

    read to 0xffff888107765888 of 8 bytes by task 3196 on cpu 1:
     decrypt_packet drivers/net/wireguard/receive.c:252 [inline]
     wg_packet_decrypt_worker+0x220/0x700 drivers/net/wireguard/receive.c:501
     process_one_work kernel/workqueue.c:2633 [inline]
     process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706
     worker_thread+0x525/0x730 kernel/workqueue.c:2787
     ...

Fixes: a9e90d9931 ("wireguard: noise: separate receive counter from send counter")
Reported-by: syzbot+d1de830e4ecdaac83d89@syzkaller.appspotmail.com
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19 11:22:49 +01:00
Eric Dumazet
f6e0a4984c net: move dev->state into net_device_read_txrx group
dev->state can be read in rx and tx fast paths.

netif_running(), which needs dev->state, is called from
- enqueue_to_backlog() [RX path]
- __dev_direct_xmit()  [TX path]
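
For context, netif_running() is just a bit test on dev->state (as defined
in include/linux/netdevice.h), which is why the field belongs in the
group both fast paths read:

    static inline bool netif_running(const struct net_device *dev)
    {
        return test_bit(__LINK_STATE_START, &dev->state);
    }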

Fixes: 43a71cd66b ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240314200845.3050179-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19 10:47:47 +01:00
Frederic Weisbecker
0387703986 timers: Fix removed self-IPI on global timer's enqueue in nohz_full
While running in nohz_full mode, a task may enqueue a timer while the
tick is stopped. However, the only places where the timer wheel,
alongside the timer migration machinery's decision, may reprogram the
next event according to that new timer's expiry are the idle loop or
any IRQ tail.

However neither the idle task nor an interrupt may run on the CPU if it
resumes busy work in userspace for a long while in full dynticks mode.

To solve this, the timer enqueue path raises a self-IPI that will
re-evaluate the timer wheel on its IRQ tail. This asynchronous solution
avoids potential locking inversion.

This is supposed to happen both for local and global timers but commit:

	b2cf7507e1 ("timers: Always queue timers on the local CPU")

broke the global timers case by removing the ->is_idle field handling
for the global base. As a result, a global timer's enqueue may go
unnoticed in nohz_full.

Fix this by restoring the idle tracking of the global timer's base,
allowing self-IPIs again at enqueue time.
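
A hedged sketch of the enqueue-side idea only (the real code in
kernel/time/timer.c is more involved):

    /* If the target base belongs to an idle CPU with its tick stopped,
     * kick that CPU so it re-evaluates the wheel from its IRQ tail. */
    if (base->is_idle)
        wake_up_nohz_cpu(base->cpu);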

Fixes: b2cf7507e1 ("timers: Always queue timers on the local CPU")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240318230729.15497-3-frederic@kernel.org
2024-03-19 10:14:55 +01:00
Frederic Weisbecker
f55acb1e44 timers/migration: Fix endless timer requeue after idle interrupts
When a CPU is an idle migrator, but another CPU wakes up before it,
becomes the active migrator and handles the queue, the initial idle
migrator may end up endlessly reprogramming its clockevent, chasing
ghost timers forever, such as in the following scenario:

               [GRP0:0]
             migrator = 0
             active   = 0
             nextevt  = T1
              /         \
             0           1
          active        idle (T1)

0) CPU 1 is idle and has a timer queued (T1), CPU 0 is active and is
the active migrator.

               [GRP0:0]
             migrator = NONE
             active   = NONE
             nextevt  = T1
              /         \
             0           1
          idle        idle (T1)
          wakeup = T1

1) CPU 0 is now idle and is therefore the idle migrator. It has
programmed its next timer interrupt to handle T1.

                [GRP0:0]
             migrator = 1
             active   = 1
             nextevt  = KTIME_MAX
              /         \
             0           1
          idle        active
          wakeup = T1

2) CPU 1 has woken up, it is now active and it has just handled its own
timer T1.

3) CPU 0 gets a timer interrupt to handle T1 but tmigr_handle_remote()
realizes it is not the migrator anymore. So it returns early, without
observing that T1 has already expired and therefore without updating
its ->wakeup value.

4) CPU 0 goes into tmigr_cpu_new_timer(), which also returns early
because it doesn't queue a timer of its own. So ->wakeup is left
unchanged and the next timer is programmed to fire now.

5) goto 3) forever

This results in timer interrupt storms in idle and also in nohz_full (as
observed in rcutorture's TREE07 scenario).

Fix this by forcing a re-evaluation of tmc->wakeup when attempting
remote timer handling while the CPU isn't the migrator anymore. The
check is inherently racy, but in the worst case the CPU just races
setting the KTIME_MAX value that a remote expiry also tries to set.
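
A hedged sketch of the fix's shape in tmigr_handle_remote(); helper and
field names are a best guess from the description above and may differ:

    if (!tmigr_check_migrator(tmc->tmgroup, tmc->childmask)) {
        /* No longer the migrator: drop the stale ->wakeup so the
         * clockevent isn't reprogrammed for an already-expired timer.
         * Racy, but the worst case is a redundant KTIME_MAX store. */
        WRITE_ONCE(tmc->wakeup, KTIME_MAX);
        return;
    }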

Fixes: 7ee9887703 ("timers: Implement the hierarchical pull model")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240318230729.15497-2-frederic@kernel.org
2024-03-19 10:14:55 +01:00
Yuli Wang
fea1c949f6 LoongArch/crypto: Clean up useless assignment operations
The LoongArch CRC32 hw acceleration is based on arch/mips/crypto/
crc32-mips.c. While the MIPS code supports both MIPS32 and MIPS64,
LoongArch32 lacks the CRC instruction, so the line "len -= sizeof(u32)"
is unnecessary.

Removing it makes the surrounding code style more consistent and
improves readability.

Cc: stable@vger.kernel.org
Reviewed-by: WANG Xuerui <git@xen0n.name>
Suggested-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-19 15:50:34 +08:00
Huacai Chen
9c68ece8b2 LoongArch: Define the __io_aw() hook as mmiowb()
Commit fb24ea52f7 ("drivers: Remove explicit invocations of
mmiowb()") removed all mmiowb() calls in drivers, but it says:

"NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
spin_unlock(). However, pairing each mmiowb() removal in this patch with
the corresponding call to spin_unlock() is not at all trivial, so there
is a small chance that this change may regress any drivers incorrectly
relying on mmiowb() to order MMIO writes between CPUs using lock-free
synchronisation."

The mmio in radeon_ring_commit() is protected by a mutex rather than a
spinlock, but in the mutex fastpath it behaves similarly to a spinlock.
We can add mmiowb() calls in the radeon driver, but the maintainer says
he doesn't like such a workaround, and radeon is not the only example
of mutex-protected mmio.

So we should extend the mmiowb tracking system from spinlocks to
mutexes, and maybe other locking primitives. This is not easy and is
error prone, so we solve it in the architectural code, by simply
defining the __io_aw() hook as mmiowb(). And we no longer need to
override queued_spin_unlock(), so use the generic definition.
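
Concretely, the hook lives in the arch's asm/io.h and is picked up by
the asm-generic accessors; a sketch:

    /* arch/loongarch/include/asm/io.h (sketch): asm-generic's writel() runs
     * __io_aw() right after the raw store, so this orders the MMIO write
     * before a subsequent unlock regardless of the lock type. */
    #define __io_aw()	mmiowb()

    #include <asm-generic/io.h>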

Without this, we get errors like the following when running 'glxgears'
on weakly ordered architectures such as LoongArch:

radeon 0000:04:00.0: ring 0 stalled for more than 10324msec
radeon 0000:04:00.0: ring 3 stalled for more than 10240msec
radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000001f412 last fence id 0x000000000001f414 on ring 3)
radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000000f940 last fence id 0x000000000000f941 on ring 0)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)

Link: https://lore.kernel.org/dri-devel/29df7e26-d7a8-4f67-b988-44353c4270ac@amd.com/T/#t
Link: https://lore.kernel.org/linux-arch/20240301130532.3953167-1-chenhuacai@loongson.cn/T/#t
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-19 15:50:34 +08:00
Huacai Chen
82bf60a6fe LoongArch: Remove superfluous flush_dcache_page() definition
LoongArch doesn't have cache aliases, so flush_dcache_page() is a no-op.
There is a generic implementation for this case in include/asm-generic/
cacheflush.h. So remove the superfluous flush_dcache_page() definition,
which also silences such build warnings:

   In file included from crypto/scompress.c:12:
   include/crypto/scatterwalk.h: In function 'scatterwalk_pagedone':
   include/crypto/scatterwalk.h:76:30: warning: variable 'page' set but not used [-Wunused-but-set-variable]
      76 |                 struct page *page;
         |                              ^~~~
   crypto/scompress.c: In function 'scomp_acomp_comp_decomp':
>> crypto/scompress.c:174:38: warning: unused variable 'dst_page' [-Wunused-variable]
     174 |                         struct page *dst_page = sg_page(req->dst);
         |

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202403091614.NeUw5zcv-lkp@intel.com/
Suggested-by: Barry Song <baohua@kernel.org>
Acked-by: Barry Song <baohua@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-19 15:50:34 +08:00
Max Kellermann
d42ab9af60 LoongArch: Move {dmw,tlb}_virt_to_page() definition to page.h
These two functions are implemented in pgtable.c, and they are needed
only by the virt_to_page() macro in page.h. Having the prototypes in
pgtable.h causes a circular dependency between page.h and pgtable.h,
because the virt_to_page() macro in page.h needs pgtable.h for these
two functions, while pgtable.h needs various definitions from page.h
(e.g. pte_t and pgd_t).

Let's avoid this circular dependency by moving the function prototypes
to page.h.
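
The moved declarations are just two prototypes (signatures assumed from
their use by virt_to_page()):

    /* arch/loongarch/include/asm/page.h (sketch) */
    struct page *dmw_virt_to_page(unsigned long kaddr);
    struct page *tlb_virt_to_page(unsigned long kaddr);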

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-19 15:50:34 +08:00
Huacai Chen
c87e12e0e8 LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
Since GCC commit 3f13154553f8546a ("df-scan: remove ad-hoc handling of
global regs in asms"), global registers are no longer forced onto the
def-use chain. As a result, current_thread_info(),
current_stack_pointer and __my_cpu_offset may be lifted out of loops
because they are no longer treated as "volatile variables".

This optimization is still correct for the current_thread_info() and
current_stack_pointer usages because they are associated to a thread.
However it is wrong for __my_cpu_offset because it is associated to a
CPU rather than a thread: if the thread migrates to a different CPU in
the loop, __my_cpu_offset should be changed.

Change __my_cpu_offset definition to treat it as a "volatile variable",
in order to avoid such a mis-optimization.
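
A hedged sketch of the "volatile variable" treatment; the real
definition lives in arch/loongarch/include/asm/percpu.h and differs in
detail, and the register name here is an assumption:

    /* An empty "volatile" asm ties the per-CPU base register, so the
     * compiler re-reads it on every use instead of hoisting it out of
     * loops across a possible migration point. */
    register unsigned long __my_cpu_offset __asm__("$r21");

    #define __my_cpu_offset					\
    ({								\
        __asm__ __volatile__("" : "+r"(__my_cpu_offset));	\
        __my_cpu_offset;					\
    })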

Cc: stable@vger.kernel.org
Reported-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Reported-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Xing Li <lixing@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Rui Wang <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-19 15:50:34 +08:00
Huacai Chen
f48ad26e5e LoongArch: Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig
This allocates the VM flag needed to support the userfaultfd minor fault
functionality. See commit 7677f7fd8b ("userfaultfd: add minor fault
registration mode") for more information.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-19 15:50:34 +08:00
Huacai Chen
8b5db5e533 LoongArch: Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig
LoongArch has implemented the current_stack_pointer macro, so select
ARCH_HAS_CURRENT_STACK_POINTER in Kconfig. This will let it be used in
non-arch places (like HARDENED_USERCOPY).

Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-19 15:50:27 +08:00
Xuan Zhuo
5da7137de7 virtio_net: rename free_old_xmit_skbs to free_old_xmit
Since free_old_xmit_skbs() deals not only with skbs but also with xdp
frames and the subsequently added xsk, rename the function to
free_old_xmit().

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20240229072044.77388-19-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 03:19:22 -04:00
Xuan Zhuo
b1dc24aba7 virtio_net: unify the code for recycling the xmit ptr
There are two completely similar and independent implementations. This
is inconvenient for the subsequent addition of new types. So extract a
function from this piece of code and call it uniformly to recycle the
old xmit ptr.
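
A hedged, generic sketch of the extracted helper's shape (all names here
are illustrative, not the actual virtio_net symbols):

    static void recycle_old_xmit(struct virtqueue *vq, bool in_napi)
    {
        unsigned int len;
        void *ptr;

        /* One loop handles every pointer type the TX ring may hold. */
        while ((ptr = virtqueue_get_buf(vq, &len)) != NULL) {
            if (is_xdp_frame(ptr))                 /* tag check, per driver */
                xdp_return_frame(ptr_to_xdp(ptr));
            else
                napi_consume_skb(ptr, in_napi);
        }
    }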

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20240229072044.77388-18-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 03:19:22 -04:00
Jason Wang
0d197a1471 virtio-net: add cond_resched() to the command waiting loop
Add cond_resched() to the command waiting loop for better co-operation
with the scheduler. This gives the CPU a chance to run other tasks
(such as workqueues) instead of busy looping when preemption is not
allowed on a device whose CVQ might be slow.
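
The waiting loop in question has roughly this shape (a sketch of
virtnet_send_command(); details may differ):

    /* Spin until the device consumes the command, but let other work run. */
    while (!virtqueue_get_buf(vi->cvq, &tmp) &&
           !virtqueue_is_broken(vi->cvq)) {
        cond_resched();
        cpu_relax();
    }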

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230720083839.481487-3-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
2024-03-19 03:19:22 -04:00
Jason Wang
b9f7425239 virtio-net: convert rx mode setting to use workqueue
This patch converts rx mode setting to be done in a workqueue; this is
a must for allowing the code to sleep while waiting for the cvq command
to respond, since the current code is executed under the addr spin
lock.

Note that we need to disable and flush the workqueue during freeze,
which means the rx mode setting is lost after resuming. This is not a
bug introduced by this patch, as we never try to restore the rx mode
setting during resume.
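
A hedged sketch of the deferral; the member and helper names are guesses
based on the description and may not match the patch exactly:

    static void virtnet_set_rx_mode(struct net_device *dev)
    {
        struct virtnet_info *vi = netdev_priv(dev);

        /* Called under the addr-list spinlock, so we must not sleep here;
         * push the CVQ conversation into process context instead. */
        if (vi->rx_mode_work_enabled)
            schedule_work(&vi->rx_mode_work);
    }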

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230720083839.481487-2-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
2024-03-19 03:19:22 -04:00
Xuan Zhuo
d5c0ed17fe virtio: packed: fix unmap leak for indirect desc table
When use_dma_api and premapped are both true, do_unmap is false.

Because do_unmap is false, vring_unmap_extra_packed is not called by
detach_buf_packed:

  if (unlikely(vq->do_unmap)) {
                curr = id;
                for (i = 0; i < state->num; i++) {
                        vring_unmap_extra_packed(vq,
                                                 &vq->packed.desc_extra[curr]);
                        curr = vq->packed.desc_extra[curr].next;
                }
  }

So the indirect desc table is not unmapped. This causes an unmap leak.

So here, check vq->use_dma_api instead. Correspondingly, the dma info
is now updated based on the same use_dma_api judgment.
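
A hedged sketch of the corrected condition in the indirect-table branch
of detach_buf_packed(); helper and field names are a best-effort
reconstruction and may differ:

    if (vq->use_dma_api) {          /* was: if (vq->do_unmap) */
        len = vq->packed.desc_extra[id].len;
        for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
            vring_unmap_desc_packed(vq, &desc[i]);
    }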

This bug does not occur in practice, because no driver currently uses
premapped buffers together with indirect descriptors.

Fixes: b319940f83 ("virtio_ring: skip unmap for premapped")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20240223071833.26095-1-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 03:19:22 -04:00
Zhu Lingshan
1ac61ddfee vDPA: report virtio-blk flush info to user space
This commit reports whether a virtio-blk device supports the cache
flush command to user space.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-11-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
ae1374b7f7 vDPA: report virtio-block read-only info to user space
This commit reports the read-only information of virtio-blk devices
to user space.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-10-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
6bdc7846e6 vDPA: report virtio-block write zeroes configuration to user space
This commit reports the write zeroes configuration of virtio-block
devices to user space, including:
1) maximum write zeroes sectors size
2) maximum number of write zeroes segments

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-9-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
65848f46e1 vDPA: report virtio-block discarding configuration to user space
This commit reports the virtio-blk discard configuration to user
space, including:
1) the maximum discard sectors
2) the maximum number of discard segments for the block driver to use
3) the alignment for splitting a discard request

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-8-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
c9d989b4ab vDPA: report virtio-block topology info to user space
This commit allows vDPA to report topology information of virtio-blk
devices to user space, including:
1) the number of logical blocks per physical block
2) offset of first aligned logical block
3) suggested minimum I/O size in blocks
4) optimal (suggested maximum) I/O size in blocks

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-7-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
54fb04b02e vDPA: report virtio-block MQ info to user space
This commit allows vDPA to report the virtio-block multi-queue
configuration to user space.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-6-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
81f64e1d91 vDPA: report virtio-block max segments in a request to user space
This commit allows vDPA to report the maximum number of segments in a
request of virtio-block devices to user space.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-5-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
3a1d33fb7e vDPA: report virtio-block block-size to user space
This commit allows reporting the block size of a
virtio-block device to user space.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-4-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
330b8aea69 vDPA: report virtio-block max segment size to user space
This commit allows reporting the max size of any
single segment of virtio-block devices to user space.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-3-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Zhu Lingshan
c2475a9a78 vDPA: report virtio-block capacity to user space
This commit allows userspace to query capacity of
a virtio-block device.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-2-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:51 -04:00
Ricardo B. Marliere
2b666ee296 virtio: make virtio_bus const
Now that the driver core can properly handle constant struct bus_type,
move the virtio_bus variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
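
The change boils down to constifying the definition; a sketch with an
abridged, illustrative field list:

    static const struct bus_type virtio_bus = {   /* was: static struct bus_type */
        .name   = "virtio",
        .match  = virtio_dev_match,
        .probe  = virtio_dev_probe,
        .remove = virtio_dev_remove,
    };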

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Message-Id: <20240204-bus_cleanup-virtio-v1-1-3bcb2212aaa0@marliere.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Jason Wang <jasowang@redhat.com>
2024-03-19 02:45:50 -04:00
Ricardo B. Marliere
8169ed6220 vdpa: make vdpa_bus const
Now that the driver core can properly handle constant struct bus_type,
move the vdpa_bus variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Message-Id: <20240204-bus_cleanup-vdpa-v1-1-1745eccb0a5c@marliere.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
56d61ae558 vDPA/ifcvf: implement vdpa_config_ops.get_vq_num_min
IFCVF HW supports operation with a vq size smaller than the max size,
as the spec requires.

This commit implements vdpa_config_ops.get_vq_num_min to report
the minimal size of the virtqueues, which gives the vDPA framework
a chance to reduce the vring size.

We need at least one descriptor to be functional, but it is better to
have no fewer than 64 to meet certain performance requirements.
Actually, the framework would allocate at least a PAGE for the vq.
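
A hedged sketch of the op's shape; the function and constant names are
illustrative, and the vdpa_config_ops signature is assumed to be
u16 (*get_vq_num_min)(struct vdpa_device *vdev):

    static u16 ifcvf_vdpa_get_vq_num_min(struct vdpa_device *vdpa_dev)
    {
        return IFCVF_MIN_VQ_SIZE;   /* e.g. 64, per the reasoning above */
    }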

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-11-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
cd21470674 vDPA/ifcvf: get_max_vq_size to return max size
Since we already implemented vdpa_config_ops.get_vq_size,
get_max_vq_size can return the actual max size of the
virtqueues rather than the max allowed safe size.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-10-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
273ae08fa0 virtio_vdpa: create vqs with the actual size
The size of a virtqueue is a per-vq configuration.
This commit allows virtio_vdpa to create virtqueues with the actual
vq size supported by the backend device.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-9-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
47e62e6d62 vduse: implement vdpa_config_ops.get_vq_size for vduse
This commit implements get_vq_size for vdpa_config_ops. This
new interface is used to report per vq size.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-8-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
f6fa2f7ea0 vdpa_sim: implement vdpa_config_ops.get_vq_size for vDPA simulator
This commit implements vdpa_config_ops.get_vq_size for the vDPA
simulator; this new interface helps report the per-vq size.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-7-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
1da13e6470 eni_vdpa: implement vdpa_config_ops.get_vq_size
This commit implements get_vq_size in vdpa_config_ops, which
reports the per-vq size.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-6-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
a97f9c8fcc vp_vdpa: implement vdpa_config_ops.get_vq_size
This commit implements vdpa_config_ops.get_vq_size in
vp_vdpa, which reports per virtqueue size.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-5-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
36503e5e06 vDPA/ifcvf: implement vdpa_config_ops.get_vq_size
This commit implements vdpa_ops.get_vq_size to report
the size of a specific virtqueue.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-4-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
0a926fc972 vDPA: introduce get_vq_size to vdpa_config_ops
This commit introduces a new interface, get_vq_size, to the
vDPA config ops; this new interface is intended to report
the size of a specific virtqueue.
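
A hedged sketch of how the new op slots into vdpa_config_ops
(include/linux/vdpa.h); kerneldoc and exact placement are omitted:

    struct vdpa_config_ops {
        /* ... existing ops ... */
        u16 (*get_vq_size)(struct vdpa_device *vdev, u16 idx);  /* per-vq size */
    };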

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-3-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:50 -04:00
Zhu Lingshan
1496c47065 vhost-vdpa: uapi to support reporting per vq size
The size of a virtqueue is a per-vq configuration.
This commit introduces a new ioctl uAPI to support this flexibility.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-2-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:49 -04:00
Shannon Nelson
ba6faaa68d vdpa/pds: fixes for VF vdpa flr-aer handling
This addresses a couple of things found while testing the FLR and AER
handling with the VFs.
 - release irqs before calling vp_modern_remove()
 - make sure we have a valid struct pointer before using it to release irqs
 - make sure the FW is alive before trying to add a new device

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20240220011050.30913-1-shannon.nelson@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:49 -04:00
Maxime Coquelin
d7b4e3287c vduse: implement DMA sync callbacks
Since commit 295525e29a ("virtio_net: merge dma
operations when filling mergeable buffers"), VDUSE devices
require support for the DMA .sync_single_for_cpu() operation,
as the memory is non-coherent between the device and the CPU
because of the use of a bounce buffer.

This patch implements both the .sync_single_for_cpu() and
.sync_single_for_device() callbacks, and also skips bounce
buffer copies during DMA map and unmap operations if the
DMA_ATTR_SKIP_CPU_SYNC attribute is set, to avoid extra
copies of the same buffer.
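
For context, this is the generic driver-side pattern that the
DMA_ATTR_SKIP_CPU_SYNC handling enables (a plain DMA-API sketch, not
VDUSE-specific code):

    /* Map without the implicit CPU sync, then sync only when the CPU
     * actually needs to look at the data. */
    dma_addr_t addr = dma_map_page_attrs(dev, page, 0, len, DMA_FROM_DEVICE,
                                         DMA_ATTR_SKIP_CPU_SYNC);

    /* ... device writes into the buffer ... */

    dma_sync_single_for_cpu(dev, addr, len, DMA_FROM_DEVICE);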

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-Id: <20240219170606.587290-1-maxime.coquelin@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19 02:45:49 -04:00
Jonah Palmer
749a401683 vdpa/mlx5: Allow CVQ size changes
The MLX driver was not updating its control virtqueue size at set_vq_num
and instead always initialized it to MLX5_CVQ_MAX_ENT (16) at
setup_cvq_vring.

Qemu would try to set the size to 64 by default; however, because the
CVQ size was always initialized to 16, an error would be thrown when
sending >16 control messages (as used-ring entry 17 is initialized to 0).
For example, starting a guest with x-svq=on and then executing the
following command would produce the error below:

 # for i in {1..20}; do ifconfig eth0 hw ether XX:xx:XX:xx:XX:XX; done

 qemu-system-x86_64: Insufficient written data (0)
 [  435.331223] virtio_net virtio0: Failed to set mac address by vq command.
 SIOCSIFHWADDR: Invalid argument

Acked-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20240216142502.78095-1-jonah.palmer@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Fixes: 5262912ef3 ("vdpa/mlx5: Add support for control VQ and MAC setting")
2024-03-19 02:45:49 -04:00
Steve Sistare
c4e8b5aed7 vdpa: skip suspend/resume ops if not DRIVER_OK
If a vdpa device is not in state DRIVER_OK, then there is no driver state
to preserve, so no need to call the suspend and resume driver ops.
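
A hedged sketch of the guard's shape; placement and the surrounding
code are assumptions:

    /* No driver state to preserve unless the device reached DRIVER_OK. */
    if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
        return 0;

    return ops->suspend(vdpa);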

Suggested-by: Eugenio Perez Martin <eperezma@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Message-Id: <1707834358-165470-1-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
2024-03-19 02:45:49 -04:00