No description
Find a file
Dave Airlie fff1386cc8 nouveau: fix instmem race condition around ptr stores
Running a lot of VK CTS in parallel against nouveau, once every
few hours you might see something like this crash.

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
FS:  00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:

...

 ? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
 ? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
 nvkm_vmm_iter+0x351/0xa20 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __lock_acquire+0x3ed/0x2170
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]

Adding any sort of useful debug usually makes it go away, so I hand
wrote the function in a line, and debugged the asm.

Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
the nv50_instobj_acquire called from nvkm_kmap.

If Thread A and Thread B both get to nv50_instobj_acquire around
the same time, and Thread A hits the refcount_set line, and in
lockstep thread B succeeds at refcount_inc_not_zero, there is a
chance the ptrs value won't have been stored since refcount_set
is unordered. Force a memory barrier here, I picked smp_mb, since
we want it on all CPUs and it's write followed by a read.

v2: use paired smp_rmb/smp_wmb.

Cc: <stable@vger.kernel.org>
Fixes: be55287aa5 ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240411011510.2546857-1-airlied@gmail.com
2024-04-15 10:51:28 +02:00
arch EFI fixes for v6.9 #2 2024-03-24 13:54:06 -07:00
block vfs-6.9-rc1.fixes 2024-03-18 09:15:50 -07:00
certs This update includes the following changes: 2023-11-02 16:15:30 -10:00
crypto RISC-V Patches for the 6.9 Merge Window 2024-03-22 10:41:13 -07:00
Documentation A set of x86 fixes: 2024-03-24 11:13:56 -07:00
drivers nouveau: fix instmem race condition around ptr stores 2024-04-15 10:51:28 +02:00
fs A patch to minimize blockage when processing very large batches of 2024-03-22 11:15:45 -07:00
include Revert "drm/qxl: simplify qxl_fence_wait" 2024-04-05 15:10:31 +02:00
init RISC-V Patches for the 6.9 Merge Window 2024-03-22 10:41:13 -07:00
io_uring io_uring/sqpoll: early exit thread if task_context wasn't allocated 2024-03-18 20:22:42 -06:00
ipc sysctl changes for v6.9-rc1 2024-03-18 14:59:13 -07:00
kernel dma-mapping fixes for Linux 6.9 2024-03-24 10:45:31 -07:00
lib hardening fixes for v6.9-rc1 2024-03-23 08:43:21 -07:00
LICENSES
mm RISC-V Patches for the 6.9 Merge Window 2024-03-22 10:41:13 -07:00
net Including fixes from CAN, netfilter, wireguard and IPsec. 2024-03-21 14:50:39 -07:00
rust Kbuild updates for v6.9 2024-03-21 14:41:00 -07:00
samples Tracing updates for 6.9: 2024-03-18 15:11:44 -07:00
scripts LoongArch changes for v6.9 2024-03-22 10:22:45 -07:00
security - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min 2024-03-14 18:03:09 -07:00
sound sound fixes #2 for 6.9-rc2 2024-03-22 09:44:19 -07:00
tools RISC-V Patches for the 6.9 Merge Window 2024-03-22 10:41:13 -07:00
usr Kbuild updates for v6.8 2024-01-18 17:57:07 -08:00
virt KVM Xen and pfncache changes for 6.9: 2024-03-11 10:42:55 -04:00
.clang-format clang-format: Update with v6.7-rc4's for_each macro list 2023-12-08 23:54:38 +01:00
.cocciconfig
.editorconfig Add .editorconfig file for basic formatting 2023-12-28 16:22:47 +09:00
.get_maintainer.ignore Add Jeff Kirsher to .get_maintainer.ignore 2024-03-08 11:36:54 +00:00
.gitattributes
.gitignore kbuild: create a list of all built DTB files 2024-02-19 18:20:39 +09:00
.mailmap Char/Misc and other driver subsystem updates for 6.9-rc1 2024-03-21 13:21:31 -07:00
.rustfmt.toml
COPYING
CREDITS Not a ton of stuff happening in the clk framework in this pull request. We got 2024-03-15 11:48:01 -07:00
Kbuild
Kconfig
MAINTAINERS RISC-V Patches for the 6.9 Merge Window 2024-03-22 10:41:13 -07:00
Makefile Linux 6.9-rc1 2024-03-24 14:10:05 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.