linux-stable/arch/loongarch/include/asm/Kbuild

# SPDX-License-Identifier: GPL-2.0
generated-y += orc_hash.h

generic-y += dma-contiguous.h
generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += early_ioremap.h
generic-y += qrwlock.h
generic-y += qspinlock.h
generic-y += rwsem.h
generic-y += segment.h
generic-y += user.h
generic-y += stat.h
generic-y += fcntl.h
generic-y += ioctl.h
generic-y += ioctls.h
generic-y += mman.h
generic-y += msgbuf.h
generic-y += sembuf.h
generic-y += shmbuf.h
generic-y += statfs.h
generic-y += socket.h
generic-y += sockios.h
generic-y += termbits.h
generic-y += poll.h
generic-y += param.h
generic-y += posix_types.h
generic-y += resource.h
generic-y += kvm_para.h
LoongArch: Add build infrastructure Add Kbuild, Makefile, Kconfig and link script for LoongArch build infrastructure. Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: WANG Xuerui <git@xen0n.name> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> 2022-05-31 10:04:11 +00:00			`# SPDX-License-Identifier: GPL-2.0`
LoongArch: Add ORC stack unwinder support The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is similar in concept to a DWARF unwinder. The difference is that the format of the ORC data is much simpler than DWARF, which in turn allows the ORC unwinder to be much simpler and faster. The ORC data consists of unwind tables which are generated by objtool. After analyzing all the code paths of a .o file, it determines information about the stack state at each instruction address in the file and outputs that information to the .orc_unwind and .orc_unwind_ip sections. The per-object ORC sections are combined at link time and are sorted and post-processed at boot time. The unwinder uses the resulting data to correlate instruction addresses with their stack states at run time. Most of the logic are similar with x86, in order to get ra info before ra is saved into stack, add ra_reg and ra_offset into orc_entry. At the same time, modify some arch-specific code to silence the objtool warnings. Co-developed-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Jinyang He <hejinyang@loongson.cn> Co-developed-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> 2024-03-11 14:23:47 +00:00			`generated-y += orc_hash.h`

LoongArch: Add build infrastructure Add Kbuild, Makefile, Kconfig and link script for LoongArch build infrastructure. Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: WANG Xuerui <git@xen0n.name> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> 2022-05-31 10:04:11 +00:00			`generic-y += dma-contiguous.h`
LoongArch: Add qspinlock support On NUMA system, the performance of qspinlock is better than generic spinlock. Below is the UnixBench test results on a 8 nodes (4 cores per node, 32 cores in total) machine. A. With generic spinlock: System Benchmarks Index Values BASELINE RESULT INDEX Dhrystone 2 using register variables 116700.0 449574022.5 38523.9 Double-Precision Whetstone 55.0 85190.4 15489.2 Execl Throughput 43.0 14696.2 3417.7 File Copy 1024 bufsize 2000 maxblocks 3960.0 143157.8 361.5 File Copy 256 bufsize 500 maxblocks 1655.0 37631.8 227.4 File Copy 4096 bufsize 8000 maxblocks 5800.0 444814.2 766.9 Pipe Throughput 12440.0 5047490.7 4057.5 Pipe-based Context Switching 4000.0 2021545.7 5053.9 Process Creation 126.0 23829.8 1891.3 Shell Scripts (1 concurrent) 42.4 33756.7 7961.5 Shell Scripts (8 concurrent) 6.0 4062.9 6771.5 System Call Overhead 15000.0 2479748.6 1653.2 ======== System Benchmarks Index Score 2955.6 B. With qspinlock: System Benchmarks Index Values BASELINE RESULT INDEX Dhrystone 2 using register variables 116700.0 449467876.9 38514.8 Double-Precision Whetstone 55.0 85174.6 15486.3 Execl Throughput 43.0 14769.1 3434.7 File Copy 1024 bufsize 2000 maxblocks 3960.0 146150.5 369.1 File Copy 256 bufsize 500 maxblocks 1655.0 37496.8 226.6 File Copy 4096 bufsize 8000 maxblocks 5800.0 447527.0 771.6 Pipe Throughput 12440.0 5175989.2 4160.8 Pipe-based Context Switching 4000.0 2207747.8 5519.4 Process Creation 126.0 25125.5 1994.1 Shell Scripts (1 concurrent) 42.4 33461.2 7891.8 Shell Scripts (8 concurrent) 6.0 4024.7 6707.8 System Call Overhead 15000.0 2917278.6 1944.9 ======== System Benchmarks Index Score 3040.1 Signed-off-by: Rui Wang <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> 2022-10-12 08:36:14 +00:00			`generic-y += mcs_spinlock.h`
LoongArch: Add build infrastructure Add Kbuild, Makefile, Kconfig and link script for LoongArch build infrastructure. Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: WANG Xuerui <git@xen0n.name> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> 2022-05-31 10:04:11 +00:00			`generic-y += parport.h`
			`generic-y += early_ioremap.h`
			`generic-y += qrwlock.h`
LoongArch: Define the __io_aw() hook as mmiowb() Commit fb24ea52f78e0d595852e ("drivers: Remove explicit invocations of mmiowb()") remove all mmiowb() in drivers, but it says: "NOTE: mmiowb() has only ever guaranteed ordering in conjunction with spin_unlock(). However, pairing each mmiowb() removal in this patch with the corresponding call to spin_unlock() is not at all trivial, so there is a small chance that this change may regress any drivers incorrectly relying on mmiowb() to order MMIO writes between CPUs using lock-free synchronisation." The mmio in radeon_ring_commit() is protected by a mutex rather than a spinlock, but in the mutex fastpath it behaves similar to spinlock. We can add mmiowb() calls in the radeon driver but the maintainer says he doesn't like such a workaround, and radeon is not the only example of mutex protected mmio. So we should extend the mmiowb tracking system from spinlock to mutex, and maybe other locking primitives. This is not easy and error prone, so we solve it in the architectural code, by simply defining the __io_aw() hook as mmiowb(). And we no longer need to override queued_spin_unlock() so use the generic definition. Without this, we get such an error when run 'glxgears' on weak ordering architectures such as LoongArch: radeon 0000:04:00.0: ring 0 stalled for more than 10324msec radeon 0000:04:00.0: ring 3 stalled for more than 10240msec radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000001f412 last fence id 0x000000000001f414 on ring 3) radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000000f940 last fence id 0x000000000000f941 on ring 0) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) Link: https://lore.kernel.org/dri-devel/29df7e26-d7a8-4f67-b988-44353c4270ac@amd.com/T/#t Link: https://lore.kernel.org/linux-arch/20240301130532.3953167-1-chenhuacai@loongson.cn/T/#t Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> 2024-03-19 07:50:34 +00:00			`generic-y += qspinlock.h`
LoongArch: Add build infrastructure Add Kbuild, Makefile, Kconfig and link script for LoongArch build infrastructure. Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: WANG Xuerui <git@xen0n.name> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> 2022-05-31 10:04:11 +00:00			`generic-y += rwsem.h`
			`generic-y += segment.h`
			`generic-y += user.h`
			`generic-y += stat.h`
			`generic-y += fcntl.h`
			`generic-y += ioctl.h`
			`generic-y += ioctls.h`
			`generic-y += mman.h`
			`generic-y += msgbuf.h`
			`generic-y += sembuf.h`
			`generic-y += shmbuf.h`
			`generic-y += statfs.h`
			`generic-y += socket.h`
			`generic-y += sockios.h`
			`generic-y += termbits.h`
			`generic-y += poll.h`
			`generic-y += param.h`
			`generic-y += posix_types.h`
			`generic-y += resource.h`
			`generic-y += kvm_para.h`