Commit graph

802 commits

Author SHA1 Message Date
0cc4m
0c4d841cf1 Fix synchronization on AMD, add barriers for buffer ownership transfer, add debug flag and prints 2023-07-15 09:06:53 +02:00
0cc4m
8dd585e8cb Variable matmul kernel using specialization constants 2023-07-09 15:50:28 +02:00
0cc4m
3bc7a80ca6 Rework command buffer handling 2023-07-09 11:37:32 +02:00
0cc4m
0ef62f511a Fix validation errors, improve compatibility with AMD GPUs 2023-07-08 20:40:19 +02:00
0cc4m
c7c761a2b7 Add split-k optimization for small matrix multiplication
Use semaphores for synchronization instead of fences or waitidle

Rework async write/read for synchronization
2023-07-08 17:27:05 +02:00
0cc4m
c3d947510b Optimize warptile matmul shader, replace blocktile with it 2023-07-07 07:13:47 +02:00
0cc4m
6d5a0ada8c Merge pull request #2 from SlyEcho/vulkan
add cmake commands
2023-07-07 05:53:11 +02:00
0cc4m
ea06a2c321 Disable glslc optimization for CMake 2023-07-07 05:52:33 +02:00
0cc4m
869ae76764 Disable glslc optimization 2023-07-05 22:23:07 +02:00
0cc4m
244939029d Add WIP warp tile mat mul shaders 2023-07-05 22:18:12 +02:00
0cc4m
80b17e2f66 Fix trailing whitespace in vk_mem_alloc.h 2023-07-04 23:01:32 +02:00
0cc4m
e35d28fec3 Fix queue selection for AMD RADV 2023-07-04 22:57:08 +02:00
0cc4m
ae7325fdff Fix 2d write 2023-07-04 22:42:07 +02:00
0cc4m
ade9555c48 Add 2d write operation, profiling code 2023-07-04 22:31:47 +02:00
Henri Vasserman
3d7d8d00a4 add cmake commands 2023-07-04 17:02:22 +03:00
0cc4m
24eeb97d13 Add bounds checking to matmul kernels, improve implementation, fix command buffers not freed properly 2023-07-02 22:11:58 +02:00
0cc4m
36cd5d85e9 Avoid requesting dedicated memory, VMA can decide that by itself 2023-06-30 21:20:19 +02:00
0cc4m
4ea9b2fd4b Add VMA library 2023-06-30 21:15:06 +02:00
0cc4m
c8ff09bdc7 dequant_q4_0 kernel 2023-06-30 20:48:42 +02:00
0cc4m
cb5cb4d6e2 Fix f16_to_f32 kernel 2023-06-30 20:48:03 +02:00
0cc4m
df3cdbdac7 Output FP32 in fp16 matmul shader 2023-06-30 18:37:10 +02:00
0cc4m
40c8f843f2 Fix mulmat_f16 2023-06-30 18:37:10 +02:00
0cc4m
c31e14b2fd Enable device extensions properly, restore fp16 matmul op 2023-06-30 18:37:10 +02:00
0cc4m
fc5bb53b32 Code abstraction, FP16 implementation, fix kernel, add FP16 to FP32 kernel 2023-06-30 18:37:10 +02:00
0cc4m
3adc7b1d60 First FP16 attempt, disabled for now 2023-06-30 18:37:10 +02:00
0cc4m
2c70df985a Continue vulkan implementation and optimization 2023-06-30 18:36:42 +02:00
0cc4m
0c9cca00bd Write coalescing 2023-06-30 18:36:42 +02:00
0cc4m
7c6860b483 2D Blocktiling 2023-06-30 18:36:42 +02:00
0cc4m
1b4863c2b9 1D Blocktiling 2023-06-30 18:36:42 +02:00
0cc4m
baf9ff536b GEMM Kernel optimization 2023-06-30 18:36:42 +02:00
0cc4m
a42376e7ec First matmul success 2023-06-30 18:36:42 +02:00
0cc4m
8ce84c2747 Continue implementation 2023-06-30 18:36:42 +02:00
0cc4m
2471728a9d Add aligned malloc and free for VMA 2023-06-30 18:36:42 +02:00
0cc4m
fc4f207cfb Matmul call 2023-06-30 18:36:41 +02:00
0cc4m
b0e65855d1 Vulkan development 2023-06-30 18:36:41 +02:00
0cc4m
a4004d4fa8 Vulkan memory management 2023-06-30 18:36:41 +02:00
0cc4m
88d4ec05a8 Continue implementation 2023-06-30 18:36:41 +02:00
0cc4m
4a96d0eb7f Fix matmul kernel, continue implementation 2023-06-30 18:36:41 +02:00
0cc4m
061246fb07 Vulkan loader code 2023-06-30 18:36:41 +02:00
Howard Su
b8c8dda75f Use unsigned for random seed (#2006)
* Use unsigned for random seed. Keep -1 as the value to use a time based seed.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-29 06:15:15 -07:00
LostRuins
96a712ca1b Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
* Added broken new q4k quant

* xx + ib0

* Fix q2_k fast kernel

* Use preprocessor for QK_K

* Add q6_k fast matmul kernel

* ported q3k speedup successfully

* ported q2k and q5k speedups

* remove old dot kernels and template

* fixed global const struct types

* fixing address spaces

* fixed string too long CI issue

---------

Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
m3ndax
d3494bb86b llama : replacing auto &kv with const auto &kv (#2041)
* Replacing auto &kv with const auto &kv

* Create codacy.yml

* Delete codacy.yml
2023-06-28 21:39:08 +03:00
Salvador E. Tropea
5b351e94d0 cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028)
- Not used
2023-06-28 20:27:31 +03:00
Salvador E. Tropea
6432aabb6d cuda : fix missing const qualifier in casts (#2027) 2023-06-28 20:26:26 +03:00
Howard Su
b922bc351b llama : remove shards weight file support (#2000)
* Remove multiple shards

* Remove multiple file loaders

* Remove llama_load_tensor_shard class

* Simplify load logic

* Remove dead code guess_n_parts function

* Remove vocab_only from constructor of llama_model_loader

* Remove alignment_prevents_mmap, which is no longer needed.

* Remove useless check
2023-06-28 20:13:02 +03:00
Johannes Gäßler
7f9753fa12 CUDA GPU acceleration for LoRAs + f16 models (#1970) 2023-06-28 18:35:54 +02:00
ningshanwutuobang
cfa0750bc9 llama : support input embeddings directly (#1910)
* add interface for float input

* fixed inpL shape and type

* add examples of input floats

* add test example for embd input

* fixed sampling

* add free for context

* fixed add end condition for generating

* add examples for llava.py

* add README for llava.py

* add README for llava.py

* add example of PandaGPT

* refactor the interface and fixed the styles

* add cmake build for embd-input

* add cmake build for embd-input

* Add MiniGPT-4 example

* change the order of the args of llama_eval_internal

* fix ci error
2023-06-28 18:53:37 +03:00
Erik Scholz
9d23589d63 fix pthreads setaffinity usage on android (#2020) 2023-06-27 19:06:33 +02:00
Howard Su
0be54f75a6 baby-llama : fix build after ggml_rope change (#2016) 2023-06-27 08:07:13 +03:00
Georgi Gerganov
181e8d9755 llama : fix rope usage after ChatGLM change 2023-06-27 00:37:33 +03:00