0cc4m
8dd585e8cb
Variable matmul kernel using specialization constants
2023-07-09 15:50:28 +02:00
0cc4m
3bc7a80ca6
Rework command buffer handling
2023-07-09 11:37:32 +02:00
0cc4m
0ef62f511a
Fix validation errors, improve compatibility with AMD GPUs
2023-07-08 20:40:19 +02:00
0cc4m
c7c761a2b7
Add split-k optimization for small matrix multiplication
Use semaphores for synchronization instead of fences or waitidle
Rework async write/read for synchronization
2023-07-08 17:27:05 +02:00
0cc4m
c3d947510b
Optimize warptile matmul shader, replace blocktile with it
2023-07-07 07:13:47 +02:00
0cc4m
6d5a0ada8c
Merge pull request #2 from SlyEcho/vulkan
add cmake commands
2023-07-07 05:53:11 +02:00
0cc4m
ea06a2c321
Disable glslc optimization for CMake
2023-07-07 05:52:33 +02:00
0cc4m
869ae76764
Disable glslc optimization
2023-07-05 22:23:07 +02:00
0cc4m
244939029d
Add WIP warp tile mat mul shaders
2023-07-05 22:18:12 +02:00
0cc4m
80b17e2f66
Fix trailing whitespace in vk_mem_alloc.h
2023-07-04 23:01:32 +02:00
0cc4m
e35d28fec3
Fix queue selection for AMD RADV
2023-07-04 22:57:08 +02:00
0cc4m
ae7325fdff
Fix 2d write
2023-07-04 22:42:07 +02:00
0cc4m
ade9555c48
Add 2d write operation, profiling code
2023-07-04 22:31:47 +02:00
Henri Vasserman
3d7d8d00a4
add cmake commands
2023-07-04 17:02:22 +03:00
0cc4m
24eeb97d13
Add bounds checking to matmul kernels, improve implementation, fix command buffers not freed properly
2023-07-02 22:11:58 +02:00
0cc4m
36cd5d85e9
Avoid requesting dedicated memory; VMA can decide that by itself
2023-06-30 21:20:19 +02:00
0cc4m
4ea9b2fd4b
Add VMA library
2023-06-30 21:15:06 +02:00
0cc4m
c8ff09bdc7
dequant_q4_0 kernel
2023-06-30 20:48:42 +02:00
0cc4m
cb5cb4d6e2
Fix f16_to_f32 kernel
2023-06-30 20:48:03 +02:00
0cc4m
df3cdbdac7
Output FP32 in fp16 matmul shader
2023-06-30 18:37:10 +02:00
0cc4m
40c8f843f2
Fix mulmat_f16
2023-06-30 18:37:10 +02:00
0cc4m
c31e14b2fd
Enable device extensions properly, restore fp16 matmul op
2023-06-30 18:37:10 +02:00
0cc4m
fc5bb53b32
Code abstraction, FP16 implementation, fix kernel, add FP16 to FP32 kernel
2023-06-30 18:37:10 +02:00
0cc4m
3adc7b1d60
First FP16 attempt, disabled for now
2023-06-30 18:37:10 +02:00
0cc4m
2c70df985a
Continue vulkan implementation and optimization
2023-06-30 18:36:42 +02:00
0cc4m
0c9cca00bd
Write coalescing
2023-06-30 18:36:42 +02:00
0cc4m
7c6860b483
2D Blocktiling
2023-06-30 18:36:42 +02:00
0cc4m
1b4863c2b9
1D Blocktiling
2023-06-30 18:36:42 +02:00
0cc4m
baf9ff536b
GEMM Kernel optimization
2023-06-30 18:36:42 +02:00
0cc4m
a42376e7ec
First matmul success
2023-06-30 18:36:42 +02:00
0cc4m
8ce84c2747
Continue implementation
2023-06-30 18:36:42 +02:00
0cc4m
2471728a9d
Add aligned malloc and free for VMA
2023-06-30 18:36:42 +02:00
0cc4m
fc4f207cfb
Matmul call
2023-06-30 18:36:41 +02:00
0cc4m
b0e65855d1
Vulkan development
2023-06-30 18:36:41 +02:00
0cc4m
a4004d4fa8
Vulkan memory management
2023-06-30 18:36:41 +02:00
0cc4m
88d4ec05a8
Continue implementation
2023-06-30 18:36:41 +02:00
0cc4m
4a96d0eb7f
Fix matmul kernel, continue implementation
2023-06-30 18:36:41 +02:00
0cc4m
061246fb07
Vulkan loader code
2023-06-30 18:36:41 +02:00
Howard Su
b8c8dda75f
Use unsigned for random seed (#2006)
* Use unsigned for random seed. Keep -1 as the value to use a time-based seed.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-29 06:15:15 -07:00
LostRuins
96a712ca1b
Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
* Added broken new q4k quant
* xx + ib0
* Fix q2_k fast kernel
* Use preprocessor for QK_K
* Add q6_k fast matmul kernel
* ported q3k speedup successfully
* ported q2k and q5k speedups
* remove old dot kernels and template
* fixed global const struct types
* fixing address spaces
* fixed string too long CI issue
Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
m3ndax
d3494bb86b
llama : replacing auto &kv with const auto &kv (#2041)
* Replacing auto &kv with const auto &kv
* Create codacy.yml
* Delete codacy.yml
2023-06-28 21:39:08 +03:00
Salvador E. Tropea
5b351e94d0
cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028)
- Not used
2023-06-28 20:27:31 +03:00
Salvador E. Tropea
6432aabb6d
cuda : fix missing const qualifier in casts (#2027)
2023-06-28 20:26:26 +03:00
Howard Su
b922bc351b
llama : remove shards weight file support (#2000)
* Remove multiple shards
* Remove multiple file loaders
* Remove llama_load_tensor_shard class
* Simplify load logic
* Remove dead code guess_n_parts function
* Remove vocab_only from constructor of llama_model_loader
* Remove alignment_prevents_mmap, which is no longer needed.
* Remove useless check
2023-06-28 20:13:02 +03:00
Johannes Gäßler
7f9753fa12
CUDA GPU acceleration for LoRAs + f16 models (#1970)
2023-06-28 18:35:54 +02:00
ningshanwutuobang
cfa0750bc9
llama : support input embeddings directly (#1910)
* add interface for float input
* fixed inpL shape and type
* add examples of input floats
* add test example for embd input
* fixed sampling
* add free for context
* fixed add end condition for generating
* add examples for llava.py
* add README for llava.py
* add README for llava.py
* add example of PandaGPT
* refactor the interface and fixed the styles
* add cmake build for embd-input
* add cmake build for embd-input
* Add MiniGPT-4 example
* change the order of the args of llama_eval_internal
* fix ci error
2023-06-28 18:53:37 +03:00
Erik Scholz
9d23589d63
fix pthreads setaffinity usage on android (#2020)
2023-06-27 19:06:33 +02:00
Howard Su
0be54f75a6
baby-llama : fix build after ggml_rope change (#2016)
2023-06-27 08:07:13 +03:00
Georgi Gerganov
181e8d9755
llama : fix rope usage after ChatGLM change
2023-06-27 00:37:33 +03:00
Georgi Gerganov
d9779021bd
ggml : add support for ChatGLM RoPE
2023-06-27 00:06:51 +03:00