Commit graph

793 commits

Author SHA1 Message Date
0cc4m
869ae76764 Disable glslc optimization 2023-07-05 22:23:07 +02:00
0cc4m
244939029d Add WIP warp tile mat mul shaders 2023-07-05 22:18:12 +02:00
0cc4m
80b17e2f66 Fix trailing whitespace in vk_mem_alloc.h 2023-07-04 23:01:32 +02:00
0cc4m
e35d28fec3 Fix queue selection for AMD RADV 2023-07-04 22:57:08 +02:00
0cc4m
ae7325fdff Fix 2d write 2023-07-04 22:42:07 +02:00
0cc4m
ade9555c48 Add 2d write operation, profiling code 2023-07-04 22:31:47 +02:00
0cc4m
24eeb97d13 Add bounds checking to matmul kernels, improve implementation, fix command buffers not freed properly 2023-07-02 22:11:58 +02:00
0cc4m
36cd5d85e9 Avoid requesting dedicated memory, VMA can decide that by itself 2023-06-30 21:20:19 +02:00
0cc4m
4ea9b2fd4b Add VMA library 2023-06-30 21:15:06 +02:00
0cc4m
c8ff09bdc7 dequant_q4_0 kernel 2023-06-30 20:48:42 +02:00
0cc4m
cb5cb4d6e2 Fix f16_to_f32 kernel 2023-06-30 20:48:03 +02:00
0cc4m
df3cdbdac7 Output FP32 in fp16 matmul shader 2023-06-30 18:37:10 +02:00
0cc4m
40c8f843f2 Fix mulmat_f16 2023-06-30 18:37:10 +02:00
0cc4m
c31e14b2fd Enable device extensions properly, restore fp16 matmul op 2023-06-30 18:37:10 +02:00
0cc4m
fc5bb53b32 Code abstraction, FP16 implementation, fix kernel, add FP16 to FP32 kernel 2023-06-30 18:37:10 +02:00
0cc4m
3adc7b1d60 First FP16 attempt, disabled for now 2023-06-30 18:37:10 +02:00
0cc4m
2c70df985a Continue vulkan implementation and optimization 2023-06-30 18:36:42 +02:00
0cc4m
0c9cca00bd Write coalescing 2023-06-30 18:36:42 +02:00
0cc4m
7c6860b483 2D Blocktiling 2023-06-30 18:36:42 +02:00
0cc4m
1b4863c2b9 1D Blocktiling 2023-06-30 18:36:42 +02:00
0cc4m
baf9ff536b GEMM Kernel optimization 2023-06-30 18:36:42 +02:00
0cc4m
a42376e7ec First matmul success 2023-06-30 18:36:42 +02:00
0cc4m
8ce84c2747 Continue implementation 2023-06-30 18:36:42 +02:00
0cc4m
2471728a9d Add aligned malloc and free for VMA 2023-06-30 18:36:42 +02:00
0cc4m
fc4f207cfb Matmul call 2023-06-30 18:36:41 +02:00
0cc4m
b0e65855d1 Vulkan development 2023-06-30 18:36:41 +02:00
0cc4m
a4004d4fa8 Vulkan memory management 2023-06-30 18:36:41 +02:00
0cc4m
88d4ec05a8 Continue implementation 2023-06-30 18:36:41 +02:00
0cc4m
4a96d0eb7f Fix matmul kernel, continue implementation 2023-06-30 18:36:41 +02:00
0cc4m
061246fb07 Vulkan loader code 2023-06-30 18:36:41 +02:00
Howard Su
b8c8dda75f
Use unsigned for random seed (#2006)
* Use unsigned for random seed. Keep -1 as the value to use a time-based seed.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-29 06:15:15 -07:00
LostRuins
96a712ca1b
Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
* Added broken new q4k quant

* xx + ib0

* Fix q2_k fast kernel

* Use preprocessor for QK_K

* Add q6_k fast matmul kernel

* ported q3k speedup successfully

* ported q2k and q5k speedups

* remove old dot kernels and template

* fixed global const struct types

* fixing address spaces

* fixed string too long CI issue

---------

Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
m3ndax
d3494bb86b
llama : replacing auto &kv with const auto &kv (#2041)
* Replacing auto &kv with const auto &kv

* Create codacy.yml

* Delete codacy.yml
2023-06-28 21:39:08 +03:00
Salvador E. Tropea
5b351e94d0
cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028)
- Not used
2023-06-28 20:27:31 +03:00
Salvador E. Tropea
6432aabb6d
cuda : fix missing const qualifier in casts (#2027) 2023-06-28 20:26:26 +03:00
Howard Su
b922bc351b
llama : remove shards weight file support (#2000)
* Remove multiple shards

* Remove multiple file loaders

* Remove llama_load_tensor_shard class

* Simplify load logic

* Remove dead code guess_n_parts function

* Remove vocab_only from constructor of llama_model_loader

* Remove alignment_prevents_mmap, which is no longer needed.

* Remove useless check
2023-06-28 20:13:02 +03:00
Johannes Gäßler
7f9753fa12
CUDA GPU acceleration for LoRAs + f16 models (#1970) 2023-06-28 18:35:54 +02:00
ningshanwutuobang
cfa0750bc9
llama : support input embeddings directly (#1910)
* add interface for float input

* fixed inpL shape and type

* add examples of input floats

* add test example for embd input

* fixed sampling

* add free for context

* fixed: add end condition for generating

* add examples for llava.py

* add README for llava.py

* add README for llava.py

* add example of PandaGPT

* refactor the interface and fixed the styles

* add cmake build for embd-input

* add cmake build for embd-input

* Add MiniGPT-4 example

* change the order of the args of llama_eval_internal

* fix ci error
2023-06-28 18:53:37 +03:00
Erik Scholz
9d23589d63
fix pthreads setaffinity usage on android (#2020) 2023-06-27 19:06:33 +02:00
Howard Su
0be54f75a6
baby-llama : fix build after ggml_rope change (#2016) 2023-06-27 08:07:13 +03:00
Georgi Gerganov
181e8d9755
llama : fix rope usage after ChatGLM change 2023-06-27 00:37:33 +03:00
Georgi Gerganov
d9779021bd
ggml : add support for ChatGLM RoPE 2023-06-27 00:06:51 +03:00
Roman Parykin
d38e451578
readme : add Scala 3 bindings repo (#2010) 2023-06-26 22:47:59 +03:00
David Yang
eaa6ca5a61
ggml : increase max tensor name + clean up compiler warnings in train-text (#1988)
* Clean up compiler warnings in train-text

Some brackets to disambiguate order of operations

* Increase GGML_MAX_NAME

Avoiding strncpy danger in train-text-from-scratch and reducing potential future name length issues
2023-06-26 22:45:32 +03:00
Gustavo Rocha Dias
aa777abbb7
readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007)
* docs - Alternative way to build on Android, with CLBlast.

* doc - LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux.

* doc - fix typo
2023-06-26 22:34:45 +03:00
Georgi Gerganov
c824d2e368
ggml : avoid conv 2d kernel round up 2023-06-26 21:03:59 +03:00
zrm
b853d45601
ggml : add NUMA support (#1556)
* detect NUMA systems and pin work threads to nodes (linux)

* disable mmap prefetch/readahead for NUMA systems

* avoid sending finalize op to thread pool if it does nothing

* silence robot

* fix args

* make --numa a param

* recommendation that n_nodes evenly divide n_threads did not warrant such aggressive enforcement

* lower synchronization overhead

* statically allocate

* move numa state to g_state

* add description for --numa

* ggml : minor style changes

* ggml : minor style + try fix sanitizer build

* llama : allow to initialize backend with NUMA support

* llama : avoid ggml include in llama-util.h

* ggml : style / formatting

* ggml : fix handling of ops with n_threads > n_tasks > 1

* server : utilize numa parameter

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-26 20:57:59 +03:00
Georgi Gerganov
9225baef71
k-quants : fix indentation 2023-06-26 20:10:52 +03:00
katsu560
a84ab1da8d
tests : fix quantize perf (#1990)
* fix test quantize perf

* avoid the global state
2023-06-26 19:47:02 +03:00
katsu560
5743ca8092
k-quants : add AVX support to dot functions (#1916)
* k_quants : add AVX support

* k_quants : apply review comments
2023-06-26 19:46:07 +03:00