Commit graph

898 commits

Author SHA1 Message Date
0cc4m
24eeb97d13 Add bounds checking to matmul kernels, improve implementation, fix command buffers not freed properly 2023-07-02 22:11:58 +02:00
Georgi Gerganov
46088f7231 ggml : fix build with OpenBLAS (close #2066) 2023-07-02 09:46:46 +03:00
Johannes Gäßler
0bc2cdfc87
Better CUDA synchronization logic (#2057) 2023-07-01 21:49:44 +02:00
Johannes Gäßler
befb3a3562
Test-based VRAM scratch size + context adjustment (#2056) 2023-07-01 21:47:26 +02:00
Daniel Drake
b213227067
cmake : don't force -mcpu=native on aarch64 (#2063)
It's currently not possible to cross-compile llama.cpp for aarch64
because CMakeLists.txt forces -mcpu=native for that target.

-mcpu=native doesn't make sense if your build host is not the
target architecture, and clang rejects it for that reason, aborting the
build. This can be easily reproduced using the current Android NDK to build
for aarch64 on an x86_64 host.

If there is no specific CPU-tuning target for aarch64, then -mcpu should
be omitted completely. I think that makes sense: there is not enough
variance in the aarch64 instruction set to warrant a fixed -mcpu
optimization at this point. And if someone is building natively and wishes
to enable any possible optimizations for the host device, the LLAMA_NATIVE
option is already available.

Fixes #495.
2023-07-01 21:31:44 +03:00
Aaron Miller
2f8cd979ec
metal : release buffers when freeing metal context (#2062) 2023-07-01 21:14:59 +03:00
Judd
471aab6e4c
convert : add support of baichuan-7b (#2055)
Co-authored-by: Judd <foldl@boxvest.com>
2023-07-01 20:00:25 +03:00
Georgi Gerganov
463f2f4c4f
llama : fix return value of llama_load_session_file_internal (#2022) 2023-07-01 19:05:09 +03:00
Rand Xie
cb44dbc7de
llama : catch llama_load_session_file_internal exceptions (#2022)
* convert checks in llama_load_session_file to throw and handle them

* make llama_load_session_file_internal static

* address feedback to avoid using exceptions
2023-07-01 19:02:58 +03:00
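Taken together with the follow-up fix above (463f2f4c4f), the pattern these commits describe is an internal loader that reports failures by throwing, wrapped by the public entry point so the C API keeps returning a plain bool. A simplified sketch of that shape (the real functions take the context, a token buffer, and related arguments omitted here):

    #include <cstdio>
    #include <stdexcept>

    // Internal loader (sketch): validation failures are reported by throwing.
    static bool llama_load_session_file_internal(const char * path_session) {
        if (path_session == nullptr) {
            throw std::runtime_error("null session path");
        }
        // ... read the session file, validate magic/version, restore state ...
        return true;
    }

    // Public wrapper: convert any exception into a false return for the C API.
    bool llama_load_session_file(const char * path_session) {
        try {
            return llama_load_session_file_internal(path_session);
        } catch (const std::exception & err) {
            std::fprintf(stderr, "error loading session file: %s\n", err.what());
            return false;
        }
    }
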
Georgi Gerganov
79f634a19d
embd-input : fix returning ptr to temporary 2023-07-01 18:46:00 +03:00
Georgi Gerganov
04606a1599
train : fix compile warning 2023-07-01 18:45:44 +03:00
Qingyou Meng
b1ca8f36a9
ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995)
Will not be scheduled unless explicitly enabled.
2023-07-01 18:42:43 +03:00
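For context, ggml runs each op in up to three phases (GGML_TASK_INIT, GGML_TASK_COMPUTE, GGML_TASK_FINALIZE); the change above makes the INIT and FINALIZE phases opt-in per op. A minimal sketch of that gating idea, with made-up field names rather than ggml's actual internals:

    // Hypothetical sketch of opt-in task phases; names are illustrative only.
    struct op_phases {
        bool use_init;      // schedule an INIT pass for this op?
        bool use_finalize;  // schedule a FINALIZE pass for this op?
    };

    static void run_op(const op_phases & p) {
        if (p.use_init)     { /* e.g. zero per-thread accumulators */ }
        /* COMPUTE phase always runs */
        if (p.use_finalize) { /* e.g. reduce partial results across threads */ }
    }
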
0cc4m
36cd5d85e9 Avoid requesting dedicated memory; VMA can decide that by itself 2023-06-30 21:20:19 +02:00
0cc4m
4ea9b2fd4b Add VMA library 2023-06-30 21:15:06 +02:00
0cc4m
c8ff09bdc7 dequant_q4_0 kernel 2023-06-30 20:48:42 +02:00
0cc4m
cb5cb4d6e2 Fix f16_to_f32 kernel 2023-06-30 20:48:03 +02:00
0cc4m
df3cdbdac7 Output FP32 in fp16 matmul shader 2023-06-30 18:37:10 +02:00
0cc4m
40c8f843f2 Fix mulmat_f16 2023-06-30 18:37:10 +02:00
0cc4m
c31e14b2fd Enable device extensions properly, restore fp16 matmul op 2023-06-30 18:37:10 +02:00
0cc4m
fc5bb53b32 Code abstraction, FP16 implementation, fix kernel, add FP16 to FP32 kernel 2023-06-30 18:37:10 +02:00
0cc4m
3adc7b1d60 First FP16 attempt, disabled for now 2023-06-30 18:37:10 +02:00
0cc4m
2c70df985a Continue vulkan implementation and optimization 2023-06-30 18:36:42 +02:00
0cc4m
0c9cca00bd Write coalescing 2023-06-30 18:36:42 +02:00
0cc4m
7c6860b483 2D Blocktiling 2023-06-30 18:36:42 +02:00
0cc4m
1b4863c2b9 1D Blocktiling 2023-06-30 18:36:42 +02:00
0cc4m
baf9ff536b GEMM Kernel optimization 2023-06-30 18:36:42 +02:00
0cc4m
a42376e7ec First matmul success 2023-06-30 18:36:42 +02:00
0cc4m
8ce84c2747 Continue implementation 2023-06-30 18:36:42 +02:00
0cc4m
2471728a9d Add aligned malloc and free for VMA 2023-06-30 18:36:42 +02:00
0cc4m
fc4f207cfb Matmul call 2023-06-30 18:36:41 +02:00
0cc4m
b0e65855d1 Vulkan development 2023-06-30 18:36:41 +02:00
0cc4m
a4004d4fa8 Vulkan memory management 2023-06-30 18:36:41 +02:00
0cc4m
88d4ec05a8 Continue implementation 2023-06-30 18:36:41 +02:00
0cc4m
4a96d0eb7f Fix matmul kernel, continue implementation 2023-06-30 18:36:41 +02:00
0cc4m
061246fb07 Vulkan loader code 2023-06-30 18:36:41 +02:00
Howard Su
b8c8dda75f
Use unsigned for random seed (#2006)
* Use unsigned for random seed. Keep -1 as the value to use a time-based seed.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-29 06:15:15 -07:00
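The convention the commit describes — the seed parameter is unsigned, and passing -1 (which converts to the all-ones value) means "derive the seed from the current time" — can be sketched like this (a minimal illustration, not the exact llama.cpp code):

    #include <cstdint>
    #include <ctime>

    // -1 converted to uint32_t becomes 0xFFFFFFFF; treat that as "time-based seed".
    static uint32_t resolve_seed(uint32_t requested) {
        if (requested == (uint32_t) -1) {
            return (uint32_t) std::time(nullptr);
        }
        return requested;
    }
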
LostRuins
96a712ca1b
Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
* Added broken new q4k quant

* xx + ib0

* Fix q2_k fast kernel

* Use preprocessor for QK_K

* Add q6_k fast matmul kernel

* ported q3k speedup successfully

* ported q2k and q5k speedups

* remove old dot kernels and template

* fixed global const struct types

* fixing address spaces

* fixed string too long CI issue

---------

Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
m3ndax
d3494bb86b
llama : replacing auto &kv with const auto &kv (#2041)
* Replacing auto &kv with const auto &kv

* Create codacy.yml

* Delete codacy.yml
2023-06-28 21:39:08 +03:00
Salvador E. Tropea
5b351e94d0
cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028)
- Not used
2023-06-28 20:27:31 +03:00
Salvador E. Tropea
6432aabb6d
cuda : fix missing const qualifier in casts (#2027) 2023-06-28 20:26:26 +03:00
Howard Su
b922bc351b
llama : remove shards weight file support (#2000)
* Remove multiple shards

* Remove multiple file loaders

* Remove llama_load_tensor_shard class

* Simplify load logic

* Remove dead code guess_n_parts function

* Remove vocab_only from constructor of llama_model_loader

* Remove alignment_prevents_mmap, which is no longer needed.

* Remove useless check
2023-06-28 20:13:02 +03:00
Johannes Gäßler
7f9753fa12
CUDA GPU acceleration for LoRAs + f16 models (#1970) 2023-06-28 18:35:54 +02:00
ningshanwutuobang
cfa0750bc9
llama : support input embeddings directly (#1910)
* add interface for float input

* fixed inpL shape and type

* add examples of input floats

* add test example for embd input

* fixed sampling

* add free for context

* added end condition for generating

* add examples for llava.py

* add README for llava.py

* add README for llava.py

* add example of PandaGPT

* refactor the interface and fixed the styles

* add cmake build for embd-input

* add cmake build for embd-input

* Add MiniGPT-4 example

* change the order of the args of llama_eval_internal

* fix ci error
2023-06-28 18:53:37 +03:00
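The interface added here feeds pre-computed float embeddings into evaluation instead of token ids, which is what the llava.py, MiniGPT-4 and PandaGPT examples build on. Below is a sketch of how such a call could be used; the declaration mirrors the token-based eval of that era and is written from the commit description, so treat the exact signature as an assumption:

    #include <vector>

    struct llama_context;

    // Assumed shape of the embedding-input interface: n_tokens rows of n_embd
    // floats are evaluated after n_past positions of existing context.
    int llama_eval_embd(struct llama_context * ctx, const float * embd,
                        int n_tokens, int n_past, int n_threads);

    // Illustrative caller: pass externally produced embeddings (e.g. projected
    // image features) into the model in place of token embeddings.
    static int eval_features(struct llama_context * ctx,
                             const std::vector<float> & feats,
                             int n_embd, int n_past, int n_threads) {
        const int n_tokens = (int) feats.size() / n_embd;
        return llama_eval_embd(ctx, feats.data(), n_tokens, n_past, n_threads);
    }
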
Erik Scholz
9d23589d63
fix pthreads setaffinity usage on android (#2020) 2023-06-27 19:06:33 +02:00
Howard Su
0be54f75a6
baby-llama : fix build after ggml_rope change (#2016) 2023-06-27 08:07:13 +03:00
Georgi Gerganov
181e8d9755
llama : fix rope usage after ChatGLM change 2023-06-27 00:37:33 +03:00
Georgi Gerganov
d9779021bd
ggml : add support for ChatGLM RoPE 2023-06-27 00:06:51 +03:00
Roman Parykin
d38e451578
readme : add Scala 3 bindings repo (#2010) 2023-06-26 22:47:59 +03:00
David Yang
eaa6ca5a61
ggml : increase max tensor name + clean up compiler warnings in train-text (#1988)
* Clean up compiler warnings in train-text

Some brackets to disambiguate order of operations

* Increase GGML_MAX_NAME

Avoiding strncpy danger in train-text-from-scratch and reducing potential future name length issues
2023-06-26 22:45:32 +03:00
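The "strncpy danger" mentioned in the commit is the usual one: when the source string is longer than the destination, strncpy fills the buffer without writing a terminating NUL. A generic defensive sketch (not the code from the commit; the size constant is illustrative):

    #include <cstring>

    constexpr size_t MAX_NAME = 48;  // illustrative limit; the commit raises the real GGML_MAX_NAME

    static void set_name(char * dst, const char * src) {
        std::strncpy(dst, src, MAX_NAME - 1);
        dst[MAX_NAME - 1] = '\0';  // strncpy does not NUL-terminate on truncation
    }
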
Gustavo Rocha Dias
aa777abbb7
readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007)
* docs - Alternative way to build on Android, with CLBlast.

* doc - LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux.

* doc - fix typo
2023-06-26 22:34:45 +03:00