Commit graph

1409 commits

Author SHA1 Message Date
Concedo
3d2907d208 make gptneox and gptj work with extended context too 2023-07-02 18:28:09 +08:00
Concedo
d6b47e6a5b Merge branch 'master' into concedo_experimental 2023-07-02 17:26:39 +08:00
Concedo
e17c8497cf switched to NTK aware scaling 2023-07-02 17:25:08 +08:00
Concedo
e19483ca0f increase scratch for above 4096 2023-07-02 14:55:08 +08:00
Georgi Gerganov
46088f7231 ggml : fix build with OpenBLAS (close #2066) 2023-07-02 09:46:46 +03:00
Concedo
b85ea580d3 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-07-02 14:45:25 +08:00
Johannes Gäßler
0bc2cdfc87 Better CUDA synchronization logic (#2057) 2023-07-01 21:49:44 +02:00
Johannes Gäßler
befb3a3562 Test-based VRAM scratch size + context adjustment (#2056) 2023-07-01 21:47:26 +02:00
Daniel Drake
b213227067 cmake : don't force -mcpu=native on aarch64 (#2063)
It's currently not possible to cross-compile llama.cpp for aarch64
because CMakeLists.txt forces -mcpu=native for that target.

-mcpu=native doesn't make sense if your build host is not the
target architecture, and clang rejects it for that reason, aborting the
build. This can be easily reproduced using the current Android NDK to build
for aarch64 on an x86_64 host.

If there is not a specific CPU-tuning target for aarch64 then -mcpu
should be omitted completely. I think that makes sense, there is not
enough variance in the aarch64 instruction set to warrant a fixed -mcpu
optimization at this point. And if someone is building natively and wishes
to enable any possible optimizations for the host device, then there is
already the LLAMA_NATIVE option available.

Fixes #495.
2023-07-01 21:31:44 +03:00
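The guard the commit above argues for can be sketched in CMake. This is only an illustrative sketch, not the actual CMakeLists.txt change (which simply drops the forced flag); the CMAKE_CROSSCOMPILING check is an assumption standing in for however a given build detects a non-native host.

```cmake
# Illustrative sketch only: tune for the host CPU only when the build is
# genuinely native and the user asked for it via LLAMA_NATIVE.
option(LLAMA_NATIVE "Enable host-specific CPU optimizations" OFF)

if (LLAMA_NATIVE AND NOT CMAKE_CROSSCOMPILING)
    # Build host and target match, so the compiler can probe this CPU.
    add_compile_options(-mcpu=native)
endif()
# When cross-compiling (e.g. Android NDK building aarch64 on x86_64),
# no -mcpu flag is emitted and the generic aarch64 baseline is used,
# which is the behaviour the commit describes.
```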
Aaron Miller
2f8cd979ec metal : release buffers when freeing metal context (#2062) 2023-07-01 21:14:59 +03:00
Judd
471aab6e4c convert : add support of baichuan-7b (#2055)
Co-authored-by: Judd <foldl@boxvest.com>
2023-07-01 20:00:25 +03:00
Concedo
ef3b8dc0d9 GPU accel for rwkv is slow, disable it 2023-07-02 00:41:46 +08:00
Concedo
e1a7042943 try out the new rwkv but it seems worse, may revert 2023-07-02 00:10:56 +08:00
Georgi Gerganov
463f2f4c4f llama : fix return value of llama_load_session_file_internal (#2022) 2023-07-01 19:05:09 +03:00
Rand Xie
cb44dbc7de llama : catch llama_load_session_file_internal exceptions (#2022)
* convert checks in llama_load_session_file to throw and handle them

* make llama_load_session_file_internal static

* address feedbacks to avoid using exceptions
2023-07-01 19:02:58 +03:00
Georgi Gerganov
79f634a19d embd-input : fix returning ptr to temporary 2023-07-01 18:46:00 +03:00
Georgi Gerganov
04606a1599 train : fix compile warning 2023-07-01 18:45:44 +03:00
Qingyou Meng
b1ca8f36a9 ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995)
Will not be scheduled unless explicitly enabled.
2023-07-01 18:42:43 +03:00
Concedo
632bf27b65 more granular context size selections 2023-07-01 11:02:44 +08:00
Concedo
eda663f15f update lite and up ver 2023-07-01 00:15:26 +08:00
Concedo
0cb8a9eab3 Merge remote-tracking branch 'Johannes/cuda-scratch-size-adjust' into concedo_experimental
# Conflicts:
#	llama.cpp
2023-06-30 23:29:38 +08:00
Concedo
67cb0b2760 Merge branch 'master' into concedo_experimental 2023-06-30 23:25:40 +08:00
Concedo
d16926dff4 Merge branch 'concedo' into concedo_experimental 2023-06-30 23:06:21 +08:00
Concedo
baf6325907 added flag for building kquants in tools 2023-06-30 23:06:11 +08:00
YellowRoseCx
30ea774e2c Update CMakeLists.txt with dmmv_x/y/f16 (#277) 2023-06-30 22:52:32 +08:00
bebopkim
1129d66ca9 To fix build problem on Apple Metal LLAMA_METAL=1 (#282) 2023-06-30 22:50:38 +08:00
JohannesGaessler
600bf6d929 Test-based VRAM scratch size + context adjustment 2023-06-30 11:35:30 +02:00
Concedo
86469d15c4 fix for yr-rocm, large gpu scratch 2023-06-30 12:40:08 +08:00
Concedo
1347d3acc0 another missing flag? 2023-06-30 00:02:18 +08:00
Concedo
396f857021 make platform appropriate library 2023-06-29 23:50:48 +08:00
Concedo
f50c73a0b2 readme 2023-06-29 23:45:57 +08:00
Concedo
ad945e2c41 make instructions clearer 2023-06-29 22:13:39 +08:00
Concedo
64aba0a151 update readme 2023-06-29 21:42:04 +08:00
Howard Su
b8c8dda75f Use unsigned for random seed (#2006)
* Use unsigned for random seed. Keep -1 as the value to use a time based seed.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-29 06:15:15 -07:00
Concedo
f09debb1ec remove debug 2023-06-29 20:54:56 +08:00
Concedo
966d736582 revert cublasLt removal 2023-06-29 20:51:02 +08:00
Concedo
10a2bdfaf1 Merge remote-tracking branch 'upstream/ik/context_extend' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
2023-06-29 20:35:17 +08:00
Concedo
c7c6e522e7 bigger scratch buffers for bigger context 2023-06-29 19:43:23 +08:00
Concedo
86b061b98c wip on unified cublas integration, add all the small libraries but exclude the large ones 2023-06-29 18:35:31 +08:00
Concedo
c2f1ed6556 fix compile errors 2023-06-29 17:54:12 +08:00
Concedo
dff5575647 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.gitignore
#	Makefile
#	ggml-opencl.cpp
#	llama.cpp
2023-06-29 17:35:28 +08:00
Concedo
4b3a1282f0 Add flag for lowvram directly into cublas launch param
Merge remote-tracking branch 'yellowrose/pr/open/LostRuins/koboldcpp/lowvram' into concedo_experimental

# Conflicts:
#	koboldcpp.py
2023-06-29 17:07:31 +08:00
Concedo
746f5fa9e9 update lite 2023-06-29 16:44:39 +08:00
LostRuins
96a712ca1b Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
* Added broken new q4k quant

* xx + ib0

* Fix q2_k fast kernel

* Use preprocessor for QK_K

* Add q6_k fast matmul kernel

* ported q3k speedup successfully

* ported q2k and q5k speedups

* remove old dot kernels and template

* fixed global const struct types

* fixing address spaces

* fixed string too long CI issue

---------

Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
m3ndax
d3494bb86b llama : replacing auto &kv with const auto &kv (#2041)
* Replacing auto &kv with const auto &kv

* Create codacy.yml

* Delete codacy.yml
2023-06-28 21:39:08 +03:00
Salvador E. Tropea
5b351e94d0 cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028)
- Not used
2023-06-28 20:27:31 +03:00
Salvador E. Tropea
6432aabb6d cuda : fix missing const qualifier in casts (#2027) 2023-06-28 20:26:26 +03:00
Howard Su
b922bc351b llama : remove shards weight file support (#2000)
* Remove multiple shards

* Remove multiple file loaders

* Remove llama_load_tensor_shard class

* Simplify load logic

* Remove dead code guess_n_parts function

* Remove vocab_only from constructor of llama_model_loader

* Remove alignment_prevents_mmap which is not more needed.

* Remove useless check
2023-06-28 20:13:02 +03:00
Johannes Gäßler
7f9753fa12 CUDA GPU acceleration for LoRAs + f16 models (#1970) 2023-06-28 18:35:54 +02:00
ningshanwutuobang
cfa0750bc9 llama : support input embeddings directly (#1910)
* add interface for float input

* fixed inpL shape and type

* add examples of input floats

* add test example for embd input

* fixed sampling

* add free for context

* fixed add end condition for generating

* add examples for llava.py

* add READMD for llava.py

* add example of PandaGPT

* refactor the interface and fixed the styles

* add cmake build for embd-input

* add cmake build for embd-input

* Add MiniGPT-4 example

* change the order of the args of llama_eval_internal

* fix ci error
2023-06-28 18:53:37 +03:00