Concedo
784628a2be
Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental
2023-07-04 16:38:32 +08:00
Concedo
ca9a11697c
possibly slower, but cannot use larger batches without modifying ggml library.
2023-07-04 00:35:02 +08:00
Concedo
bfeb3471d7
fix typos
2023-07-03 21:36:42 +08:00
Ycros
309534dcd0
implement sampler order, expose sampler order and mirostat in api
2023-07-02 18:15:34 +00:00
Concedo
3d2907d208
make gptneox and gptj work with extended context too
2023-07-02 18:28:09 +08:00
Concedo
d6b47e6a5b
Merge branch 'master' into concedo_experimental
2023-07-02 17:26:39 +08:00
Concedo
e17c8497cf
switched to NTK aware scaling
2023-07-02 17:25:08 +08:00
Concedo
e19483ca0f
increase scratch for above 4096
2023-07-02 14:55:08 +08:00
Georgi Gerganov
46088f7231
ggml : fix build with OpenBLAS ( close #2066 )
2023-07-02 09:46:46 +03:00
Concedo
b85ea580d3
Merge branch 'master' into concedo_experimental
# Conflicts:
# README.md
2023-07-02 14:45:25 +08:00
Johannes Gäßler
0bc2cdfc87
Better CUDA synchronization logic ( #2057 )
2023-07-01 21:49:44 +02:00
Johannes Gäßler
befb3a3562
Test-based VRAM scratch size + context adjustment ( #2056 )
2023-07-01 21:47:26 +02:00
Daniel Drake
b213227067
cmake : don't force -mcpu=native on aarch64 ( #2063 )
It's currently not possible to cross-compile llama.cpp for aarch64
because CMakeLists.txt forces -mcpu=native for that target.
-mcpu=native doesn't make sense when the build host is not the
target architecture, and clang rejects it for that reason, aborting the
build. This can easily be reproduced by using the current Android NDK to
build for aarch64 on an x86_64 host.
If there is no specific CPU-tuning target for aarch64, -mcpu should
be omitted entirely. That makes sense: there is not enough variance
in the aarch64 instruction set to warrant a fixed -mcpu optimization
at this point. And if someone is building natively and wishes to enable
every possible optimization for the host device, the LLAMA_NATIVE
option is already available.
Fixes #495.
2023-07-01 21:31:44 +03:00
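The fix described in the commit body above amounts to guarding the flag in CMake. A minimal sketch of such a guard follows; LLAMA_NATIVE is the option the commit mentions, while the exact shape of the condition is an assumption, not the actual CMakeLists.txt change:

```cmake
# Only force -mcpu=native when building on the target itself;
# cross-compiles (e.g. Android NDK targeting aarch64 from x86_64)
# leave -mcpu unset so clang does not reject the flag.
if (CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64" AND
    NOT CMAKE_CROSSCOMPILING AND LLAMA_NATIVE)
    add_compile_options(-mcpu=native)
endif()
```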
Aaron Miller
2f8cd979ec
metal : release buffers when freeing metal context ( #2062 )
2023-07-01 21:14:59 +03:00
Judd
471aab6e4c
convert : add support of baichuan-7b ( #2055 )
Co-authored-by: Judd <foldl@boxvest.com>
2023-07-01 20:00:25 +03:00
Concedo
ef3b8dc0d9
GPU accel for rwkv is slow, disable it
2023-07-02 00:41:46 +08:00
Concedo
e1a7042943
try out the new rwkv but it seems worse, may revert
2023-07-02 00:10:56 +08:00
Georgi Gerganov
463f2f4c4f
llama : fix return value of llama_load_session_file_internal ( #2022 )
2023-07-01 19:05:09 +03:00
Rand Xie
cb44dbc7de
llama : catch llama_load_session_file_internal exceptions ( #2022 )
* convert checks in llama_load_session_file to throw and handle them
* make llama_load_session_file_internal static
* address feedbacks to avoid using exceptions
2023-07-01 19:02:58 +03:00
Georgi Gerganov
79f634a19d
embd-input : fix returning ptr to temporary
2023-07-01 18:46:00 +03:00
Georgi Gerganov
04606a1599
train : fix compile warning
2023-07-01 18:45:44 +03:00
Qingyou Meng
b1ca8f36a9
ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default ( #1995 )
Will not be scheduled unless explicitly enabled.
2023-07-01 18:42:43 +03:00
Concedo
632bf27b65
more granular context size selections
2023-07-01 11:02:44 +08:00
Concedo
eda663f15f
update lite and up ver
2023-07-01 00:15:26 +08:00
Concedo
0cb8a9eab3
Merge remote-tracking branch 'Johannes/cuda-scratch-size-adjust' into concedo_experimental
# Conflicts:
# llama.cpp
2023-06-30 23:29:38 +08:00
Concedo
67cb0b2760
Merge branch 'master' into concedo_experimental
2023-06-30 23:25:40 +08:00
Concedo
d16926dff4
Merge branch 'concedo' into concedo_experimental
2023-06-30 23:06:21 +08:00
Concedo
baf6325907
added flag for building kquants in tools
2023-06-30 23:06:11 +08:00
YellowRoseCx
30ea774e2c
Update CMakeLists.txt with dmmv_x/y/f16 ( #277 )
2023-06-30 22:52:32 +08:00
bebopkim
1129d66ca9
Fix build problem on Apple Metal (LLAMA_METAL=1) ( #282 )
2023-06-30 22:50:38 +08:00
JohannesGaessler
600bf6d929
Test-based VRAM scratch size + context adjustment
2023-06-30 11:35:30 +02:00
Concedo
86469d15c4
fix for yr-rocm, large gpu scratch
2023-06-30 12:40:08 +08:00
Concedo
1347d3acc0
another missing flag?
2023-06-30 00:02:18 +08:00
Concedo
396f857021
make platform appropriate library
2023-06-29 23:50:48 +08:00
Concedo
f50c73a0b2
readme
2023-06-29 23:45:57 +08:00
Concedo
ad945e2c41
make instructions clearer
2023-06-29 22:13:39 +08:00
Concedo
64aba0a151
update readme
2023-06-29 21:42:04 +08:00
Howard Su
b8c8dda75f
Use unsigned for random seed ( #2006 )
* Use unsigned for random seed. Keep -1 as the value to use a time-based seed.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-29 06:15:15 -07:00
Concedo
f09debb1ec
remove debug
2023-06-29 20:54:56 +08:00
Concedo
966d736582
revert cublasLt removal
2023-06-29 20:51:02 +08:00
Concedo
10a2bdfaf1
Merge remote-tracking branch 'upstream/ik/context_extend' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# Makefile
2023-06-29 20:35:17 +08:00
Concedo
c7c6e522e7
bigger scratch buffers for bigger context
2023-06-29 19:43:23 +08:00
Concedo
86b061b98c
wip on unified cublas integration, add all the small libraries but exclude the large ones
2023-06-29 18:35:31 +08:00
Concedo
c2f1ed6556
fix compile errors
2023-06-29 17:54:12 +08:00
Concedo
dff5575647
Merge branch 'master' into concedo_experimental
# Conflicts:
# .gitignore
# Makefile
# ggml-opencl.cpp
# llama.cpp
2023-06-29 17:35:28 +08:00
Concedo
4b3a1282f0
Add flag for lowvram directly into cublas launch param
Merge remote-tracking branch 'yellowrose/pr/open/LostRuins/koboldcpp/lowvram' into concedo_experimental
# Conflicts:
# koboldcpp.py
2023-06-29 17:07:31 +08:00
Concedo
746f5fa9e9
update lite
2023-06-29 16:44:39 +08:00
LostRuins
96a712ca1b
Porting the improved K-Quant CUDA kernels to OpenCL ( #1966 )
* Added broken new q4k quant
* xx + ib0
* Fix q2_k fast kernel
* Use preprocessor for QK_K
* Add q6_k fast matmul kernel
* ported q3k speedup successfully
* ported q2k and q5k speedups
* remove old dot kernels and template
* fixed global const struct types
* fixing address spaces
* fixed string too long CI issue
---------
Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
m3ndax
d3494bb86b
llama : replacing auto &kv with const auto &kv ( #2041 )
* Replacing auto &kv with const auto &kv
* Create codacy.yml
* Delete codacy.yml
2023-06-28 21:39:08 +03:00
Salvador E. Tropea
5b351e94d0
cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 ( #2028 )
- Not used
2023-06-28 20:27:31 +03:00