Concedo
910744e2c0
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
# flake.nix
# llama.cpp
2023-07-23 22:37:38 +08:00
Concedo
c28ab4e1b7
update lite, try to support K80
2023-07-23 21:50:35 +08:00
wzy
57921ca6db
common : n_threads == -1 uses std::thread::hardware_concurrency() ( #2347 )
...
* Fix #2345 , fix incorrect n_threads
* Update examples/common.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-23 16:33:02 +03:00
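A minimal sketch of the n_threads == -1 behavior from the commit above, using a hypothetical helper (not the actual examples/common.cpp code):

    #include <thread>

    // Resolve a user-supplied thread count: -1 means "use all hardware threads".
    // std::thread::hardware_concurrency() may return 0 when the count cannot
    // be determined, so fall back to a single thread in that case.
    int resolve_n_threads(int n_threads) {
        if (n_threads == -1) {
            const unsigned hw = std::thread::hardware_concurrency();
            return hw > 0 ? (int) hw : 1;
        }
        return n_threads;
    }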
slaren
3602ac4255
fix n_tasks ( #2342 )
...
ggml-ci
2023-07-23 15:19:39 +02:00
slaren
95a6c595e7
ggml: move op parameters from tensors to ggml_tensor::op_params ( #2333 )
...
* ggml: move op parameters from tensors to ggml_tensor::op_params
* alibi: use memcpy for float params
* remove `src[1] = NULL` in ops
2023-07-23 14:36:02 +02:00
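To illustrate the op_params idea (struct and helper names here are hypothetical, not the real ggml API): small operator parameters live in a fixed int32_t array on the tensor itself instead of in extra source tensors, and float values are bit-copied with memcpy rather than cast, which is the portable way to store them in integer slots.

    #include <cstdint>
    #include <cstring>

    struct tensor_like {
        int32_t op_params[8];  // per-op scalar parameters, no extra src tensor needed
    };

    // Store a float parameter by copying its bits into an int32_t slot.
    void set_f32_param(tensor_like * t, int idx, float v) {
        std::memcpy(&t->op_params[idx], &v, sizeof(float));
    }

    // Read it back the same way.
    float get_f32_param(const tensor_like * t, int idx) {
        float v;
        std::memcpy(&v, &t->op_params[idx], sizeof(float));
        return v;
    }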
Georgi Gerganov
e76d630df1
llama : grouped-query attention + LLaMAv2 70B support ( #2276 )
...
* CUDA: GQA implementation
* llama : support for GQA and LLaMAv2 70B
ggml-ci
* py : fix hparams parsing (if-else blocks)
ggml-ci
* py : oh boy ..
ggml-ci
* help : fix gqa value for 70B
ggml-ci
---------
Co-authored-by: JohannesGaessler <johannesg@5d6.de>
2023-07-23 15:09:47 +03:00
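In grouped-query attention, several query heads share one key/value head, which shrinks the KV cache; for LLaMAv2 70B the ratio is 64 query heads to 8 KV heads. A sketch of the index mapping (function name hypothetical):

    // Map a query head to its shared KV head.
    // n_head must be a multiple of n_head_kv; n_head == n_head_kv
    // degenerates to standard multi-head attention.
    int kv_head_for(int q_head, int n_head, int n_head_kv) {
        const int n_gqa = n_head / n_head_kv;  // query heads per KV head (8 for 70B)
        return q_head / n_gqa;
    }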
maddes8cht
1d0824b247
llama : print help to stdout ( #2338 )
2023-07-23 14:59:48 +03:00
wzy
bc3ec2cdc9
flake : support nix build '.#opencl' ( #2337 )
2023-07-23 14:57:02 +03:00
Christian Demsar
a940458e48
llama : print max tensor size to stderr ( #2336 )
2023-07-23 14:56:34 +03:00
Jose Maldonado
91171b8072
make : fix CLBLAST compile support in FreeBSD ( #2331 )
...
* Fix Makefile for CLBLAST compile support and add instructions for compiling llama.cpp on FreeBSD
* More general use-case for CLBLAST support (Linux and FreeBSD)
2023-07-23 14:52:08 +03:00
AustinMroz
355c80f49e
examples : simplify vim plugin ( #2327 )
...
Uses the built-in json_encode and json_decode functions to simplify escaping
and removes the need for temp files
2023-07-23 14:16:48 +03:00
Jiahao Li
83a00ce69b
metal : support bcast add & dup & cont op ( #2323 )
2023-07-23 14:00:37 +03:00
Concedo
2e84eac7f6
Merge branch 'master' into concedo_experimental
2023-07-23 16:23:00 +08:00
Concedo
aa05eadb6f
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# llama.cpp
2023-07-23 16:22:44 +08:00
Kawrakow
d2a43664f9
Speed up Q4_K ( #2322 )
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-23 08:49:20 +03:00
Concedo
1108232e30
Merge branch 'concedo' into concedo_experimental
2023-07-23 09:59:58 +08:00
Concedo
0cca0726fe
reduce number of retries, fixed maxlength > maxctx bug
2023-07-23 09:59:34 +08:00
Ycros
56995caa48
Fix mirostatv2. ( #338 )
2023-07-23 09:52:03 +08:00
Johannes Gäßler
b9b7d94fc1
CUDA: Fixed 7b q3_K_S with mul_mat_vec_q ( #2313 )
2023-07-22 21:27:34 +02:00
Georgi Gerganov
b47b8a9cfe
llama : optimize memory buffers ( #2325 )
2023-07-22 21:17:57 +03:00
Concedo
fa0270df7c
added some checks to skip generation if busy
2023-07-22 23:10:04 +08:00
Concedo
2807d98fd4
touchup (+2 squashed commits)
...
Squashed commit:
[8b06458] fixed broken param order
[7eabdc0] very broken, do not use
2023-07-22 22:57:56 +08:00
klosax
b5fe67f8c6
Perplexity: Compute scores correlated with HellaSwag ( #2312 )
...
* Add parameter --perplexity-lines to perplexity.cpp
2023-07-22 14:21:24 +02:00
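For reference, the perplexity in question is the standard exponential of the mean negative log-likelihood; the line-based mode applies it per input line rather than over fixed-size chunks. A self-contained sketch (not the actual perplexity.cpp code):

    #include <cmath>
    #include <vector>

    // ppl = exp(-(1/N) * sum_i log p_i), where p_i is the model's
    // probability of token i given its preceding context. Lower is better.
    double perplexity(const std::vector<double> & token_probs) {
        double nll = 0.0;
        for (const double p : token_probs) {
            nll -= std::log(p);
        }
        return std::exp(nll / (double) token_probs.size());
    }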
whoreson
24baa54ac1
examples : basic VIM plugin
...
VIM plugin for server exe
2023-07-22 13:34:51 +03:00
Concedo
3aec3038d4
bump scratch buffers
2023-07-22 18:12:18 +08:00
Georgi Gerganov
dd6c67d3cb
ci : fix args
2023-07-22 12:00:56 +03:00
Georgi Gerganov
5d500e8ccf
ci : add 7B CUDA tests ( #2319 )
...
* ci : add 7B CUDA tests
ggml-ci
* ci : add Q2_K to the tests
* ci : bump CUDA ppl chunks
ggml-ci
* ci : increase CUDA TG len + add --ignore-eos
* ci : reduce CUDA ppl chunks down to 4 to save time
2023-07-22 11:48:22 +03:00
Concedo
52c5856a08
auto populate horde model name
2023-07-22 16:03:12 +08:00
Concedo
dd3f8dabed
updated cluster to horde.koboldai.net
2023-07-22 12:42:40 +08:00
Concedo
236d0e8955
add tip about using other workers
2023-07-22 12:29:22 +08:00
Concedo
701bf0a6cd
reduce sleep time between jobs
2023-07-22 11:56:43 +08:00
Concedo
343ae756fa
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .gitignore
# CMakeLists.txt
# Makefile
# README.md
# flake.nix
# ggml-cuda.cu
2023-07-22 11:51:30 +08:00
Concedo
52c98228aa
bugfixes for missing params
2023-07-22 11:37:44 +08:00
Concedo
d7ab6adbc1
embedded horde worker is ready
2023-07-22 11:21:32 +08:00
Richard Roberson
7d5f18468c
examples : add an easy Python script to create quantized (k-bit support) GGML models from local HF Transformers models ( #2311 )
...
* Resync my fork with new llama.cpp commits
* examples : rename to use dash instead of underscore
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-21 22:01:10 +03:00
Concedo
75064b4ada
wip on embedded horde worker
2023-07-22 01:30:25 +08:00
Kawrakow
d924522a46
Custom RoPE + better memory management for CUDA ( #2295 )
...
* Custom RoPE + better memory management for CUDA
* Adjusted look-ahead in ggml_cuda_pool_malloc to 5%
This seems to be sufficient. We end up using about 200 MB less VRAM that way
when running the 13B model with context 8192.
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-21 17:27:51 +03:00
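The 5% look-ahead means that when the pool has no buffer large enough, the new allocation is padded so slightly larger future requests can still reuse it. A sketch of just the sizing rule (names hypothetical, not the actual ggml-cuda.cu code):

    #include <cstddef>

    // Over-allocate new pool buffers by 5% so near-miss requests
    // later can reuse them instead of triggering another cudaMalloc.
    size_t pool_alloc_size(size_t requested) {
        return requested + requested / 20;  // +5% look-ahead
    }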
Kawrakow
4d76a5f49b
Faster Q3_K implementation on Metal ( #2307 )
...
* Faster Q3_K on Metal
* Additional Q3_K speedup on Metal
* Q3_K for QK_K = 64
* Better Q3_K for QK_K = 64
21.6 ms/t -> 21.1 ms/t
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-21 17:05:30 +03:00
Georgi Gerganov
0db14fef06
ggml : fix the rope fix ( 513f861953 )
2023-07-21 15:16:55 +03:00
Ikko Eltociear Ashimine
03e566977b
examples : fix typo in minigpt4.py ( #2298 )
...
promt -> prompt
2023-07-21 14:53:07 +03:00
Georgi Gerganov
513f861953
ggml : fix rope args order + assert ( #2054 )
2023-07-21 14:51:34 +03:00
Georgi Gerganov
3973b25a64
gitignore : fix final newline
2023-07-21 14:42:41 +03:00
Guillaume "Vermeille" Sanchez
ab0e26bdfb
llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale ( #2280 )
2023-07-21 13:58:36 +03:00
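The equivalence is easy to verify, assuming the smooth factor m was a linear blend between the conditional logits c and the guided logits u + g·(c − u), with u the unconditional logits and g the guidance scale (which is what the commit title implies):

    (1 − m)·c + m·(u + g·(c − u)) = c + m·(g − 1)·(c − u)
                                  = u + (1 + m·(g − 1))·(c − u)

so any (g, m) pair is reproduced by a single effective scale g′ = 1 + m·(g − 1), and the extra parameter can be dropped.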
Jose Maldonado
73643f5fb1
gitignore : changes for Poetry users + chat examples ( #2284 )
...
A fix in the Makefile for FreeBSD users: on that platform, x86_64 is called amd64. This fix resolves compilation using CFLAGS and CXXFLAGS with -march=native and -mtune=native
Adds two examples of interactive mode using Llama2 models (thanks to TheBloke for the models)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-21 13:53:27 +03:00
Georgi Gerganov
a814d04f81
make : fix indentation
2023-07-21 13:50:55 +03:00
Georgi Gerganov
4c013bb738
ci : fix MNT realpath usage ( #2250 )
2023-07-21 13:49:18 +03:00
Sky Yan
42c7c2e2e9
make : support customized LLAMA_CUDA_NVCC and LLAMA_CUDA_CCBIN ( #2275 )
...
In certain environments, nvcc and gcc are installed under customized paths rather than the standard ones
Co-authored-by: Yan Lin <yanlin@baidu.com>
2023-07-21 13:38:57 +03:00
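For example, both variables can be passed on the make command line (the paths here are hypothetical):

    make LLAMA_CUBLAS=1 LLAMA_CUDA_NVCC=/opt/cuda/bin/nvcc LLAMA_CUDA_CCBIN=/opt/gcc-11/bin/gcc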
wzy
78a3d13424
flake : remove intel mkl from flake.nix due to missing files ( #2277 )
...
NixOS's mkl is missing some libraries, such as mkl-sdl.pc. See #2261
Currently NixOS doesn't have the Intel C compiler (icx, icpx). See https://discourse.nixos.org/t/packaging-intel-math-kernel-libraries-mkl/975
So remove it from flake.nix
Some minor changes:
- Change pkgs.python310 to pkgs.python3 to track the latest version
- Add pkgconfig to devShells.default
- Remove installPhase because we have `cmake --install` from #2256
2023-07-21 13:26:34 +03:00
Georgi Gerganov
ae178ab46b
llama : make tensor_split ptr instead of array ( #2272 )
2023-07-21 13:10:51 +03:00
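A sketch of what the tensor_split change amounts to (the struct name and NULL convention here are illustrative, not the exact llama.h declaration):

    // before: float tensor_split[MAX_DEVICES];  // fixed-size array in the params struct
    // after:
    struct params_like {
        const float * tensor_split;  // per-GPU split proportions, or NULL for the default
    };

A pointer keeps the public struct layout independent of the compile-time device limit.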
Jiří Podivín
54e3bc76fe
make : add new target for test binaries ( #2244 )
...
Programs in the tests directory are now built with the tests target
and placed in the same location.
* The clean target was expanded to remove the new binaries
* Test target binaries are listed in a variable
* Locations of the binaries were added to the .gitignore
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-21 13:09:16 +03:00