Concedo
66e4b5141e
fix horde worker host and client agent
2023-07-25 18:18:41 +08:00
Concedo
48c27a9ce1
hotfix for 70b broadcast issues
2023-07-25 01:32:47 +08:00
Александр Герман
9731682ad6
Update Makefile (#345)
fix requirements for idiotic source file concatenation (lol)
2023-07-25 00:21:32 +08:00
Concedo
d8d2449bfb
better label (+1 squashed commits)
Squashed commits:
[f573b2c] cuda 3 target arch
2023-07-24 23:07:31 +08:00
Concedo
7555dae4cc
ditch advanced subparsers
2023-07-24 22:40:36 +08:00
Concedo
8a9b40840b
Merge branch 'master' into concedo_experimental
# Conflicts:
# tests/test-grad0.c
# tests/test-opt.c
2023-07-24 20:51:28 +08:00
Concedo
6d71e100fe
buff buffers
2023-07-24 20:33:17 +08:00
Georgi Gerganov
5b2b2dc6ae
ggml : sync (unary ops refactor, static-correctness) (#2370)
* ggml : sync (unary ops, tests)
ggml-ci
* tests : remove unnecessary funcs
2023-07-24 14:46:21 +03:00
Concedo
825e34baa3
default horde name and better handling for horde (+3 squashed commits)
Squashed commits:
[fadfa60] better idle handling for horde worker
[a3971e6] updated lite
[2ca2b79] seems to not generate rubbish
2023-07-24 18:41:41 +08:00
Kawrakow
42f70cb2f6
Fix scalar version of Q5_K when QK_K = 64 (#2362)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-24 12:55:02 +03:00
Concedo
c7136f03d9
added support for tensor_split parameter as an advanced parameter.
2023-07-24 17:16:19 +08:00
Concedo
66328fcd80
Merge branch 'master' into concedo_experimental
# Conflicts:
# Makefile
2023-07-24 15:44:26 +08:00
Concedo
94499dba25
added support for 70b llama 2
2023-07-24 15:20:18 +08:00
Concedo
993ba3b026
Merge branch 'master' into concedo_experimental
# Conflicts:
# README.md
2023-07-24 11:59:00 +08:00
Evan Jones
84e09a7d8b
llama : add grammar-based sampling (#1773)
* llama, main : constrain sampling to grammar
* allow loading grammar from file
* fix whitespace errors
* handle & print parser errors
* add comments to grammar syntax and allow newlines where unambiguous
* add missing include
* support alternates in root rule
* fix bugs with empty token and EOS
* adjust JSON grammar
* remove swp file
* rewrite ternary expressions
Co-authored-by: Henri Vasserman <henv@hot.ee>
* use struct for grammar elements and add Unicode support
* add unicode escapes
* add inverse char ranges
* only sample full tokens (no peeking or truncation)
* llama : minor style changes
blindly applied in online editor - hopefully I didn't break something
* update help text
* add warning message if EOS is disabled
---------
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-23 23:58:10 -04:00
Concedo
280abaf029
added stop reason in the perf endpoint
2023-07-24 11:55:35 +08:00
Kawrakow
2f9cf974a0
Some more Q4_K and Q5_K speedup on CUDA (#2346)
* Faster Q5_K on CUDA
* Small Q5_K improvement on older GPUs
* Speed up Q4_K on CUDA
GTX1660: 29.5 ms/t -> 25.6 ms/t
RTX4080: 8.40 ms/t -> 8.25 ms/t
* Speed up Q4_K on CUDA
GTX1660: 36.7 ms/t -> 35.6 ms/t
RTX4080: 9.8 ms/t -> 9.5 ms/t
* Address PR comments
* Add some comments to satisfy PR reviewer
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-24 00:19:47 +03:00
IgnacioFDM
4f06592cc6
Add gqa parameter support to the server (#2351)
* Add gqa parameter support to the server
* Change help from stderr to stdout
2023-07-23 23:31:17 +03:00
Johannes Gäßler
70d26ac388
Fix __dp4a documentation (#2348)
2023-07-23 17:49:06 +02:00
Concedo
910744e2c0
Merge branch 'master' into concedo_experimental
# Conflicts:
# Makefile
# README.md
# flake.nix
# llama.cpp
2023-07-23 22:37:38 +08:00
Concedo
c28ab4e1b7
update lite, try support k80
2023-07-23 21:50:35 +08:00
wzy
57921ca6db
common : n_threads == -1 uses std::thread::hardware_concurrency() (#2347)
* Fix #2345, fix incorrect n_threads
* Update examples/common.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-23 16:33:02 +03:00
slaren
3602ac4255
fix n_tasks (#2342)
ggml-ci
2023-07-23 15:19:39 +02:00
slaren
95a6c595e7
ggml: move op parameters from tensors to ggml_tensor::op_params (#2333)
* ggml: move op parameters from tensors to ggml_tensor::op_params
* alibi: use memcpy for float params
* remove `src[1] = NULL` in ops
2023-07-23 14:36:02 +02:00
Georgi Gerganov
e76d630df1
llama : grouped-query attention + LLaMAv2 70B support (#2276)
* CUDA: GQA implementation
* llama : support for GQA and LLaMAv2 70B
ggml-ci
* py : fix hparams parsing (if-else blocks)
ggml-ci
* py : oh boy ..
ggml-ci
* help : fix gqa value for 70B
ggml-ci
---------
Co-authored-by: JohannesGaessler <johannesg@5d6.de>
2023-07-23 15:09:47 +03:00
maddes8cht
1d0824b247
llama : print help to stdout (#2338)
2023-07-23 14:59:48 +03:00
wzy
bc3ec2cdc9
flake : support nix build '.#opencl' (#2337)
2023-07-23 14:57:02 +03:00
Christian Demsar
a940458e48
llama : print max tensor size to stderr (#2336)
2023-07-23 14:56:34 +03:00
Jose Maldonado
91171b8072
make : fix CLBLAST compile support in FreeBSD (#2331)
* Fix Makefile for CLBLAST compile support and add instructions for compiling llama.cpp on FreeBSD
* More general use-case for CLBLAST support (Linux and FreeBSD)
2023-07-23 14:52:08 +03:00
AustinMroz
355c80f49e
examples : simplify vim plugin (#2327)
Uses builtin json_encode and json_decode functions to simplify escaping
Removes the need for temp files
2023-07-23 14:16:48 +03:00
Jiahao Li
83a00ce69b
metal : support bcast add & dup & cont op (#2323)
2023-07-23 14:00:37 +03:00
Concedo
2e84eac7f6
Merge branch 'master' into concedo_experimental
2023-07-23 16:23:00 +08:00
Concedo
aa05eadb6f
Merge branch 'master' into concedo_experimental
# Conflicts:
# llama.cpp
2023-07-23 16:22:44 +08:00
Kawrakow
d2a43664f9
Speed up Q4_K (#2322)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-23 08:49:20 +03:00
Concedo
1108232e30
Merge branch 'concedo' into concedo_experimental
2023-07-23 09:59:58 +08:00
Concedo
0cca0726fe
reduce number of retries, fixed maxlength > maxctx bug
2023-07-23 09:59:34 +08:00
Ycros
56995caa48
Fix mirostatv2 (#338)
2023-07-23 09:52:03 +08:00
Johannes Gäßler
b9b7d94fc1
CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313)
2023-07-22 21:27:34 +02:00
Georgi Gerganov
b47b8a9cfe
llama : optimize memory buffers (#2325)
2023-07-22 21:17:57 +03:00
Concedo
fa0270df7c
added some checks to skip generation if busy
2023-07-22 23:10:04 +08:00
Concedo
2807d98fd4
touchup (+2 squashed commits)
Squashed commits:
[8b06458] fixed broken param order
[7eabdc0] very broken, do not use
2023-07-22 22:57:56 +08:00
klosax
b5fe67f8c6
Perplexity: Compute scores correlated to HellaSwag (#2312)
* Add parameter --perplexity-lines to perplexity.cpp
2023-07-22 14:21:24 +02:00
whoreson
24baa54ac1
examples : basic VIM plugin
VIM plugin for server exe
2023-07-22 13:34:51 +03:00
Concedo
3aec3038d4
bump scratch buffers
2023-07-22 18:12:18 +08:00
Georgi Gerganov
dd6c67d3cb
ci : fix args
2023-07-22 12:00:56 +03:00
Georgi Gerganov
5d500e8ccf
ci : add 7B CUDA tests (#2319)
* ci : add 7B CUDA tests
ggml-ci
* ci : add Q2_K to the tests
* ci : bump CUDA ppl chunks
ggml-ci
* ci : increase CUDA TG len + add --ignore-eos
* ci : reduce CUDA ppl chunks down to 4 to save time
2023-07-22 11:48:22 +03:00
Concedo
52c5856a08
auto populate horde model name
2023-07-22 16:03:12 +08:00
Concedo
dd3f8dabed
updated cluster to horde.koboldai.net
2023-07-22 12:42:40 +08:00
Concedo
236d0e8955
add tip about using other workers
2023-07-22 12:29:22 +08:00
Concedo
701bf0a6cd
reduce sleep time between jobs
2023-07-22 11:56:43 +08:00