Christian Zhou-Zheng
86842b20e5
fix compiler warnings
2024-05-31 22:25:46 -04:00
Christian Zhou-Zheng
db3ba108e7
code aestheticization
2024-05-31 21:38:02 -04:00
Christian Zhou-Zheng
62560367aa
add command-line args for num threads, num completions file lines, always reload model
...
refactored a few things and did what the commit message says on the tin
2024-05-31 21:27:14 -04:00
Christian Zhou-Zheng
4d7d71bc43
fix square_diff matmul index range and CRLF->LF line endings
...
fixed a logic error where square_diff would not multiply all rows
fixed a formatting error where the provided completions.txt had CRLF line endings
2024-05-31 21:08:25 -04:00
Georgi Gerganov
a323ec60af
server : update js ( #7670 )
2024-05-31 22:23:04 +03:00
Christian Zhou-Zheng
4d88cd1af1
fix zero output & param parsing, functional templating
...
fixed a bug where the output file had no tensor data/was all zero
fixed a bug where single hyphen flags were not being correctly parsed
implements creation of templated prompts from input (still need to adapt based on model)
2024-05-31 12:40:35 -04:00
Galunid
0515ad93f4
convert-hf : Handle NotImplementedError in convert-hf-to-gguf ( #7660 )
2024-05-31 17:42:33 +02:00
Johannes Gäßler
c8047d538f
scripts: update compare_llama_bench.py [no ci] ( #7673 )
2024-05-31 16:26:21 +02:00
Daniele
30e238b246
Improve HIP compatibility ( #7672 )
2024-05-31 16:00:29 +02:00
Georgi Gerganov
16926dff92
readme : link homebrew discussion
2024-05-31 15:04:58 +03:00
Georgi Gerganov
0c27e6f62e
ggml : fix loongson compile warnings ( #7537 )
...
* ggml : fix loongson compile warnings
ggml-ci
* Fix loongarch quantize test fail.
Fix unexpected error introduced during rebase code.
* tests : disable json test due to lack of python on the CI node
ggml-ci
---------
Co-authored-by: junchao-loongson <zhaojunchao@loongson.cn>
2024-05-31 14:17:10 +03:00
Galunid
2e32f874e6
Somehow '**' got lost ( #7663 )
2024-05-31 18:24:41 +10:00
Galunid
1af511fc22
Add convert.py removal to hot topics ( #7662 )
2024-05-31 10:09:20 +02:00
Christian Zhou-Zheng
fa85ba6ae3
preliminary template/multiprompt support
...
model is running out of context and that ought to be fixed (segfaulting) but other than that it looks goodish
2024-05-30 23:39:59 -04:00
Christian Zhou-Zheng
31f153fe9c
fix matrix transpose multiplication
...
you have got to be kidding me
2024-05-30 21:36:17 -04:00
Sertaç Özercan
0541f06296
[no ci] docs: add aikit to readme ( #7650 )
...
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-05-31 09:57:16 +10:00
ngxson
d446c6d887
add debugs
2024-05-31 00:41:12 +02:00
ngxson
287da25f48
fix mem error
2024-05-31 00:06:45 +02:00
ngxson
447023fc43
add multi prompts, multi-thread for PCA
2024-05-30 23:58:32 +02:00
JohnnyB
9022c33646
Fixed painfully slow single process builds. ( #7326 )
...
* Fixed painfully slow single process builds.
* Added nproc for systems that don't default to nproc
2024-05-30 22:32:38 +02:00
Christian Zhou-Zheng
dc46264ff0
example template completions
...
Implements an example template set built from the positive/negative prompts like the control vector Python implementation.
2024-05-30 13:12:54 -04:00
Georgi Gerganov
5921b8f089
llama : cache llama_token_to_piece ( #7587 )
...
* llama : cache llama_token_to_piece
ggml-ci
* llama : use vectors and avoid has_cache
ggml-ci
* llama : throw on unknown tokenizer types
ggml-ci
* llama : print a log of the total cache size
2024-05-31 02:01:41 +10:00
Christian Zhou-Zheng
f58f6af133
param parsing, refactor, comments
...
Added basic command-line parameters for outfile and one each positive/negative prompt.
Refactored some messy code in PCA computation and GGUF exporting.
Left a bunch of comments regarding further work needed.
2024-05-30 11:31:45 -04:00
Martin Delille
5dcdf94676
Fix conan badge display [no ci] ( #7645 )
2024-05-31 01:07:39 +10:00
Manuel
2e2340de17
Add brew installation instruction to README [no ci] ( #7616 )
2024-05-31 00:58:15 +10:00
Martin Delille
7846540bd2
readme : add Conan badge ( #7638 )
2024-05-30 15:52:50 +03:00
Brian
e6157f94c8
github: add contact links to issues and convert question into research [no ci] ( #7612 )
2024-05-30 21:55:36 +10:00
Galunid
9c4c9cc83f
Move convert.py to examples/convert-legacy-llama.py ( #7430 )
...
* Move convert.py to examples/convert-no-torch.py
* Fix CI, scripts, readme files
* convert-no-torch -> convert-legacy-llama
* Move vocab thing to vocab.py
* Fix convert-no-torch -> convert-legacy-llama
* Fix lost convert.py in ci/run.sh
* Fix imports
* Fix gguf not imported correctly
* Fix flake8 complaints
* Fix check-requirements.sh
* Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE
* Review fixes
2024-05-30 21:40:00 +10:00
Chris Elrod
59b0d07766
faster avx512 exp implementation ( #7551 )
...
* faster avx512 exp implementation
* x->r
* improve accuracy, handle special cases
* remove `e`
2024-05-30 21:32:55 +10:00
junchao-loongson
d5c05821f3
ggml : fix loongarch build (O2 issue) ( #7636 )
2024-05-30 12:30:10 +03:00
Johannes Gäßler
972b555ab9
README: explain parallel build [no ci] ( #7618 )
2024-05-30 09:52:39 +02:00
Meng, Hengyu
3854c9d07f
[SYCL] fix intel docker ( #7630 )
...
* Update main-intel.Dockerfile
* workaround for https://github.com/intel/oneapi-containers/issues/70
* reset intel docker in CI
* add missed in server
2024-05-30 16:19:08 +10:00
Christian Zhou-Zheng
73747fe8eb
proof-of-concept stdlib implementation
...
Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but outputs gibberish.
2024-05-30 00:31:29 -04:00
Galunid
eb57fee51f
gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py ( #7627 )
2024-05-30 02:10:40 +02:00
Georgi Gerganov
55d62262a9
metal : remove invalid asserts ( #7617 )
2024-05-29 22:21:20 +03:00
Georgi Gerganov
975ec63ff2
metal : add missing asserts ( #7617 )
2024-05-29 20:45:25 +03:00
Georgi Gerganov
fb76ec31a9
ggml : fix YARN + add tests + add asserts ( #7617 )
...
* tests : add rope tests
ggml-ci
* ggml : fixes (hopefully)
ggml-ci
* tests : add non-cont tests
ggml-ci
* cuda : add asserts for rope/norm + fix DS2
ggml-ci
* ggml : assert contiguousness
* tests : reduce RoPE tests
ggml-ci
2024-05-29 20:17:31 +03:00
Georgi Gerganov
cce3dcffc5
cuda : non-cont concat support ( #7610 )
...
* tests : add non-cont concat tests
* cuda : non-cont concat support
ggml-ci
2024-05-29 15:38:26 +03:00
Radoslav Gerganov
210d99173d
llama-bench : add support for the RPC backend ( #7435 )
2024-05-29 14:45:44 +03:00
slaren
87bdf2a199
ggml : use atomic_flag for critical section ( #7598 )
...
* ggml : use atomic_flag for critical section
* add windows shims
2024-05-29 13:36:39 +02:00
Georgi Gerganov
00281b7be3
scripts : remove mpi remnants
2024-05-29 14:31:18 +03:00
Georgi Gerganov
2ab977282b
sync : ggml
2024-05-29 14:29:52 +03:00
Georgi Gerganov
72de268bec
ggml : restore ggml_rope_xpos_inplace (ggml/0)
...
ggml-ci
2024-05-29 14:29:33 +03:00
Akarshan Biswas
0e8d8bfd6c
Add Arc A750 and Arch linux to readme-sycl.md as verified GPU model and Linux distro ( #7605 )
2024-05-29 16:53:47 +10:00
zhouwg
504f0c340f
ggml : fix typo in ggml.c ( #7603 )
2024-05-29 04:09:31 +02:00
Meng, Hengyu
b864b50ce5
[SYCL] Align GEMM dispatch ( #7566 )
...
* align GEMM dispatch
2024-05-29 07:00:24 +08:00
jaime-m-p
02c1ecad07
Tokenizer WPM fixes ( #7500 )
...
* Update random test: add_bos_token.
* Update random test: add WPM models for testing.
* Build vocab.special_tokens_cache using vocab token types.
* Fix and improve WPM preprocessing.
- Fix unicode edge case combinations.
- Split by whitespace in the same pass.
* Discard all tokens when no matching found.
2024-05-28 21:46:34 +02:00
Georgi Gerganov
6bd12ce409
sycl : fix assert ( #7563 )
2024-05-28 22:22:50 +03:00
Giuseppe Scrivano
5442939fcc
llama : support small Granite models ( #7481 )
...
* Add optional MLP bias for Granite models
Add optional MLP bias for ARCH_LLAMA to support Granite models.
Partially addresses ggerganov/llama.cpp/issues/7116
Still needs some more changes to properly support Granite.
* llama: honor add_space_prefix from the model configuration
propagate the add_space_prefix configuration from the HF model
configuration to the gguf file and honor it with the gpt2 tokenizer.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* llama: add support for small granite models
it works only for the small models 3b and 8b.
The convert-hf-to-gguf.py script uses the vocabulary size of the
granite models to detect granite and set the correct configuration.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
---------
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-authored-by: Steffen Roecker <sroecker@redhat.com>
2024-05-28 21:49:49 +03:00
k.h.lai
56411a950f
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE ( #7552 )
2024-05-28 19:25:08 +02:00