Commit graph

3022 commits

Author SHA1 Message Date
ngxson
3223133cf5 default n_pca_batch to 20 2024-06-11 15:05:06 +02:00
ngxson
d41c719980 bring back n_completions 2024-06-11 14:31:45 +02:00
Christian Zhou-Zheng
446da906d9 fix n_completions 2024-06-11 08:22:38 -04:00
ngxson
163916864c remember to copy back the last_eigenvector 2024-06-11 12:40:07 +02:00
ngxson
1a088fb0a5 working version 2024-06-11 12:37:05 +02:00
ngxson
9e39571fc2 add n_batch for pca 2024-06-11 11:45:16 +02:00
ngxson
6a5adf3d7c fix shape of v_diff_original 2024-06-11 01:33:16 +02:00
ngxson
c241b500a1 clean up PCA ggml implementation 2024-06-11 01:13:10 +02:00
ngxson
a710df749c (wip) refactor 2024-06-07 15:37:58 +02:00
Christian Zhou-Zheng
a42e783d75 update comments 2024-06-03 21:33:46 -04:00
Christian Zhou-Zheng
3815a0c306 pre-tokenize so we can allocate correct memory to ctx_diffs_wrapped 2024-06-03 21:26:13 -04:00
Christian Zhou-Zheng
23fd1b587c update debug statements 2024-06-03 21:14:43 -04:00
Christian Zhou-Zheng
07dba13ab6 temporary commit while I move dev environments
it finally outputs a functioning control vector - "functioning" in the sense that it can be loaded and it clearly has the right idea, but makes the model incoherent
2024-06-03 17:40:19 -04:00
ngxson
15d5c257a0 fix cb_eval 2024-06-02 10:58:11 +02:00
Christian Zhou-Zheng
a23c72e4c0 fix ggml errors and make new ones
at least it compiles and runs
2024-06-01 22:19:33 -04:00
Christian Zhou-Zheng
b67ea65983 tentatively translate the rest 2024-06-01 20:47:28 -04:00
Christian Zhou-Zheng
0e1f9734de translated everything but PCA (I think) 2024-06-01 19:50:46 -04:00
Christian Zhou-Zheng
df623fffe8 interim fix memory leak 2024-06-01 18:36:54 -04:00
Christian Zhou-Zheng
3090c485b6 remove unnecessary multithreading 2024-06-01 18:32:14 -04:00
Christian Zhou-Zheng
544268888b in-series multithreading for prompt embedding?
added commented-out code to attempt to start implementing multithreading for embedding in main
2024-06-01 17:25:21 -04:00
Christian Zhou-Zheng
86842b20e5 fix compiler warnings 2024-05-31 22:25:46 -04:00
Christian Zhou-Zheng
db3ba108e7 code aestheticization 2024-05-31 21:38:02 -04:00
Christian Zhou-Zheng
62560367aa add command-line args for num threads, num completions file lines, always reload model
refactored a few things and did what the commit message says on the tin
2024-05-31 21:27:14 -04:00
Christian Zhou-Zheng
4d7d71bc43 fix square_diff matmul index range and CRLF->LF line endings
fixed a logic error where square_diff would not multiply all rows

fixed a formatting error where the provided completions.txt had CRLF line endings
2024-05-31 21:08:25 -04:00
Christian Zhou-Zheng
4d88cd1af1 fix zero output & param parsing, functional templating
fixed a bug where the output file had no tensor data/was all zero

fixed a bug where single hyphen flags were not being correctly parsed

implements creation of templated prompts from input (still need to adapt based on model)
2024-05-31 12:40:35 -04:00
Christian Zhou-Zheng
fa85ba6ae3 preliminary template/multiprompt support
the model is running out of context and segfaulting, which ought to be fixed, but other than that it looks goodish
2024-05-30 23:39:59 -04:00
Christian Zhou-Zheng
31f153fe9c fix matrix transpose multiplication
you have got to be kidding me
2024-05-30 21:36:17 -04:00
ngxson
d446c6d887 add debugs 2024-05-31 00:41:12 +02:00
ngxson
287da25f48 fix mem error 2024-05-31 00:06:45 +02:00
ngxson
447023fc43 add multi prompts, multi-thread for PCA 2024-05-30 23:58:32 +02:00
Christian Zhou-Zheng
dc46264ff0 example template completions
Implements an example template set built from the positive/negative prompts like the control vector Python implementation.
2024-05-30 13:12:54 -04:00
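The idea in this commit, crossing each completion line with the positive and negative prompts to build templated pairs, can be sketched as below. The template string and function name are illustrative assumptions, not the exact format the example uses:

```python
def build_prompt_pairs(positive, negative, completions,
                       template="[INST] {prompt} [/INST] {completion}"):
    """Pair each completion line with the positive and negative prompt."""
    pairs = []
    for completion in completions:
        pairs.append((
            template.format(prompt=positive, completion=completion),
            template.format(prompt=negative, completion=completion),
        ))
    return pairs

# one (positive, negative) templated pair per completion line
completions = ["I feel", "Today was"]
pairs = build_prompt_pairs("Act happy.", "Act sad.", completions)
```

In the actual example the template would be adapted per model's chat format, which is what the later "still need to adapt based on model" note refers to.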
Christian Zhou-Zheng
f58f6af133 param parsing, refactor, comments
Added basic command-line parameters for outfile and one each positive/negative prompt.

Refactored some messy code in PCA computation and GGUF exporting.

Left a bunch of comments regarding further work needed.
2024-05-30 11:31:45 -04:00
Christian Zhou-Zheng
73747fe8eb proof-of-concept stdlib implementation
Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but it produces gibberish.
2024-05-30 00:31:29 -04:00
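For context on the PCA step these commits keep iterating on (the v_diff shape and last_eigenvector fixes above), the dominant principal component of a matrix of hidden-state diffs can be found by power iteration on its covariance. A minimal NumPy sketch under that assumption; this is illustrative, not the ggml implementation:

```python
import numpy as np

def top_principal_component(diffs, n_iter=100, tol=1e-8):
    """Power iteration: dominant eigenvector of diffs.T @ diffs."""
    cov = diffs.T @ diffs  # (d, d) unnormalized covariance of row vectors
    v = np.ones(cov.shape[0]) / np.sqrt(cov.shape[0])
    for _ in range(n_iter):
        v_new = cov @ v
        v_new /= np.linalg.norm(v_new)
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    return v

# toy example: diffs whose dominant direction is the first axis
rng = np.random.default_rng(0)
diffs = rng.normal(size=(200, 4)) * np.array([10.0, 1.0, 1.0, 1.0])
v = top_principal_component(diffs)
```

The returned unit vector (up to sign) is what gets written out per layer as the control vector direction.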
ngxson
b30bea3257 add comments 2024-05-24 22:50:03 +02:00
ngxson
c31c118d86 calc diff 2024-05-24 11:46:47 +02:00
ngxson
0a46d73056 add control-vector-generator 2024-05-24 11:11:55 +02:00
Georgi Gerganov
74f33adf5f readme : remove trailing space (#7469) 2024-05-23 17:43:18 +03:00
Georgi Gerganov
1debe72737 ggml : silence UB sanitizer error during iq2_xxs quantization (#0) 2024-05-23 17:25:38 +03:00
Tristan Druyen
007489e895 Fix phi3 chat template confusion with zephyr (#7449)
* Fix phi3 template matching vs zephyr

* Add regression test for new phi3 chat template

* Implement review suggestions

* Fix phi3 jinja test templates & match by <|end|>

* Apply suggestion

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Add all phi3 template variants in tests

* Remove unneeded message trimming

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Fix tests to not expect trimmed messages

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-05-23 16:15:15 +02:00
Raj Hammeer Singh Hada
8b94e799df readme : add Bunny in supported models [no ci] (#7469) 2024-05-23 15:30:13 +03:00
Daniel Bevenius
3015851c5a llama : add getters for n_threads/n_threads_batch (#7464)
* llama : add getters for n_threads/n_threads_batch

This commit adds two new functions to the llama API. The functions
can be used to get the number of threads used for generating a single
token and the number of threads used for prompt and batch processing
(multiple tokens).

The motivation for this is that we want to be able to get the number of
threads that a context is using. The main use case is
testing/verification that the number of threads is set correctly.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! llama : add getters for n_threads/n_threads_batch

Rename the getters to llama_n_threads and llama_n_threads_batch.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-05-23 15:29:26 +03:00
Georgi Gerganov
55ac3b7aea ci : use Pythia models instead of OpenLlama (#7470)
* ci : start using Pythia models over OpenLlama

ggml-ci

* ci : disable q2_k ppl tests

* ci : use convert-hf-to-gguf.py

* ci : update gg_get_model

* ci : fix convert outfile name

ggml-ci

* llama : gptneox arch use F32 attn prec

ggml-ci
2024-05-23 15:28:14 +03:00
Victor Nogueira
dacfcebd60 readme : add GPT-NeoX + Pythia to the list of supported models (#7491) 2024-05-23 15:12:43 +03:00
fairydreaming
9b82476ee9 Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) (#7461)
* convert-hf : add conversion of bloom-style qkv tensor to gpt-style qkv (code borrowed from BloomModel)

* llama : add inference support for LLM_ARCH_GPTNEOX

* llama : add model types for every Pythia variant and GPT-NeoX

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-05-23 11:49:53 +02:00
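The bloom-style to gpt-style qkv conversion mentioned in this commit can be sketched with NumPy. The layouts assumed here (Bloom interleaving q/k/v per head; gpt-style concatenating all q rows, then all k, then all v) are an assumption for illustration, not a verbatim copy of the convert-hf code:

```python
import numpy as np

def bloom_qkv_to_gpt(qkv, n_head, head_dim):
    """Reorder a fused QKV weight from per-head [q, k, v] interleaving
    (assumed Bloom layout) to [all q | all k | all v] row order."""
    hidden = qkv.shape[1]
    # view rows as (head, q/k/v, head_dim)
    w = qkv.reshape(n_head, 3, head_dim, hidden)
    # move the q/k/v axis to the front, then flatten back to 2D
    w = w.transpose(1, 0, 2, 3)
    return w.reshape(n_head * 3 * head_dim, hidden)

# toy check: number the rows so the reordering is visible
n_head, head_dim, hidden = 2, 2, 4
qkv = np.arange(n_head * 3 * head_dim * hidden,
                dtype=np.float32).reshape(-1, hidden)
out = bloom_qkv_to_gpt(qkv, n_head, head_dim)
```

With two heads of dimension two, row 2 of the output is head 1's first q row (original row 6), showing the q rows of all heads now come first.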
Georgi Gerganov
a61a94e543 llama : rename n_ctx -> cache.size, less confusing (#0) 2024-05-23 12:38:18 +03:00
Brian
152da28ae5 labeler.yml: add embedding label detector [no ci] (#7482) 2024-05-23 17:40:43 +10:00
Georgi Gerganov
d48c88cbd5 ggml : remove ggml_flash_attn and ggml_flash_ff (#7463)
ggml-ci
2024-05-23 10:00:44 +03:00
Georgi Gerganov
e84b71c2c6 ggml : drop support for QK_K=64 (#7473)
* ggml : drop support for QK_K=64

ggml-ci

* opencl : restore QK_K=256 define
2024-05-23 10:00:21 +03:00
0cc4m
1b1e27cb49 Update vulkan rope implementation to support frequency factors (#7475) 2024-05-23 08:59:59 +02:00
Georgi Gerganov
fbf777d2b9 main : minor (#7462) 2024-05-23 09:43:49 +03:00