Commit graph

3288 commits

HanishKVC
b8590e3e57 ChatON:P5:meta json to hpp: Add required c++ inc and global var
Also add a comment to indicate that the hpp file is auto-converted
from the chaton_meta.json file.
2024-05-12 14:06:24 +05:30
HanishKVC
b5b274a44b ChatON:P4:meta json to hpp: Insert kv bool
Rename the kv helpers to match their semantics:
* whether they work with a string or a bool value
* whether they take two keys or a single key

Add support for kv with a bool value.

In turn, add the kv boolean pairs used in the chaton_meta.json file.

Add the closing bracket.
2024-05-12 13:33:54 +05:30
HanishKVC
7b5fb0a2fa ChatON:P3:meta json to hpp: Retain esc seqs and more kv pairs
Use repr to retain the escape sequences in the read string, and in
the same pass strip the single quotes that repr places around strings.

Bring in more k-v pairs from chaton_meta.json.
2024-05-12 13:06:22 +05:30
HanishKVC
078e04d32b ChatON:P2:meta json to hpp conversion - add k-v pairs skeleton 2024-05-12 12:42:53 +05:30
HanishKVC
0c21a0084f ChatON:P1: meta json to hpp conversion - Initial skeleton
Load the json file and emit the template ids.
2024-05-12 12:42:23 +05:30
slaren
b228aba91a remove convert-lora-to-ggml.py (#7204) 2024-05-12 02:29:33 +02:00
HanishKVC
1574201f71 ChatON:LoadJSON:ChatTemplates: revPrompt, system-user flags
WIP:NOTE:

Initial pass converting from the json-driven flow to the
ChatTemplatesGroupKV-related flow is done. Needs to be tested.

An optional helper was added to load ChatTemplates from a specified
json file.

Still need to add a compile-time initialized MapOfMapOfVariants
covering the chat template details of models/standards already known
to the program, so that one can use llama.cpp and this new chat
template logic without the json dependency, if one doesn't want it.
(A sketch of such a structure follows this entry.)
2024-05-12 01:45:19 +05:30
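
A minimal sketch of what such a compile-time initialized
MapOfMapOfVariants could look like, assuming a nested std::map of
std::variant; the template names and keys shown are illustrative
assumptions, not the actual dataset:

    #include <cstdint>
    #include <map>
    #include <string>
    #include <variant>

    // Illustrative sketch: chat template details for models already known
    // to the program, available without any json dependency at runtime.
    using Variant = std::variant<bool, int32_t, int64_t, double, std::string>;
    using MapOfMapOfVariants = std::map<std::string, std::map<std::string, Variant>>;

    static const MapOfMapOfVariants gKnownChatTemplates = {
        { "chatml", {
            { "user-prefix", std::string("<|im_start|>user\n") }, // assumed keys
            { "user-suffix", std::string("<|im_end|>\n") },
            { "system-user-has-prefix", true },
        }},
    };
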
HanishKVC
444d2ccf9c ChatON:LoadJSON: ChatTemplates - global/system/user/assistant
Manually iterate the json object items using begin()/end() explicitly,
because the json library's implicit range-for iteration yields only
the values and not key-value pairs.
2024-05-12 01:35:31 +05:30
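
For reference, the explicit begin()/end() iteration pattern with
nlohmann::json, where the iterator exposes both key() and value(); a
small self-contained sketch:

    #include <iostream>
    #include <nlohmann/json.hpp>

    void dump_object(const nlohmann::json &obj) {
        // Explicit iterators give access to both the key and the value.
        for (auto it = obj.begin(); it != obj.end(); ++it) {
            std::cout << it.key() << " => " << it.value() << "\n";
        }
        // By contrast, a plain `for (auto &v : obj)` yields only the values.
    }
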
HanishKVC
2efc09f2d0 ChatON: Unnecessarily indirect nlohmann json
Code used for exploring/testing, committed just for future reference.
2024-05-12 00:42:17 +05:30
Georgi Gerganov
7bd4ffb780 metal : fix warnings (skipme) (#0) 2024-05-11 21:38:13 +03:00
Georgi Gerganov
1622ac023f sync : ggml 2024-05-11 21:35:05 +03:00
Georgi Gerganov
6aeff24f8b metal : fix indent (ggml/0) 2024-05-11 21:34:21 +03:00
Georgi Gerganov
325756d28d ggml : resolve merge (ggml/0)
ggml-ci
2024-05-11 21:33:08 +03:00
HanishKVC
b9d9700de3 CMakeLists.txt: Compile C++ code for -std=c++20 2024-05-11 23:42:08 +05:30
HanishKVC
b944d04d08 ChatON: Add constructor for ChatTemplates which chains into GKV 2024-05-11 23:42:08 +05:30
HanishKVC
d9959b74e7 GroupKV: Get ready for use in llama.cpp ++
Avoid defining GKV_TEST_PRG (used for self-testing) by default; a
sketch of the guard follows this entry.

Add GroupKV to the common library.
2024-05-11 23:40:03 +05:30
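
A sketch of the self-test guard implied here: the test driver only
exists when GKV_TEST_PRG is defined, so library builds never pull it
in (the GroupKV usage shown is a hypothetical API):

    // Built only via e.g. `g++ -DGKV_TEST_PRG -std=c++20 groupkv.cpp`
    #ifdef GKV_TEST_PRG
    int main() {
        GroupKV gkv({}); // hypothetical constructor
        // ... exercise the set/get helpers and dump the contents here ...
        return 0;
    }
    #endif
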
Josh Ramer
fed0108491 Scripting & documenting debugging one test without anything else in the loop. (#7096)
* A little documentation that shares my quick tips for working in the repository.

* Update startup-testing-debugging.md

* script that shows a menu of tests to pick from & run the debugger on

* debug-test.sh: Refactor CLI help message

* debug-test.sh: documentation update

* debug-test.sh: CLI Help output corrections

* debug-test.sh: minor doc fix

---------

Authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal>
Assisted-by: brian khuu <mofosyne@gmail.com>
2024-05-12 03:26:35 +10:00
HanishKVC
4a9a6ce256 ChatON: ChatONMetaDump switch to GKV/ChatTemplates based flow 2024-05-11 22:53:45 +05:30
Xuan Son Nguyen
72c177c1f6 fix system prompt handling (#7153) 2024-05-11 17:28:10 +02:00
HanishKVC
484c710eab GroupKV:Add GetValue which throws exception 2024-05-11 20:49:51 +05:30
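
A minimal sketch of a GetValue-style throwing getter over a nested
variant map; the real method presumably lives on GroupKV, and the
names and store layout here are illustrative assumptions:

    #include <cstdint>
    #include <map>
    #include <stdexcept>
    #include <string>
    #include <variant>

    using Variant = std::variant<bool, int32_t, std::string>;
    using Store = std::map<std::string, std::map<std::string, Variant>>;

    // Throw if the group or key is missing, instead of silently
    // falling back to a caller-provided default.
    template <typename T>
    T get_value(const Store &store, const std::string &group, const std::string &key) {
        auto git = store.find(group);
        if (git == store.end()) {
            throw std::out_of_range("group not found: " + group);
        }
        auto kit = git->second.find(key);
        if (kit == git->second.end()) {
            throw std::out_of_range("key not found: " + group + "/" + key);
        }
        return std::get<T>(kit->second); // throws bad_variant_access on type mismatch
    }
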
compilade
5a419926b0 convert-hf : support bfloat16 conversion (#7158)
* convert-hf : support bfloat16 conversion

* gguf-py : flake8 fixes

* convert-hf : add missing space after comma

* convert-hf : get bit-exact same output as ./quantize

The quantization version was missing.

* convert-hf : don't round bf16 NANs

* convert-hf : save some memory with np.int16 intermediate bf16 weights

* convert-hf : more closely match llama.cpp with which weights to keep in f32

* convert-hf : add --outtype auto-f16

A reason for this to exist is for model quantizers who want an initial
GGUF with the most fidelity to the original model while still using
a 16-bit float type instead of 32-bit floats.

* convert-hf : remove a semicolon because flake8 doesn't like it

It's a reflex from when programming in C/C++, I guess.

* convert-hf : support outtype templating in outfile name

* convert-hf : rename --outtype auto-f16 to --outtype auto
2024-05-11 11:06:26 -04:00
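
For context, the core of an fp32-to-bf16 conversion with
round-to-nearest-even and NaN preservation; this is the standard
bit-level technique, not code lifted from the PR:

    #include <cstdint>
    #include <cstring>

    // Truncate NaNs instead of rounding them (rounding could carry into the
    // exponent and turn a NaN into infinity); round everything else to
    // nearest-even before dropping the low 16 bits.
    static uint16_t fp32_to_bf16(float f) {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof(bits));
        if ((bits & 0x7fffffff) > 0x7f800000) { // NaN
            return (uint16_t)(bits >> 16);
        }
        return (uint16_t)((bits + (0x7fff + ((bits >> 16) & 1))) >> 16);
    }
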
HanishKVC
9d4450d51a GroupKV: Let dump return a string, rather than printing/logging 2024-05-11 19:43:34 +05:30
HanishKVC
e999934e91 ChatON:WIP: initial go at GroupKV based flow, instead of json 2024-05-11 19:41:58 +05:30
HanishKVC
f294fddf43 GroupKV: Add group_exists checker 2024-05-11 19:18:19 +05:30
HanishKVC
dde72df9d3 GroupKV: Rename the internal map 2024-05-11 18:23:06 +05:30
Georgi Gerganov
fae9d234b6 sync : ggml
ggml-ci
2024-05-11 15:38:34 +03:00
Justina Cho
f5ef34e428 feat: implemented sigmoid function (ggml/806)
* added sigmoid function

* implemented metal kernel for sigmoid

* implemented cuda kernel for sigmoid

* added sigmoid unary op and incremented count
2024-05-11 15:38:34 +03:00
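
The op itself is the standard logistic sigmoid,
sigmoid(x) = 1 / (1 + e^-x); a scalar reference version of the kind of
unary op added here (the actual commit adds vectorized CPU, Metal and
CUDA kernels):

    #include <cmath>

    // Reference scalar implementation of the sigmoid unary op.
    static inline float sigmoidf(float x) {
        return 1.0f / (1.0f + expf(-x));
    }

    static void sigmoid_forward(const float *src, float *dst, int n) {
        for (int i = 0; i < n; ++i) {
            dst[i] = sigmoidf(src[i]);
        }
    }
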
Borislav Stanimirov
ef0d5e3ec9 build: fix and ignore msvc warnings (ggml/805) 2024-05-11 15:38:34 +03:00
CrispStrobe
3292733f95 convert : skip inaccessible HF repos (#7210) 2024-05-11 11:18:35 +03:00
Steve Grubb
988631335a server : free llama_batch on exit (#7212)
* [server] Cleanup a memory leak on exit

There are a couple of memory leaks on exit of the server, and this
one hides others. After cleaning this up, you can see leaks in the
slots, but that is another patch, to be sent after this one.

* make tab into spaces
2024-05-11 11:13:02 +03:00
Haoxiang Fei
f99e1e456e llama : lookup word in vocab before doing BPE merges (#7193)
* fix: llama-3 ignore_merges

* test: add test for llama-3 bpe ignore_merges

* fix: set ignore_merges only for llama-3

* fix: test-tokenizer-1-bpe --ignore-merges detection

* fix: copy to fix fallthrough

* fix: change ignore_merges to bool

* fix: add ignore merges tests to cmake

* llama : alternative merge ignore logic

---------

Co-authored-by: Haoxiang Fei <feihaoxiang@idea.edu.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-11 11:12:06 +03:00
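
The idea behind ignore_merges, per the PR title: check whether the
whole word is already a single vocabulary token before running the BPE
merge loop. A hedged sketch with placeholder data structures, not
llama.cpp's actual tokenizer types:

    #include <string>
    #include <unordered_map>
    #include <vector>

    // Assumed helper: the normal BPE merge-based tokenization, defined elsewhere.
    std::vector<int> bpe_merge_tokenize(const std::string &word,
                                        const std::unordered_map<std::string, int> &vocab);

    std::vector<int> tokenize_word(const std::string &word,
                                   const std::unordered_map<std::string, int> &vocab,
                                   bool ignore_merges) {
        if (ignore_merges) {
            // llama-3 style: emit the whole-word token directly when it exists.
            auto it = vocab.find(word);
            if (it != vocab.end()) {
                return { it->second };
            }
        }
        return bpe_merge_tokenize(word, vocab);
    }
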
Johannes Gäßler
5ae3426b0b server: fix reported top tokens for temperature 0 (#7203) 2024-05-11 10:11:28 +02:00
HanishKVC
fdefb39518 GroupKV:Make LDBUG macros conditional, avoid condition at usage site
Also change LWARN to LDBUG for the code that was previously
conditional on GKV_DEBUG.
2024-05-11 13:30:56 +05:30
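
A sketch of the pattern: make the debug macro conditional at its
definition, so call sites need no #ifdef of their own. The macro names
match the commit message; the exact expansion is an assumption:

    #include <cstdio>

    // When GKV_DEBUG is undefined, LDBUG compiles away entirely.
    // ##__VA_ARGS__ (a widely supported extension) also allows calls
    // with an empty argument list.
    #ifdef GKV_DEBUG
    #define LDBUG(fmt, ...) fprintf(stderr, "DBUG:" fmt "\n", ##__VA_ARGS__)
    #else
    #define LDBUG(fmt, ...) do {} while (0)
    #endif
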
Joan Fontanals
b83cc3f5b3 llama : add Jina Embeddings architecture (#6826)
* feat: first things to do

* feat: create tensors for Jina architecture

* fix: use other tensors

* feat: embedding gets results

* fix: fix usage of ALIBI

* fix: clean prints

* fix: do some cleanup unused vars

* fix: revert changes to Makefile and CMakeLists

* fix: revert some changes

* fix: fix small detail

* fix: fix convert formatting

* fix: fix linting and editor

* feat: set proper vocab settings

* fix: JinaBertForMaskedLM registration

* feat: support q_normalization and k_normalization in Jina arch

* feat: handle gpt2 tokenizer with Jina architecture

* feat: example comments in embedding

* feat: rename Jina Bert to Jina Bert V2

* fix: add some changes as per review

* feat: proper KQ_pos for Jina embeddings

* feat: add capacity to load models ES and DE for Spanish

* llama : fix pre-tokenizers

* ggml : full ALiBi support

* ggml : update ggml_soft_max_ext() CUDA, SYCL

* ggml : ggml_flash_attn_ext() support ALiBi (CPU)

* ggml : ggml_flash_attn_ext() support ALiBi (Metal)

* ggml : fix warning

* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)

ggml-ci

* minor : clean-up

* embedding : add warning about missing SEP

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-11 10:46:09 +03:00
Georgi Gerganov
9cb317f77e ggml : full ALiBi support (#7192)
* ggml : full ALiBi support

* ggml : update ggml_soft_max_ext() CUDA, SYCL

* ggml : ggml_flash_attn_ext() support ALiBi (CPU)

* ggml : ggml_flash_attn_ext() support ALiBi (Metal)

* ggml : fix warning

* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)

ggml-ci

* ggml : fix assert message

* vulkan : add dev notes

* ggml : require mask when using ALiBi

ggml-ci

* convert : fix convert for refact models
2024-05-11 10:32:41 +03:00
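
For reference, ALiBi replaces positional embeddings with a fixed,
head-specific linear bias on the attention logits: the score between
query position i and key position j gains m_h * (j - i), where
m_h = 2^(-8h/H) for heads h = 1..H (H a power of two). A scalar sketch
of applying the bias before the softmax; shapes and names here are
illustrative:

    #include <cmath>

    // Add the ALiBi bias for one head (0-indexed) to an [n_q x n_k]
    // row-major matrix of raw QK^T logits, before the softmax.
    void alibi_bias(float *scores, int n_q, int n_k, int head, int n_head) {
        const float slope = powf(2.0f, -8.0f * (float)(head + 1) / (float)n_head);
        for (int i = 0; i < n_q; ++i) {
            for (int j = 0; j < n_k; ++j) {
                scores[i * n_k + j] += slope * (float)(j - i); // more negative for farther-back keys
            }
        }
    }
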
HanishKVC
7f03dd0d4b GroupKV: Add int32_t to variant list, to simplify int use
So that there is no need to explicitly specify <int64_t> or an LL
suffix for int literals, which don't need 64-bit storage by default.

This also means one shouldn't/can't mix up the type of the stored
value and the default type specified when getting it. (A small
illustration follows this entry.)
2024-05-11 12:45:58 +05:30
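
A small illustration of the literal-type point: a plain `5` has type
int, so with int32_t in the alternative list it converts into the
variant without any cast or suffix. The variant composition here is
assumed from the commit messages:

    #include <cstdint>
    #include <string>
    #include <variant>

    using Variant = std::variant<bool, int32_t, int64_t, double, std::string>;

    int main() {
        Variant a = 5;   // plain int literal: picks int32_t, no <int64_t>/LL needed
        Variant b = 5LL; // 64-bit storage still available when asked for explicitly
        // Caveat from the commit: the stored alternative must match the type
        // used when getting; std::get<int64_t>(a) here would throw.
        return std::get<int32_t>(a) == 5 ? 0 : 1;
    }
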
HanishKVC
0342124946 GroupKV: Add to_str wrt vectors, help avoid compiler confusion 2024-05-11 12:27:42 +05:30
HanishKVC
7d7c59ec50 GroupKV:Simplify:P2: Rename tags, Make debug logs conditional
Rename all the log messages to use the GKV tag rather than SC.

The log messages in get_vector are now conditional on GKV_DEBUG;
this was missed earlier in simpcfg itself.
2024-05-11 11:57:27 +05:30
HanishKVC
d764a9d395 GroupKV: Simplify code to the minimal needed for GroupKV - P1 2024-05-11 11:37:06 +05:30
HanishKVC
86b842b172 GroupKV: Duplicate SimpCfg to chop down into GroupKV
I.e. a minimal MapOfMapOfVariant with some basic helpers.

This can be the basis of a ChatTemplates object, as well as
SimpCfg built on top of it.
2024-05-11 10:57:32 +05:30
HanishKVC
c0506f94bf SimpCfg: Allow for direct initialization lists based init
This should pave the way for having a default chat templates dataset
in the code, without needing to load it from a config file, if one
doesn't want to.

TODO: Allow loading config from json into simpcfg, so that a program
which uses llama.cpp can decide whether it is OK with what is already
in the internal dataset, allow loading template info at runtime using
simpcfg's simple text file, or additionally include the json code to
load template info at runtime from a json file.
2024-05-11 00:33:31 +05:30
HanishKVC
fe27902964 SimpCfg: Avoid iostream/cout and format for direct library use
It appears that std::format is not yet supported by the older
g++/libstdc++ still in wide use (e.g. current Debian stable), so
avoid it for direct library use.

Allow for empty VA_ARGS (variadic macro arguments).

NOTE: However, the test-program mode of the same still uses cout and
format.
2024-05-10 22:27:07 +05:30
slaren
e849648888 llama-bench : add pp+tg test type (#7199) 2024-05-10 18:03:54 +02:00
HanishKVC
1f9a0eb8ce ChatON: Remove unneeded iostream 2024-05-10 21:10:44 +05:30
Georgi Gerganov
18e437665c metal : fix flash attention kernel requirements (#7169)
* metal : fix flash attention kernel requirements

ggml-ci

* metal : fix ggml_metal_supports_op

ggml-ci
2024-05-10 18:20:10 +03:00
Georgi Gerganov
8c660242d7 convert : print "ignore_merges" field 2024-05-10 17:53:04 +03:00
slaren
25c6e82e7a llama : use n_vocab to differentiate between mistral 7B and llama3 8B (#7200) 2024-05-10 14:28:01 +02:00
Justine Tunney
4e3880978f Fix memory bug in grammar parser (#7194)
The llama.cpp grammar parser had a bug where forgetting to add a closing
quotation mark to strings would cause parsing to crash. Anyone running a
server on a public endpoint is advised to upgrade. To reproduce this bug

    ./llamafile -m foo.gguf -p bar --grammar 'root::="'

Credit for discovering and reporting this issue goes to Eclypsium
Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.
2024-05-10 21:01:08 +10:00
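
The underlying bug class is a closing-quote scan that never checks for
end of input and so reads past the buffer. A generic sketch of the
defensive pattern, not the actual llama.cpp parser code:

    #include <stdexcept>
    #include <string>

    // Return the index just past the closing '"', or report an
    // unterminated string instead of scanning past the end.
    static size_t parse_quoted(const std::string &src, size_t pos) {
        ++pos; // skip the opening '"'
        while (pos < src.size() && src[pos] != '"') {
            pos += (src[pos] == '\\') ? 2 : 1; // skip escaped characters
        }
        if (pos >= src.size()) {
            throw std::runtime_error("unterminated string in grammar");
        }
        return pos + 1;
    }
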
HanishKVC
f89fe2732c Main+: optionally allow special tokens from user in interactive mode (#7097)
@hanishkvc added a new `--interactive-specials` flag which allows inserting special tokens from the user side into the embedding stream.
2024-05-10 20:21:58 +10:00
HanishKVC
abb406b888 Merge branch 'master' into hkvc_chaton_v3
Merged the master branch as of 20240510IST12XY into this chaton_v3
branch.

As part of this, had to update the flow in examples/main/main.cpp
to reconcile the conversation-related commit in the master branch
with my chaton-related commits in this branch.
2024-05-10 13:14:26 +05:30