Rename the kv helpers to match their semantics:
* whether working with a string or a bool value
* whether two keys or a single key
Add support for kv pairs with bool values.
In turn, add the boolean kv pairs used in the chaton_meta.json file.
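A minimal sketch of the resulting naming scheme, assuming nlohmann::json; the kv_str/kv_bool names are illustrative, not the actual chaton identifiers:

```c++
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::ordered_json;

// single key, string value
static std::string kv_str(const json & j, const std::string & k) {
    return j.at(k).get<std::string>();
}
// two keys (group + field), string value
static std::string kv_str(const json & j, const std::string & k1, const std::string & k2) {
    return j.at(k1).at(k2).get<std::string>();
}
// single key, bool value
static bool kv_bool(const json & j, const std::string & k) {
    return j.at(k).get<bool>();
}
// two keys (group + field), bool value
static bool kv_bool(const json & j, const std::string & k1, const std::string & k2) {
    return j.at(k1).at(k2).get<bool>();
}
```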
Add the closing bracket
Use repr to retain the escape sequences in the read string,
and in parallel strip the single quotes that repr adds around strings.
Bring in more k-v pairs from chaton_meta.json.
WIP:NOTE:
Initial pass at converting from the json-driven flow to the
ChatTemplatesGroupKV-related flow is done. It still needs testing.
An optional helper has been added to load ChatTemplates from a
specified json file.
Still need to add a compile-time-initialized MapOfMapOfVariants
holding the chat template details of models/standards already known
to the program, so that one can use llama.cpp and this new chat
template logic even without the json dependency, if one doesn't
want it. A sketch follows.
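Something along these lines could work for the compile-time-initialized map; all template names, keys, and values below are hypothetical placeholders, not the actual chaton dataset:

```c++
#include <cstdint>
#include <map>
#include <string>
#include <variant>

using Variant            = std::variant<bool, int64_t, double, std::string>;
using MapOfVariants      = std::map<std::string, Variant>;
using MapOfMapOfVariants = std::map<std::string, MapOfVariants>;

// Seeded at compile/startup time, so no json file is needed at runtime.
static const MapOfMapOfVariants gKnownChatTemplates = {
    { "example-template", {
        { "system-prefix",     std::string("<|sys|>") },  // placeholder value
        { "system-has-suffix", true                   },
    }},
};
```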
Manually iterate the json object's items using begin()/end()
explicitly, because the json library's implicit range-for iteration
yields only the values, not key-value pairs.
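A sketch of that manual iteration, assuming nlohmann::json (where a plain range-for over an object yields only values, while the begin()/end() iterators expose both key() and value()):

```c++
#include <string>
#include <nlohmann/json.hpp>

// Walk an object's entries with explicit iterators to get the keys too.
static void handle_object(const nlohmann::json & j) {
    for (auto it = j.begin(); it != j.end(); ++it) {
        const std::string &    key   = it.key();
        const nlohmann::json & value = it.value();
        // ... process the key-value pair ...
        (void) key; (void) value;
    }
}
```

(If the vendored nlohmann copy is recent enough, j.items() gives the same key-value view.)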
* A little documentation that shares my quick tips for working in the repository.
* Update startup-testing-debugging.md
* Add a script that shows a menu of tests to pick from and runs the debugger on the chosen one
* debug-test.sh: Refactor CLI help message
* debug-test.sh: documentation update
* debug-test.sh: CLI Help output corrections
* debug-test.sh: minor doc fix
---------
Authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal>
Assisted-by: brian khuu <mofosyne@gmail.com>
* convert-hf : support bfloat16 conversion
* gguf-py : flake8 fixes
* convert-hf : add missing space after comma
* convert-hf : get bit-exact same output as ./quantize
The quantization version was missing.
* convert-hf : don't round bf16 NaNs (see the conversion sketch after this list)
* convert-hf : save some memory with np.int16 intermediate bf16 weights
* convert-hf : more closely match llama.cpp with which weights to keep in f32
* convert-hf : add --outtype auto-f16
A reason for this to exist is for model quantizers who want an initial
GGUF with the most fidelity to the original model while still using
a 16-bit float type instead of 32-bit floats.
* convert-hf : remove a semicolon because flake8 doesn't like it
It's a reflex from when programming in C/C++, I guess.
* convert-hf : support outtype templating in outfile name
* convert-hf : rename --outtype auto-f16 to --outtype auto
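For the "don't round NaNs" point: rounding a NaN's mantissa can carry into the exponent and turn it into +/-inf. The convert script does this with numpy; the following is only a hedged C++ rendering of the idea, along the lines of ggml's own fp32-to-bf16 helper:

```c++
#include <cstdint>
#include <cstring>

// fp32 -> bf16: round to nearest even, but do NOT round NaNs.
static uint16_t fp32_to_bf16(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    if ((u & 0x7fffffff) > 0x7f800000) {       // NaN: truncate instead,
        return (uint16_t) ((u >> 16) | 64);    // forcing a mantissa bit so it stays a (quiet) NaN
    }
    // round to nearest, ties to even
    return (uint16_t) ((u + (0x7fff + ((u >> 16) & 1))) >> 16);
}
```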
* [server] Cleanup a memory leak on exit
There are a couple of memory leaks on exit of the server, and this
one hides the others. After cleaning this up, you can see leaks on
the slots; but that is for another patch, to be sent after this one.
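A minimal sketch of the kind of exit-time cleanup involved, using llama.cpp public API calls that do exist; the actual server patch may free additional state:

```c++
#include "llama.h"

// Free the big allocations before exit so leak checkers see a clean
// shutdown. ctx/model stand in for the server's handles.
static void server_shutdown(llama_context * ctx, llama_model * model) {
    llama_free(ctx);          // per-context state (KV cache, graphs, ...)
    llama_free_model(model);  // model weights
    llama_backend_free();     // backend/global state
}
```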
* convert tabs into spaces
* feat: first things to do
* feat: create tensors for Jina architecture
* fix: use other tensors
* feat: embedding gets results
* fix: fix usage of ALiBi
* fix: clean prints
* fix: clean up some unused vars
* fix: revert changes to Makefile and CMakeLists
* fix: revert some changes
* fix: fix small detail
* fix: fix convert formatting
* fix: fix linting and editor
* feat: set proper vocab settings
* fix: JinaBertForMaskedLM registration
* feat: support q_normalization and k_normalization in Jina arch
* feat: handle gpt2 tokenizer with Jina architecture
* feat: example comments in embedding
* feat: rename Jina Bert to Jina Bert V2
* fix: add some changes as per review
* feat: proper KQ_pos for Jina embeddings
* feat: add the capability to load the ES and DE models (Spanish and German)
* llama : fix pre-tokenizers
* ggml : full ALiBi support (a per-head slope sketch follows this list)
* ggml : update ggml_soft_max_ext() (CUDA, SYCL)
* ggml : ggml_flash_attn_ext() support ALiBi (CPU)
* ggml : ggml_flash_attn_ext() support ALiBi (Metal)
* ggml : fix warning
* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)
ggml-ci
* minor : clean-up
* embedding : add warning about missing SEP
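The ALiBi bias added to an attention logit is slope * key-distance, with a per-head slope. A sketch of the slope schedule, following the ALiBi paper; ggml's softmax/flash-attention kernels use a schedule along these lines, with max_bias typically 8.0f, and the function name here is illustrative:

```c++
#include <cmath>

// Per-head ALiBi slope: the first n_head_log2 heads follow a base
// geometric sequence; remaining heads interleave an offset sequence.
static float alibi_slope(int head, int n_head, float max_bias) {
    const int   n_head_log2 = 1 << (int) std::floor(std::log2((float) n_head));
    const float m0 = std::pow(2.0f, -max_bias / n_head_log2);
    const float m1 = std::pow(2.0f, -max_bias / (2.0f * n_head_log2));
    return head < n_head_log2
        ? std::pow(m0, (float) (head + 1))
        : std::pow(m1, (float) (2 * (head - n_head_log2) + 1));
}
```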
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
So that there is no need to explicitly specify <int64_t> or an LL
suffix for int literals, which don't need 64-bit space by default.
This also means one shouldn't/can't mix up the type of the stored
value with the default type specified when getting it.
Rename all the log messages to use GKV and not SC.
The log messages in get_vector are now conditional on GKV_DEBUG;
this had been missed earlier in simpcfg itself.
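A sketch of the int handling described above, assuming a GroupKV built on std::variant; the names are hypothetical, not the actual GKV code. Plain int literals are widened to int64_t on set, so get<int64_t>() is always the matching accessor:

```c++
#include <cstdint>
#include <map>
#include <string>
#include <variant>

struct GroupKV {
    using Value = std::variant<bool, int64_t, double, std::string>;
    std::map<std::string, Value> kv;

    void set(const std::string & k, int v)     { kv[k] = (int64_t) v; } // widen plain ints
    void set(const std::string & k, int64_t v) { kv[k] = v; }
    void set(const std::string & k, bool v)    { kv[k] = v; }

    template <typename T>
    T get(const std::string & k) const { return std::get<T>(kv.at(k)); }
};

// GroupKV gkv;
// gkv.set("ctx-size", 4096);             // no <int64_t> or 4096LL needed
// auto n = gkv.get<int64_t>("ctx-size"); // stored and fetched as int64_t
```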
This should pave the way for having a default chat templates dataset
in the code, without needing to load it from a config file, if one
doesn't want to.
TODO: allow loading config from json into simpcfg, so that a program
which uses llama.cpp can decide whether it is ok with what is already
in the internal dataset, allow loading template info at runtime using
simpcfg's simple text file, or additionally include the json code to
load template info at runtime from a json file.
It appears that std::format is not supported by the older g++/libstdc++
still in wide use (e.g. current Debian stable), so avoid it in the
library code itself.
Allow for empty VA_ARGS.
NOTE: However, the test-program mode of the same still uses cout and
std::format; a macro sketch follows.
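A sketch of a logging macro that tolerates empty __VA_ARGS__ while avoiding std::format; the macro name is hypothetical. The ##__VA_ARGS__ comma-swallowing trick is a GNU extension that g++ and clang both accept (C++20 would offer __VA_OPT__ instead), and fmt must be a string literal so the "\n" concatenation works:

```c++
#include <cstdio>

#define LOG_TESTLN(fmt, ...) fprintf(stderr, fmt "\n", ##__VA_ARGS__)

// LOG_TESTLN("hello");           // empty __VA_ARGS__: trailing comma swallowed
// LOG_TESTLN("value=%d", 42);    // normal use
```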
The llama.cpp grammar parser had a bug where forgetting to add a
closing quotation mark to a string would cause parsing to crash.
Anyone running a server on a public endpoint is advised to upgrade.
To reproduce this bug:
./llamafile -m foo.gguf -p bar --grammar 'root::="'
Credit for discovering and reporting this issue goes to Eclypsium
Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.
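A hypothetical sketch of the defensive check, not the exact patch: while scanning a quoted literal, fail cleanly on end-of-input rather than reading past the buffer (escape handling omitted for brevity):

```c++
#include <stdexcept>

// Scan to the closing quote; throw on end-of-input instead of overrunning.
static const char * skip_string_literal(const char * pos) {
    while (*pos != '"') {
        if (*pos == '\0') {
            throw std::runtime_error("unexpected end of input: unterminated string");
        }
        pos++;
    }
    return pos + 1; // past the closing quote
}
```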
Have merged the master branch as of 20240510IST12XY into the chaton_v3
branch.
As part of the same, had to update the flow in examples/main/main.cpp
with respect to a conversion-related commit in the master branch and my
chaton-related commits in this branch.