* optimize for ppc64le using VSX intrinsics
* 1. code clean up by removing comments about overflow concern.
2. fix typo in suffix of scaling.
* Continue to fix typo in suffix of scaling for QK_K <> 256
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
GroupKV dump adds needed ":" seperator on its own, so calling
functions can just pass the tag string they want in the log without
worrying about any demarkation.
* Add left recursion check: quit early instead of going into an infinite loop
* Remove custom enum, rename left recursion check and move to "grammar internal" section, add handling for edge case where a leftmost nonterminal may be empty
* Remove unnecessary declaration
- Change '--embedding' to '--embeddings' in the README
- Update the description to match the latest --help output
- Added a caution about defining physical batch size
Given that now the multi chat templating logic itself is used to
apply chat templating/tagging to a single chat message, so give
flexibility of deciding whether global tags if any should be
applied or not wrt the core tagging logic.
examples/main inturn updated to not apply global tags if any wrt
the system message. Also the user messages already dont apply
global tags if any, as its currently implemented to build on the
existing in-prefix/suffix and anitprompt flow.
To avoid having to duplicate any hardcoding in future, wrt any new
model/chat-template-standard, at multiple locations, remove the
single message templating code with a wrapper which does the same
but using the multi-msg templating helper.
* convert-hf : support q8_0 conversion
* convert-hf : add missing ftype
This was messing with the checksums otherwise.
* convert-hf : add missing ftype to Baichuan and Xverse
I didn't notice these on my first pass.
* convert.py: Outfile default name change and additional metadata support
* convert.py: don't stringify Metadata load method output
* convert.py: typo fix
* convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp
Reuse the code already moved into GroupKV
Add explicit get and set wrt int32_t, which was added after move
to GroupKV wrt basic MapOfMapOfVariant logic.
The json library generates less informative exception message,
which doesnt help one identify which key is missing, so switch to
the new json_get_str helper added in the last commit. It generates
more informative exception message.
Also has dump was using get_value calls with fallback to default,
so it wasnt identifying the missed field.
Have fixed both of those. Also reconverted meta json file.
Misc: interesting avesham and aattam
This should be ok, given that there is a version of the chat tmpl
meta data already included with the library.
So only if user wants to change the chat template info wrt a existing
model/template-standard or add a new one, then there is need to
pass a json file with info for that model/standard.
This should allow for using this generic chat templating code flow
along with the included chat template data, without needing to
load any json file at runtime.
However If user wants to change the already included chat template
data, or add new chat template standard/model related data, one can
explicitly load json file.
TODO: Need to cross check this flow once, but logically should work