C++17 provides a good enough variant (std::variant) as a standard
feature, and chaton uses it at its core instead of rolling its own
struct/union based variant. Since chaton is currently part of the
common library and not the base llama library, limit the use of C++17
to the common library. Initially, while experimenting, I had set the
flag for all of llama; it is limited to common for now.
Also, by now most embedded targets should have C++ compilers and
libraries that support C++17 features, so this is likely an acceptable
path to take.
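For context, a minimal sketch of what std::variant provides over a hand-rolled struct+union: the active alternative is tracked for you, non-trivial members such as std::string are constructed and destroyed correctly, and std::visit dispatches on the active type. The `Value` alias, its alternatives and the `to_text` helper are hypothetical, not chaton's actual definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical value type; chaton's real alternatives may differ.
using Value = std::variant<bool, int64_t, double, std::string>;

// Render any alternative as text; std::visit dispatches on the active type.
std::string to_text(const Value &v) {
    return std::visit([](const auto &x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, bool>) {
            return x ? "true" : "false";
        } else if constexpr (std::is_same_v<T, std::string>) {
            return x;
        } else {
            return std::to_string(x); // int64_t or double
        }
    }, v);
}
```

When constructing such a variant, it is safer to wrap string literals in std::string explicitly, since under C++17 rules a bare `const char*` may convert to the `bool` alternative instead.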
Rename chaton-meta.hpp to chaton-meta.cpp and include this cpp file,
which brings in the compile-time built-in global chaton configurable
template data, in the common library; this avoids references to the
no-op hpp file.
Update chaton.hpp to not include the meta cpp file; instead, just
declare an extern reference to the global ChatTemplates instance, so
that the hpp can be used as a proper header file.
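The header/source split described above is the usual extern pattern: the header only declares the global, and exactly one translation unit defines it. Both halves are shown in a single file for brevity; the stand-in `ChatTemplates` struct, the global name `g_chat_templates` and the `has_template` helper are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// ---- chaton.hpp (sketch) ----
// Hypothetical stand-in for the real ChatTemplates type.
struct ChatTemplates {
    std::map<std::string, std::string> data;
};
// Declaration only: no storage is allocated here, so any number of
// translation units can include the header without duplicate definitions.
extern ChatTemplates g_chat_templates;

// ---- chaton-meta.cpp (sketch) ----
// The single definition, carrying the built-in (placeholder) template data.
ChatTemplates g_chat_templates{{{"example-model", "<placeholder>"}}};

// ---- any user of chaton.hpp ----
bool has_template(const std::string &name) {
    return g_chat_templates.data.count(name) > 0;
}
```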
Avoid #pragma once in chaton-meta.cpp, including in the script that
helps create it.
As discussed in PR #6766, CUDA graphs were being disabled in the presence of long prompts.
This fixes the issue by preventing the consecutive update counter from incrementing unnecessarily
for tokens in which CUDA graphs are disabled due to batch size > 1.
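The counter logic can be sketched as follows. The struct, its field names, and the threshold of 4 are assumptions for illustration, not the exact ggml-cuda code; the point is that the consecutive update counter is only touched for tokens where graphs are actually in use, so a long run of batched prompt steps can no longer trip the "too many updates" switch.

```cpp
#include <cassert>

// Hypothetical per-context state; names and threshold are illustrative.
struct GraphState {
    int consecutive_updates = 0;
    bool disabled_due_to_too_many_updates = false;
};

constexpr int kMaxConsecutiveUpdates = 4; // assumed threshold

// Called once per evaluated batch; returns whether a CUDA graph is used.
bool step(GraphState &s, int batch_size, bool update_required) {
    // Graphs are skipped for batch size > 1 (e.g. prompt processing).
    bool use_graph = batch_size == 1 && !s.disabled_due_to_too_many_updates;
    if (use_graph) {
        // Only count updates while a graph is in use; counting the skipped
        // batched steps would wrongly disable graphs after long prompts.
        if (update_required) {
            s.consecutive_updates++;
        } else {
            s.consecutive_updates = 0;
        }
        if (s.consecutive_updates >= kMaxConsecutiveUpdates) {
            s.disabled_due_to_too_many_updates = true;
            use_graph = false;
        }
    }
    return use_graph;
}
```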
* initial commit with CPU implementation of upscale-to-shape and a test, CUDA implementation next
* experimental commit to see if dst shape is correct
* test version
* test
* removed unnecessary params
* refactor
* fixed tests
* ggml : metal impl + cleanup + sycl dev warnings
* patched ggml_upscale cuda op to handle non-contiguous tensors, added test for non-contiguous behavior
* metal : fix upscale op to support nb00 + style
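As a rough illustration of upscale-to-shape with non-contiguous input, here is a 2-D nearest-neighbour sketch: each destination index is mapped back to a source index through the per-dimension scale factor, and the source is read through byte strides (the role nb00/nb01 play in ggml), so a transposed view works without a copy. The function name and the 2-D restriction are assumptions; the real op works on 4-D ggml tensors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Nearest-neighbour upscale of a 2-D float tensor (ne0 x ne1) to
// (dst_ne0 x dst_ne1). nb0/nb1 are byte strides between elements and rows.
std::vector<float> upscale_nearest_2d(const char *src, size_t ne0, size_t ne1,
                                      size_t nb0, size_t nb1,
                                      size_t dst_ne0, size_t dst_ne1) {
    std::vector<float> dst(dst_ne0 * dst_ne1); // output is contiguous
    const float sf0 = float(dst_ne0) / float(ne0);
    const float sf1 = float(dst_ne1) / float(ne1);
    for (size_t i1 = 0; i1 < dst_ne1; ++i1) {
        // Clamp guards against float rounding pushing the index past the end.
        const size_t s1 = std::min(size_t(float(i1) / sf1), ne1 - 1);
        for (size_t i0 = 0; i0 < dst_ne0; ++i0) {
            const size_t s0 = std::min(size_t(float(i0) / sf0), ne0 - 1);
            float v;
            std::memcpy(&v, src + s0 * nb0 + s1 * nb1, sizeof v);
            dst[i1 * dst_ne0 + i0] = v;
        }
    }
    return dst;
}
```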
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Any program that wants to use a JSON file to update/extend chaton's
configurable template data can include this new file, chaton_json.hpp,
to get the required functionality.
Update chaton_meta_ok, _chaton_meta_validate_dump and
chaton_meta_load_json to either work with a passed-in ChatTemplates
instance or fall back to the compiled-in global instance.
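The "passed-in instance, else compiled-in global" behaviour is commonly done with a nullable pointer parameter. A sketch with a hypothetical stand-in `ChatTemplates`, a hypothetical global `g_chat_templates`, and a hypothetical `meta_ok_sketch` function; these are not the actual chaton signatures.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for ChatTemplates.
struct ChatTemplates {
    std::map<std::string, std::string> data;
};

// Hypothetical compiled-in global instance.
ChatTemplates g_chat_templates{{{"example-model", "<placeholder>"}}};

// Uses the passed instance if given, otherwise falls back to the global.
bool meta_ok_sketch(const std::string &tmpl_id, const ChatTemplates *ct = nullptr) {
    const ChatTemplates &use = ct ? *ct : g_chat_templates;
    return use.data.count(tmpl_id) > 0;
}
```

Defaulting the pointer to nullptr keeps existing call sites unchanged while letting callers such as a JSON loader work on their own instance.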
The initial version was rooted around a JSON object, while the new
version is rooted around a MapOfMapOfVariant (GroupKV), which can be
preloaded with chat template info at compile time and used as is.
Optionally, the configurable template data can also be extended/updated
at runtime from a text (SimpCfg) or JSON file.
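A rough sketch of a map-of-map-of-variant store whose built-in data is defined in source (so it ships inside the binary) and which can then be updated at runtime. The `GroupKVSketch` class, its alternatives, its `set`/`get` methods and the sample data are assumptions for illustration; parsing of SimpCfg or JSON files is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <variant>

// group -> key -> value
using Variant = std::variant<bool, int64_t, double, std::string>;
using MapOfVariant = std::map<std::string, Variant>;
using MapOfMapOfVariant = std::map<std::string, MapOfVariant>;

// Hypothetical built-in data, usable without reading any file.
const MapOfMapOfVariant kBuiltinTemplates = {
    {"example-model", {
        {"system-prefix", std::string("<sys>")},
        {"add-bos", true},
    }},
};

class GroupKVSketch {
public:
    explicit GroupKVSketch(MapOfMapOfVariant preloaded) : gkv_(std::move(preloaded)) {}

    // Runtime update/extension, e.g. driven by a config file parser.
    void set(const std::string &group, const std::string &key, Variant value) {
        gkv_[group][key] = std::move(value);
    }

    // Returns the stored value if present and of type T, else the fallback.
    template <typename T>
    T get(const std::string &group, const std::string &key, T fallback) const {
        auto g = gkv_.find(group);
        if (g == gkv_.end()) return fallback;
        auto k = g->second.find(key);
        if (k == g->second.end()) return fallback;
        if (auto p = std::get_if<T>(&k->second)) return *p;
        return fallback;
    }

private:
    MapOfMapOfVariant gkv_;
};
```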
* optimize for ppc64le using VSX intrinsics
* 1. code cleanup by removing comments about overflow concerns.
2. fix typo in suffix of scaling.
* Continue fixing the typo in the scaling suffix for QK_K != 256
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
The GroupKV dump adds the needed ":" separator on its own, so calling
functions can just pass the tag string they want in the log without
worrying about any demarcation.
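A tiny sketch of the idea: the dump routine appends the ":" after the caller's tag itself, so callers pass a bare tag. The function name `dump_sketch` and the `tag:group:key:value` line layout are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

using GroupMap = std::map<std::string, std::map<std::string, std::string>>;

// Hypothetical dump: one "tag:group:key:value" line per entry.
// The separator after the tag is added here, not by the caller.
std::string dump_sketch(const GroupMap &gkv, const std::string &tag) {
    std::ostringstream out;
    const std::string prefix = tag.empty() ? "" : tag + ":";
    for (const auto &[group, kv] : gkv) {
        for (const auto &[key, value] : kv) {
            out << prefix << group << ":" << key << ":" << value << "\n";
        }
    }
    return out.str();
}
```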
* Add left recursion check: quit early instead of going into an infinite loop
* Remove custom enum, rename left recursion check and move it to the "grammar internal" section, add handling for the edge case where a leftmost nonterminal may be empty
* Remove unnecessary declaration
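A sketch of such a check on a toy grammar representation (each rule is a list of alternatives, each alternative a list of symbols). It first computes which rules can match the empty string, then walks leftmost symbols depth-first, continuing past a leftmost nonterminal that may be empty, and reports left recursion when it reaches a rule already on the current path. The data layout and names are assumptions, not llama.cpp's grammar-parser structures.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy grammar: rule name -> alternatives -> symbols.
// A symbol that names a rule is a nonterminal; anything else is a terminal.
using Alt = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Alt>>;

// Fixed-point computation of rules that can match the empty string.
std::set<std::string> nullable_rules(const Grammar &g) {
    std::set<std::string> nullable;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto &[name, alts] : g) {
            if (nullable.count(name)) continue;
            for (const Alt &alt : alts) {
                bool all_nullable = true;
                for (const std::string &sym : alt) {
                    if (!nullable.count(sym)) { all_nullable = false; break; }
                }
                if (all_nullable) { nullable.insert(name); changed = true; break; }
            }
        }
    }
    return nullable;
}

// DFS over leftmost symbols; `path` holds rules on the current stack.
bool left_recursive_from(const Grammar &g, const std::set<std::string> &nullable,
                         const std::string &rule, std::set<std::string> &path,
                         std::set<std::string> &done) {
    if (path.count(rule)) return true;  // back on the stack: left recursion
    if (done.count(rule)) return false; // already fully explored
    path.insert(rule);
    for (const Alt &alt : g.at(rule)) {
        for (const std::string &sym : alt) {
            if (!g.count(sym)) break; // terminal ends the leftmost walk
            if (left_recursive_from(g, nullable, sym, path, done)) return true;
            if (!nullable.count(sym)) break; // only look further if sym may be empty
        }
    }
    path.erase(rule);
    done.insert(rule);
    return false;
}

bool has_left_recursion(const Grammar &g) {
    const auto nullable = nullable_rules(g);
    std::set<std::string> path, done;
    for (const auto &entry : g) {
        if (left_recursive_from(g, nullable, entry.first, path, done)) return true;
    }
    return false;
}
```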
- Change '--embedding' to '--embeddings' in the README
- Update the description to match the latest --help output
- Add a caution about defining the physical batch size
Now that the multi-message chat templating logic itself is used to
apply chat templating/tagging to a single chat message, give callers
the flexibility to decide whether global tags, if any, should be
applied by the core tagging logic.
examples/main is in turn updated to not apply global tags, if any, to
the system message. The user messages already don't apply global tags,
as that path is currently implemented on top of the existing
in-prefix/in-suffix and antiprompt flow.
To avoid duplicating hardcoded logic at multiple locations for any new
model/chat-template standard in the future, replace the single-message
templating code with a wrapper that does the same using the
multi-message templating helper.
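A sketch of the shape of this change, with hypothetical names and a made-up tag layout: the multi-message helper takes a flag controlling whether global tags are emitted, and the single-message function becomes a thin wrapper that builds a one-element list and delegates to it, so per-template tagging rules live in one place.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ChatMsg {
    std::string role;
    std::string content;
};

// Hypothetical tag set for one template.
struct TemplateTags {
    std::string global_begin = "<g>";
    std::string global_end = "</g>";
    std::string role_prefix = "[";
    std::string role_suffix = "] ";
    std::string msg_end = "\n";
};

// Multi-message helper: the only place that knows the tagging rules.
std::string apply_multi(const TemplateTags &t, const std::vector<ChatMsg> &msgs,
                        bool apply_global_tags) {
    std::string out;
    if (apply_global_tags) out += t.global_begin;
    for (const ChatMsg &m : msgs) {
        out += t.role_prefix + m.role + t.role_suffix + m.content + t.msg_end;
    }
    if (apply_global_tags) out += t.global_end;
    return out;
}

// Single-message API kept as a thin wrapper over the multi-message helper.
std::string apply_single(const TemplateTags &t, const ChatMsg &msg,
                         bool apply_global_tags) {
    return apply_multi(t, {msg}, apply_global_tags);
}
```

A caller like examples/main would then pass `false` for the system message, matching the behaviour described above.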
* convert-hf : support q8_0 conversion
* convert-hf : add missing ftype
  This was messing with the checksums otherwise.
* convert-hf : add missing ftype to Baichuan and Xverse
  I didn't notice these on my first pass.
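For context on what the q8_0 conversion produces: in ggml's Q8_0 format, weights are split into blocks of 32, each stored as one scale d = max|x| / 127 plus 32 signed 8-bit values q = round(x / d), so that x ≈ d·q. The sketch below shows that arithmetic; it keeps the scale as a float, whereas ggml stores it as fp16, and the names are illustrative rather than taken from convert-hf.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kQK8_0 = 32; // values per Q8_0 block

struct BlockQ8_0 {
    float d;           // scale (ggml stores this as fp16)
    int8_t qs[kQK8_0]; // quantized values
};

// Quantize n floats (n must be a multiple of 32) into Q8_0 blocks.
std::vector<BlockQ8_0> quantize_q8_0(const float *x, int n) {
    assert(n % kQK8_0 == 0);
    std::vector<BlockQ8_0> blocks(n / kQK8_0);
    for (size_t b = 0; b < blocks.size(); ++b) {
        const float *xb = x + b * kQK8_0;
        float amax = 0.0f;
        for (int i = 0; i < kQK8_0; ++i) amax = std::max(amax, std::fabs(xb[i]));
        const float d = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f; // all-zero block stays zero
        blocks[b].d = d;
        for (int i = 0; i < kQK8_0; ++i) {
            // lround rounds half away from zero
            blocks[b].qs[i] = static_cast<int8_t>(std::lround(xb[i] * id));
        }
    }
    return blocks;
}

// Dequantize back to floats: x ≈ d * q.
std::vector<float> dequantize_q8_0(const std::vector<BlockQ8_0> &blocks) {
    std::vector<float> out;
    out.reserve(blocks.size() * kQK8_0);
    for (const BlockQ8_0 &blk : blocks)
        for (int i = 0; i < kQK8_0; ++i) out.push_back(blk.d * blk.qs[i]);
    return out;
}
```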