llama : move vocab, grammar and sampling into separate files (#8508)

* llama : move sampling code into llama-sampling

ggml-ci

* llama : move grammar code into llama-grammar

ggml-ci

* cont

ggml-ci

* cont : pre-fetch rules

* cont

ggml-ci

* llama : deprecate llama_sample_grammar

* llama : move tokenizers into llama-vocab

ggml-ci

* make : update llama.cpp deps [no ci]

* llama : redirect external API to internal APIs

ggml-ci

* llama : suffix the internal APIs with "_impl"

ggml-ci

* llama : clean-up
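The bullets describe a layering in which the public llama_* functions become thin wrappers. Each wrapper redirects to an internal function suffixed with _impl, and those functions live in the new llama-vocab, llama-grammar and llama-sampling files. The sketch below is a minimal illustration of that pattern. It is not llama.cpp code: every demo_* name is hypothetical, and the vocab struct is an assumption chosen only to make the example self-contained.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// ---- internal layer (hypothetical "demo-vocab" file) ----
// Internal state and its *_impl functions are kept together in one file.
struct demo_vocab {
    std::vector<std::string> id_to_token;
};

// Internal implementation: works directly on the internal struct.
static int32_t demo_n_vocab_impl(const demo_vocab & vocab) {
    return static_cast<int32_t>(vocab.id_to_token.size());
}

// Internal implementation: bounds-checked lookup of a token's text.
static const char * demo_token_get_text_impl(const demo_vocab & vocab, int32_t id) {
    return vocab.id_to_token.at(static_cast<size_t>(id)).c_str();
}

// ---- public layer (hypothetical "demo" API) ----
// The public object owns the internal state; public functions only unwrap and forward.
struct demo_model {
    demo_vocab vocab;
};

extern "C" int32_t demo_n_vocab(const demo_model * model) {
    return demo_n_vocab_impl(model->vocab);
}

extern "C" const char * demo_token_get_text(const demo_model * model, int32_t id) {
    return demo_token_get_text_impl(model->vocab, id);
}
```

This split keeps the public C surface stable. Code can then move between internal files without touching callers, because only the _impl functions know the internal struct layout.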
Georgi Gerganov 2024-07-23 13:10:17 +03:00 committed by GitHub
parent 751fcfc6c3
commit 938943cdbf
GPG key ID: B5690EEEBB952194
18 changed files with 3521 additions and 2968 deletions


@@ -4,6 +4,8 @@
#include <string>
#include <vector>
// TODO: prefix all symbols with "llama_"
struct codepoint_flags {
    enum {
        UNDEFINED = 0x0001,
@@ -46,6 +48,7 @@ struct codepoint_flags {
    }
};
size_t unicode_len_utf8(char src);
std::string unicode_cpt_to_utf8(uint32_t cp);
uint32_t unicode_cpt_from_utf8(const std::string & utf8, size_t & offset);
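The second hunk lists the UTF-8 helpers declared in the header: unicode_len_utf8, unicode_cpt_to_utf8 and unicode_cpt_from_utf8. The sketch below shows what functions with these signatures conventionally compute. It lives in a hypothetical namespace sketch and is not the llama.cpp implementation. For brevity it also does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

namespace sketch {

// Byte length of a UTF-8 sequence, read from the high nibble of its first byte.
// Continuation bytes (10xxxxxx) map to 1 so a scanner always advances.
size_t len_utf8(char src) {
    static const size_t lookup[] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 4 };
    const uint8_t highbits = static_cast<uint8_t>(src) >> 4;
    return lookup[highbits];
}

// Encode one code point as 1-4 UTF-8 bytes (surrogates are not rejected here).
std::string cpt_to_utf8(uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        out.push_back(static_cast<char>(cp));
    } else if (cp <= 0x7FF) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp <= 0xFFFF) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp <= 0x10FFFF) {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        throw std::invalid_argument("code point out of range");
    }
    return out;
}

// Decode the code point starting at utf8[offset] and advance offset past it.
uint32_t cpt_from_utf8(const std::string & utf8, size_t & offset) {
    if (offset >= utf8.size()) throw std::out_of_range("offset past end of string");
    const uint8_t b0 = static_cast<uint8_t>(utf8[offset]);
    size_t n;
    uint32_t cp;
    if      ((b0 & 0x80) == 0x00) { n = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { n = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { n = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { n = 4; cp = b0 & 0x07; }
    else throw std::invalid_argument("invalid UTF-8 lead byte");
    if (offset + n > utf8.size()) throw std::invalid_argument("truncated UTF-8 sequence");
    for (size_t i = 1; i < n; ++i) {
        const uint8_t b = static_cast<uint8_t>(utf8[offset + i]);
        if ((b & 0xC0) != 0x80) throw std::invalid_argument("invalid continuation byte");
        cp = (cp << 6) | (b & 0x3F);  // append 6 payload bits
    }
    offset += n;
    return cp;
}

} // namespace sketch
```

Passing offset by reference lets a caller walk a whole string one code point at a time. The lookup-table length function answers a cheaper question: how many bytes to skip, without decoding.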