4066 commits

Author SHA1 Message Date
Georgi Gerganov
589b48d41e
contrib : add Resources section (#9675) 2024-09-29 14:38:18 +03:00
ochafik
9ac4b04aa2 tool-call: add fs_list_files to common, w/ win32 impl for msys2 build 2024-09-29 00:42:52 +01:00
ochafik
cb7912ee74 chat-template: add phi-3.5-vision-instruct 2024-09-29 00:33:19 +01:00
ochafik
8738d94bbd minja: qualify std::nullptr_t type for msys2 build 2024-09-29 00:18:22 +01:00
ochafik
c87c12168a tool-call: fix memory leak in test 2024-09-28 23:44:28 +01:00
ochafik
22493c8e9e tests: fix test-chat-template run from build 2024-09-28 23:31:23 +01:00
ochafik
ad6719e2a7 tests: fix typo 2024-09-28 23:26:19 +01:00
ochafik
a072f30a8d tests: attempt to find assets for tests run from build subfolder 2024-09-28 23:15:36 +01:00
ochafik
bc3e0c0830 tool-call: Qwen 2.5 Instruct also requires object arguments 2024-09-28 23:05:35 +01:00
ochafik
b10ef04d8d chat-template: tweak --chat-template error message when --jinja is set 2024-09-28 22:36:38 +01:00
ochafik
dbda025f87 tool-call: test messages -> template -> grammar -> tool call parser 2024-09-28 22:32:47 +01:00
ochafik
0ae1112faa agent: try to fix pyright lint 2024-09-28 20:10:08 +01:00
ochafik
1b32ac129f chat-template: fix test-arg 2024-09-28 20:06:10 +01:00
ochafik
9358d1f62c minja: fix gcc8 build of test 2024-09-28 19:50:08 +01:00
ochafik
e6be59c2a0 antiprompts: fix gcc8 build (avoid recursive struct) 2024-09-28 19:39:52 +01:00
ochafik
ef2a020276 tool-call: make agent async 2024-09-28 19:11:09 +01:00
ochafik
05bbba9f8a tool-call: only match json eagerly for Llama 3.2 2024-09-28 19:05:10 +01:00
ochafik
6e0053a81b chat-template: enumerate files w/ C API rather than using private std::__fs::filesystem 2024-09-28 18:47:11 +01:00
ochafik
c657857e21 tool-call: cleanup tools.py 2024-09-28 18:33:40 +01:00
ochafik
55cf337560 tool-call: better error reporting for server tests 2024-09-28 18:33:40 +01:00
ochafik
7cef90cf9c tool-call: more eager function call parsing for Functionary & Llama (give a chance to 3B model) 2024-09-28 18:33:40 +01:00
ochafik
8b2cf3509f tool-call: fix grammar trigger crash 2024-09-28 18:30:01 +01:00
ochafik
d983516f40 tool-call: let the tool call handler expand chat template, moving builtin_tools down as extra_context 2024-09-28 17:46:36 +01:00
ochafik
0c85bc7a8f tool-call: test tool call style detection 2024-09-28 17:43:09 +01:00
Georgi Gerganov
f4d2b8846a
llama : add reranking support (#9510)
* py : add XLMRobertaForSequenceClassification [no ci]

* py : fix scalar-tensor conversion [no ci]

* py : fix position embeddings chop [no ci]

* llama : read new cls tensors [no ci]

* llama : add classification head (wip) [no ci]

* llama : add "rank" pooling type

ggml-ci

* server : add rerank endpoint

ggml-ci

* llama : avoid ggml_repeat during classification

* rerank : cleanup + comments

* server : accept /rerank endpoint in addition to /v1/rerank [no ci]

* embedding : parse special tokens

* jina : support v1 reranker

* vocab : minor style

ggml-ci

* server : initiate tests for later

ggml-ci

* server : add docs

* llama : add comment [no ci]

* llama : fix uninitialized tensors

* ci : add rerank tests

ggml-ci

* add reranking test

* change test data

* Update examples/server/server.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* add `--reranking` argument

* update server docs

* llama : fix comment [no ci]

ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-09-28 17:42:03 +03:00
slaren
1b2f992cd2
test-backend-ops : use flops for some performance tests (#9657)
* test-backend-ops : use flops for some performance tests

- parallelize tensor quantization

- use a different set of cases for performance and correctness tests

- run each test for at least one second
2024-09-28 14:32:46 +02:00
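As a rough illustration of the commit title and its last bullet (report throughput in FLOPS, run each case for at least one second), a timing loop might look like the C sketch below. It is not the code in test-backend-ops: `bench_gflops`, `now_seconds`, `dot_ctx` and `dot_kernel` are hypothetical names, and the sketch assumes a POSIX `clock_gettime` with `CLOCK_MONOTONIC`. The caller supplies the FLOP count per call, for example 2·n for an n-element dot product (n multiplies plus n adds).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: one invocation of the op being benchmarked. */
typedef void (*bench_fn)(void *ctx);

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Calls fn repeatedly (always at least once) until at least min_seconds
 * have elapsed, then returns the measured throughput in GFLOPS.
 * flops_per_call is the FLOP count of a single call, supplied by the caller. */
static double bench_gflops(bench_fn fn, void *ctx, uint64_t flops_per_call,
                           double min_seconds, uint64_t *runs_out) {
    uint64_t runs = 0;
    const double start = now_seconds();
    double elapsed;
    do {
        fn(ctx);
        runs++;
        elapsed = now_seconds() - start;
    } while (elapsed < min_seconds);
    if (runs_out != NULL) {
        *runs_out = runs;
    }
    if (elapsed <= 0.0) {
        return 0.0; /* timer too coarse to measure anything */
    }
    return (double)flops_per_call * (double)runs / elapsed / 1e9;
}

/* Hypothetical example workload: a float dot product (2*n FLOPs per call). */
struct dot_ctx {
    const float *a;
    const float *b;
    size_t n;
    volatile float result; /* volatile so the work is not optimized away */
};

static void dot_kernel(void *p) {
    struct dot_ctx *c = (struct dot_ctx *)p;
    float sum = 0.0f;
    for (size_t i = 0; i < c->n; i++) {
        sum += c->a[i] * c->b[i];
    }
    c->result = sum;
}
```

Calling `bench_gflops(dot_kernel, &ctx, 2 * n, 1.0, &runs)` would match the "at least one second" rule. Using a minimum wall-clock time instead of a fixed iteration count gives fast and slow ops comparably stable measurements.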
Georgi Gerganov
739842703e
llama : add comment about thread-safety [no ci] (#9449) 2024-09-28 15:13:42 +03:00
Zhenwei Jin
6102037bbb
vocab : refactor tokenizer to reduce init overhead (#9449)
* refactor tokenizer

* llama : make llm_tokenizer more private

ggml-ci

* refactor tokenizer

* refactor tokenizer

* llama : make llm_tokenizer more private

ggml-ci

* remove unused files

* remove unused fields to avoid unused field build error

* avoid symbol link error

* Update src/llama.cpp

* Update src/llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-28 15:10:58 +03:00
nopperl
9a913110cf
llama : add support for Chameleon (#8543)
* convert chameleon hf to gguf

* add chameleon tokenizer tests

* fix lint

* implement chameleon graph

* add swin norm param

* return qk norm weights and biases to original format

* implement swin norm

* suppress image token output

* rem tabs

* add comment to conversion

* fix ci

* check for k norm separately

* adapt to new lora implementation

* fix layer input for swin norm

* move swin_norm in gguf writer

* add comment regarding special token regex in chameleon pre-tokenizer

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* fix punctuation regex in chameleon pre-tokenizer (@compilade)

Co-authored-by: compilade <git@compilade.net>

* fix lint

* trigger ci

---------

Co-authored-by: compilade <git@compilade.net>
2024-09-28 15:08:43 +03:00
Aarni Koskela
43bcdd9703
readme : add tool (#9655) 2024-09-28 15:07:14 +03:00
Dan Johansson
6a0f779484
ggml : add run-time detection of neon, i8mm and sve (#9331)
* ggml: Added run-time detection of neon, i8mm and sve

Adds run-time detection of the Arm instruction set features
neon, i8mm and sve for Linux and Apple build targets.

* ggml: Extend feature detection to include non-aarch64 Arm arch

* ggml: Move definition of ggml_arm_arch_features to the global data section
2024-09-28 15:06:16 +03:00
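The commit describes the mechanism only in prose. As a minimal sketch, not ggml's actual implementation, run-time detection on aarch64 Linux can read the kernel's auxiliary vector with `getauxval` (bits `HWCAP_ASIMD`, `HWCAP_SVE` and `HWCAP2_I8MM` from `<asm/hwcap.h>`), and on Apple targets can query `sysctlbyname` keys such as `hw.optional.AdvSIMD` and `hw.optional.arm.FEAT_I8MM`. The names `arm_features`, `detect_arm_features` and `read_sysctl_int` are hypothetical, 32-bit Arm is not covered, and SVE is simply assumed unavailable on Apple targets in this sketch.

```c
#include <stdbool.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP, AT_HWCAP2 */
#include <asm/hwcap.h>  /* HWCAP_ASIMD, HWCAP_SVE, HWCAP2_I8MM */
#elif defined(__aarch64__) && defined(__APPLE__)
#include <stddef.h>
#include <sys/sysctl.h> /* sysctlbyname */
#endif

/* Hypothetical feature record; the name is illustrative only. */
struct arm_features {
    bool has_neon;
    bool has_i8mm;
    bool has_sve;
};

#if defined(__aarch64__) && defined(__APPLE__)
/* Reads an integer sysctl; returns 0 if the key does not exist. */
static int read_sysctl_int(const char *name) {
    int value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname(name, &value, &size, NULL, 0) != 0) {
        return 0;
    }
    return value;
}
#endif

/* Queries the running CPU; on other targets every flag stays false. */
static struct arm_features detect_arm_features(void) {
    struct arm_features f = { false, false, false };
#if defined(__aarch64__) && defined(__linux__)
    /* The kernel reports CPU features through the auxiliary vector. */
    unsigned long hwcap  = getauxval(AT_HWCAP);
    unsigned long hwcap2 = getauxval(AT_HWCAP2);
    f.has_neon = (hwcap & HWCAP_ASIMD) != 0;
#ifdef HWCAP_SVE
    f.has_sve = (hwcap & HWCAP_SVE) != 0;
#endif
#ifdef HWCAP2_I8MM
    f.has_i8mm = (hwcap2 & HWCAP2_I8MM) != 0;
#else
    (void)hwcap2; /* older kernel headers do not define HWCAP2_I8MM */
#endif
#elif defined(__aarch64__) && defined(__APPLE__)
    f.has_neon = read_sysctl_int("hw.optional.AdvSIMD") != 0;
    f.has_i8mm = read_sysctl_int("hw.optional.arm.FEAT_I8MM") != 0;
    /* has_sve is left false on Apple targets in this sketch. */
#endif
    return f;
}
```

Detecting once at start-up and caching the result, rather than relying on compile-time flags alone, lets a single binary choose kernels based on the CPU it actually runs on.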
Markus Tavenrath
89f9944981
Enable use of the rebar feature to upload buffers to the device. (#9251) 2024-09-28 12:05:05 +02:00
ochafik
887951beb0 minja: generate chat goldens w/ fixed date to support Llama-3.2-3B-Instruct (uses strftime_now) 2024-09-27 19:52:15 +01:00
ochafik
701b664551 minja: add indent filter to support command-r-plus's chat templates 2024-09-27 19:00:14 +01:00
Georgi Gerganov
b5de3b74a5
readme : update hot topics 2024-09-27 20:57:51 +03:00
ochafik
0093a5e527 minja: fix identifiers parsing (when start w/ not/is/etc) and lstrip_blocks corner case (needed by DeepSeek-V2.5) 2024-09-27 18:30:44 +01:00
Borislav Stanimirov
44f59b4301
cmake : add option for common library (#9661) 2024-09-27 10:42:06 +03:00
ochafik
2f25ee30ef Update README.md 2024-09-27 07:18:07 +01:00
ochafik
86e4f99092 Update README.md 2024-09-27 07:15:25 +01:00
ochafik
e62b5de3cf tool-call: fix functionary-small-3.2 (first tool starts w/ name\n, subsequent are >>>name\n) 2024-09-27 07:06:33 +01:00
ochafik
e33b342da7 tool-call: fix passing of tools to template + allow agent to finish 2024-09-27 06:24:22 +01:00
ochafik
f62e688387 tool-call: fix crash / test non-tool call case (added llama_sampler_is_grammar_empty) 2024-09-27 06:04:41 +01:00
ochafik
0abfa36ca7 tool-call: move usage examples to examples/agent 2024-09-27 05:10:30 +01:00
ochafik
6610ecf965 server: rm bad debug code 2024-09-27 04:07:35 +01:00
ochafik
27cd07a056 json: fix grammar conversion typo 2024-09-27 03:57:48 +01:00
ochafik
9295ca95db tool-call: fix agent type lints 2024-09-27 03:53:56 +01:00
ochafik
1e5c0e747e chat-template: fix jinja tests (make safe a passthrough) 2024-09-27 03:50:04 +01:00
ochafik
f9c1743bb5 minja: fix iterables 2024-09-27 03:36:49 +01:00
ochafik
8299fac07c tool-call: adapt very simple agent + docker isolation from https://github.com/ggerganov/llama.cpp/pull/6389 2024-09-26 21:07:46 +01:00
ochafik
10f9fe8d49 tool-call: fix tool call return format 2024-09-26 21:01:04 +01:00