Commit graph

4102 commits

Author SHA1 Message Date
ochafik
e6be59c2a0 antiprompts: fix gcc8 build (avoid recursive struct) 2024-09-28 19:39:52 +01:00
ochafik
ef2a020276 tool-call: make agent async 2024-09-28 19:11:09 +01:00
ochafik
05bbba9f8a tool-call: only match json eagerly for Llama 3.2 2024-09-28 19:05:10 +01:00
ochafik
6e0053a81b chat-template: enumerate files w/ C API rather than the private std::__fs::filesystem 2024-09-28 18:47:11 +01:00
ochafik
c657857e21 tool-call: cleanup tools.py 2024-09-28 18:33:40 +01:00
ochafik
55cf337560 tool-call: better error reporting for server tests 2024-09-28 18:33:40 +01:00
ochafik
7cef90cf9c tool-call: more eager function call parsing for Functionary & Llama (give the 3B model a chance) 2024-09-28 18:33:40 +01:00
ochafik
8b2cf3509f tool-call: fix grammar trigger crash 2024-09-28 18:30:01 +01:00
ochafik
d983516f40 tool-call: let the tool call handler expand chat template, moving builtin_tools down as extra_context 2024-09-28 17:46:36 +01:00
ochafik
0c85bc7a8f tool-call: test tool call style detection 2024-09-28 17:43:09 +01:00
Georgi Gerganov
f4d2b8846a
llama : add reranking support (#9510)
* py : add XLMRobertaForSequenceClassification [no ci]

* py : fix scalar-tensor conversion [no ci]

* py : fix position embeddings chop [no ci]

* llama : read new cls tensors [no ci]

* llama : add classification head (wip) [no ci]

* llama : add "rank" pooling type

ggml-ci

* server : add rerank endpoint

ggml-ci

* llama : avoid ggml_repeat during classification

* rerank : cleanup + comments

* server : accept /rerank endpoint in addition to /v1/rerank [no ci]

* embedding : parse special tokens

* jina : support v1 reranker

* vocab : minor style

ggml-ci

* server : initiate tests for later

ggml-ci

* server : add docs

* llama : add comment [no ci]

* llama : fix uninitialized tensors

* ci : add rerank tests

ggml-ci

* add reranking test

* change test data

* Update examples/server/server.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* add `--reranking` argument

* update server docs

* llama : fix comment [no ci]

ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-09-28 17:42:03 +03:00
slaren
1b2f992cd2
test-backend-ops : use flops for some performance tests (#9657)
* test-backend-ops : use flops for some performance tests

- parallelize tensor quantization

- use a different set of cases for performance and correctness tests

- run each test for at least one second
2024-09-28 14:32:46 +02:00
Georgi Gerganov
739842703e
llama : add comment about thread-safety [no ci] (#9449) 2024-09-28 15:13:42 +03:00
Zhenwei Jin
6102037bbb
vocab : refactor tokenizer to reduce init overhead (#9449)
* refactor tokenizer

* llama : make llm_tokenizer more private

ggml-ci

* refactor tokenizer

* refactor tokenizer

* llama : make llm_tokenizer more private

ggml-ci

* remove unused files

* remove unused fields to avoid unused field build error

* avoid symbol link error

* Update src/llama.cpp

* Update src/llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-28 15:10:58 +03:00
nopperl
9a913110cf
llama : add support for Chameleon (#8543)
* convert chameleon hf to gguf

* add chameleon tokenizer tests

* fix lint

* implement chameleon graph

* add swin norm param

* return qk norm weights and biases to original format

* implement swin norm

* suppress image token output

* rem tabs

* add comment to conversion

* fix ci

* check for k norm separately

* adapt to new lora implementation

* fix layer input for swin norm

* move swin_norm in gguf writer

* add comment regarding special token regex in chameleon pre-tokenizer

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* fix punctuation regex in chameleon pre-tokenizer (@compilade)

Co-authored-by: compilade <git@compilade.net>

* fix lint

* trigger ci

---------

Co-authored-by: compilade <git@compilade.net>
2024-09-28 15:08:43 +03:00
Aarni Koskela
43bcdd9703
readme : add tool (#9655) 2024-09-28 15:07:14 +03:00
Dan Johansson
6a0f779484
ggml : add run-time detection of neon, i8mm and sve (#9331)
* ggml: Added run-time detection of neon, i8mm and sve

Adds run-time detection of the Arm instruction set features
neon, i8mm and sve for Linux and Apple build targets.

* ggml: Extend feature detection to include non-aarch64 Arm arch

* ggml: Move definition of ggml_arm_arch_features to the global data section
2024-09-28 15:06:16 +03:00
Markus Tavenrath
89f9944981
Enable use of the rebar feature to upload buffers to the device. (#9251) 2024-09-28 12:05:05 +02:00
ochafik
887951beb0 minja: generate chat goldens w/ fixed date to support Llama-3.2-3B-Instruct (uses strftime_now) 2024-09-27 19:52:15 +01:00
ochafik
701b664551 minja: add indent filter to support command-r-plus's chat templates 2024-09-27 19:00:14 +01:00
Georgi Gerganov
b5de3b74a5
readme : update hot topics 2024-09-27 20:57:51 +03:00
ochafik
0093a5e527 minja: fix identifier parsing (when starting w/ not/is/etc.) and lstrip_blocks corner case (needed by DeepSeek-V2.5) 2024-09-27 18:30:44 +01:00
Borislav Stanimirov
44f59b4301
cmake : add option for common library (#9661) 2024-09-27 10:42:06 +03:00
ochafik
2f25ee30ef Update README.md 2024-09-27 07:18:07 +01:00
ochafik
86e4f99092 Update README.md 2024-09-27 07:15:25 +01:00
ochafik
e62b5de3cf tool-call: fix functionary-small-3.2 (first tool starts w/ name\n, subsequent are >>>name\n) 2024-09-27 07:06:33 +01:00
ochafik
e33b342da7 tool-call: fix passing of tools to template + allow agent to finish 2024-09-27 06:24:22 +01:00
ochafik
f62e688387 tool-call: fix crash / test non-tool call case (added llama_sampler_is_grammar_empty) 2024-09-27 06:04:41 +01:00
ochafik
0abfa36ca7 tool-call: move usage examples to examples/agent 2024-09-27 05:10:30 +01:00
ochafik
6610ecf965 server: rm bad debug code 2024-09-27 04:07:35 +01:00
ochafik
27cd07a056 json: fix grammar conversion typo 2024-09-27 03:57:48 +01:00
ochafik
9295ca95db tool-call: fix agent type lints 2024-09-27 03:53:56 +01:00
ochafik
1e5c0e747e chat-template: fix jinja tests (make safe a passthrough) 2024-09-27 03:50:04 +01:00
ochafik
f9c1743bb5 minja: fix iterables 2024-09-27 03:36:49 +01:00
ochafik
8299fac07c tool-call: adapt very simple agent + docker isolation from https://github.com/ggerganov/llama.cpp/pull/6389 2024-09-26 21:07:46 +01:00
ochafik
10f9fe8d49 tool-call: fix tool call return format 2024-09-26 21:01:04 +01:00
ochafik
c88c932d98 fix gcc error + lint 2024-09-26 19:18:40 +01:00
ochafik
2926089c5d fix lints 2024-09-26 19:06:29 +01:00
ochafik
5840e10069 tool-call: merge & fix jinja template tests into test-chat-template 2024-09-26 19:05:00 +01:00
ochafik
50685f837f minja: add str.title() 2024-09-26 19:03:59 +01:00
ochafik
296331bba3 minja: update chat template goldens w/ llama.3.1 arguments workaround 2024-09-26 18:10:27 +01:00
ochafik
9cfe4d7202 tool-call: refactor llama_chat_template class + use in validate_model_chat_template 2024-09-26 18:06:03 +01:00
ochafik
cf7bece6a7 tool-call: factor chat template away from legacy API 2024-09-26 17:19:29 +01:00
Neo Zhang Jianyu
95bc82fbc0
[SYCL] add missing dll file in package (#9577)
* update oneapi to 2024.2

* use 2024.1

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-09-26 17:38:31 +08:00
ochafik
d7ec84f78c tool-call: allow <|python_tag|> in functionary-medium-3.1 2024-09-26 06:52:34 +01:00
ochafik
3d2650ce65 fix gcc build 2024-09-26 06:52:34 +01:00
ochafik
749a21c67a gcc appeasement 2024-09-26 06:08:18 +01:00
ochafik
0c870133d8 tool-call: test/fix functionary-medium-v3.1's template (can "look" like llama3.1 template) 2024-09-26 05:56:15 +01:00
ochafik
8e4a9bad8a minja: allow none input to selectattr, and add safe passthrough filter 2024-09-26 05:53:12 +01:00
ochafik
5f5be9cde7 minja: gcc tweaks 2024-09-26 05:06:11 +01:00