llama.cpp

Author	SHA1	Message	Date
Olivier Chafik	c397bd1f5f	tweak delta logic	2025-02-03 17:57:38 +00:00
Olivier Chafik	df3474e2c2	tool-calls: r1: add missing <｜tool▁calls▁end｜> to grammar!	2025-02-03 17:33:14 +00:00
Olivier Chafik	08271b5505	Merge branch 'jinja-chatml' into r1-toolcall	2025-02-03 17:32:38 +00:00
Olivier Chafik	b2dd490926	add missing try catch around jinja parsing to default to chatml	2025-02-03 17:32:12 +00:00
Olivier Chafik	4cb0e1d873	Merge branch 'jinja-chatml' into r1-toolcall	2025-02-03 17:15:14 +00:00
Olivier Chafik	2b3c4829a3	fix build / rm diff	2025-02-03 16:34:43 +00:00
Olivier Chafik	aa98e59038	fix bad merge	2025-02-03 14:01:49 +00:00
Olivier Chafik	5d18d76b69	fix double bos issue (drop bos/eos tokens from jinja template)	2025-02-03 13:59:16 +00:00
Olivier Chafik	cf83623a47	fix typo	2025-02-03 13:58:46 +00:00
ochafik	a76073cf88	minimize diffs	2025-02-03 10:58:52 +00:00
ochafik	77ae97e7d6	Update test_tool_call.py	2025-02-03 10:28:30 +00:00
ochafik	1e9acd2d31	tool-call: allow `--jinja --chat-template chatml`	2025-02-03 04:07:11 +00:00
ochafik	5e6f2a21ae	add deepseek models to server tool call section in readme	2025-02-03 02:44:42 +00:00
ochafik	19bea4ecc3	tell DS R1 not to overthink (weather test)	2025-02-03 02:24:30 +00:00
ochafik	ae9d5812a7	tool-calls: add DeepSeek R1 Qwen 7B to server test_hello_world	2025-02-03 02:24:30 +00:00
ochafik	04be723b33	tool-call: fix command-r7b parsing when response is multiline	2025-02-03 02:24:30 +00:00
ochafik	73d08d49cf	tool-call: allow `--jinja --chat-template chatml`	2025-02-03 02:24:30 +00:00
ochafik	08716281f2	rename tests	2025-02-03 02:24:30 +00:00
ochafik	c80cb30938	update logs	2025-02-03 02:24:30 +00:00
ochafik	28345877e4	server/oai: ensure content is null when there are tool calls	2025-02-03 02:24:30 +00:00
ochafik	04d511b5b5	Avoid double bos w/ jinja	2025-02-03 02:24:30 +00:00
ochafik	130ca222c9	DeepSeek R1: parse thoughts / return in separate field in API (non streamed mode)	2025-02-03 02:24:30 +00:00
ochafik	87de852b7f	pass vocab to common_chat_params_init	2025-02-03 02:24:30 +00:00
ochafik	d3b60b8ad8	minja: enhance backfill of templates w/o tools description (use example tool call delta!)	2025-02-03 01:03:04 +00:00
uvos	4d0598e144	HIP: add GGML_CUDA_CC_IS_* for amd familys as increasing cc archtectures for amd gpus are not supersets of eatch other (#11601 ) This fixes a bug where RDNA1 gpus other than gfx1010 where not handled correctly	2025-02-02 22:08:05 +01:00
Olivier Chafik	90f9b88afb	nit: more informative crash when grammar sampler fails (#11593 )	2025-02-02 19:58:34 +00:00
Johannes Gäßler	864a0b67a6	CUDA: use mma PTX instructions for FlashAttention (#11583 ) * CUDA: use mma PTX instructions for FlashAttention * __shfl_sync workaround for movmatrix * add __shfl_sync to HIP Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-02-02 19:31:09 +01:00
Eric Curtin	84ec8a58f7	Name colors (#11573 ) It's more descriptive, use #define's so we can use compile-time concatenations. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-02-02 15:14:48 +00:00
Olivier Chafik	bfcce4d693	`tool-call`: support Command R7B (+ return tool_plan "thoughts" in API) (#11585 ) * `tool-call`: support Command R7B (w/ tool_plan return) * `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override * `tool-call`: test cleanup / handle lazy grammar triggers	2025-02-02 09:25:38 +00:00
Olivier Chafik	69804487e0	Fix exotic ci env that lacks ostringstream::str (#11581 )	2025-02-02 09:10:15 +00:00
Michał Moskal	ff227703d6	sampling : support for llguidance grammars (#10224 ) * initial porting of previous LLG patch * update for new APIs * build: integrate llguidance as an external project * use '%llguidance' as marker to enable llg lark syntax * add some docs * clarify docs * code style fixes * remove llguidance.h from .gitignore * fix tests when llg is enabled * pass vocab not model to llama_sampler_init_llg() * copy test-grammar-integration.cpp to test-llguidance.cpp * clang fmt * fix ref-count bug * build and run test * gbnf -> lark syntax * conditionally include llguidance test based on LLAMA_LLGUIDANCE flag * rename llguidance test file to test-grammar-llguidance.cpp * add gh action for llg test * align tests with LLG grammar syntax and JSON Schema spec * llama_tokenizer() in fact requires valid utf8 * update llg * format file * add $LLGUIDANCE_LOG_LEVEL support * fix whitespace * fix warning * include <cmath> for INFINITY * add final newline * fail llama_sampler_init_llg() at runtime * Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes * simplify #includes * improve doc string for LLAMA_LLGUIDANCE * typo in merge * bump llguidance to 0.6.12	2025-02-02 09:55:32 +02:00
piDack	0cec062a63	llama : add support for GLM-Edge and GLM-Edge-V series models (#10573 ) * add glm edge chat model * use config partial_rotary_factor as rope ratio * support for glm edge model * vision model support * remove debug info * fix format * llava.cpp trailing whitespace * remove unused AutoTokenizer * Update src/llama.cpp for not contain <\|end\|> or </s> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * add edge template * fix chat template * fix confict * fix confict * fix ci err * fix format err * fix template err * 9b hf chat support * format * format clip.cpp * fix format * Apply suggestions from code review * Apply suggestions from code review * Update examples/llava/clip.cpp * fix format * minor : style --------- Co-authored-by: liyuhang <yuhang.li@zhipuai.cn> Co-authored-by: piDack <pcdack@hotmail.co> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: liyuhang <yuhang.li@aminer.cn> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-02 09:48:46 +02:00
Olivier Chafik	53debe6f3c	ci: use sccache on windows HIP jobs (#11553 )	2025-02-01 18:22:38 +00:00
Olivier Chafik	cfd74c86db	`sync`: minja (`418a2364b5`) (#11574 )	2025-02-01 12:24:51 +00:00
Eric Curtin	ecef206ccb	Implement s3:// protocol (#11511 ) For those that want to pull from s3 Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-02-01 10:30:54 +00:00
Olivier Chafik	5bbc7362cb	ci: simplify cmake build commands (#11548 )	2025-02-01 00:01:20 +00:00
Olivier Chafik	aa6fb13213	`ci`: use sccache on windows instead of ccache (#11545 ) * Use sccache on ci for windows * Detect sccache in cmake	2025-01-31 17:12:40 +00:00
Olivier Chafik	a83f528688	`tool-call`: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539 ) * An empty tool_call_id is better than none! * sync: minja (tool call name optional https://github.com/google/minja/pull/36) * Force-disable parallel_tool_calls if template doesn't support it * More debug logs * Llama 3.x tools: accept / trigger on more varied spaced outputs * Fix empty content for functionary v3.2 tool call * Add proper tool call docs to server README * readme: function calling is supported now * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-31 14:15:25 +00:00
Olivier Chafik	b1bcd309fc	fix stop regression (#11543 )	2025-01-31 13:48:31 +00:00
Olivier Chafik	5783575c9d	Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533 )	2025-01-31 08:24:29 +00:00
Olivier Chafik	4a2b196d03	server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531 )	2025-01-31 10:12:40 +02:00
Steve Grubb	1bd3047a93	common: Add missing va_end (#11529 ) The va_copy man page states that va_end must be called to revert whatever the copy did. For some implementaions, not calling va_end has no consequences. For others it could leak memory.	2025-01-31 07:58:55 +02:00
Daniel Bevenius	a2df2787b3	server : update help metrics processing/deferred (#11512 ) This commit updates the help text for the metrics `requests_processing` and `requests_deferred` to be more grammatically correct. Currently the returned metrics look like this: ```console \# HELP llamacpp:requests_processing Number of request processing. \# TYPE llamacpp:requests_processing gauge llamacpp:requests_processing 0 \# HELP llamacpp:requests_deferred Number of request deferred. \# TYPE llamacpp:requests_deferred gauge llamacpp:requests_deferred 0 ``` With this commit, the metrics will look like this: ```console \# HELP llamacpp:requests_processing Number of requests processing. \# TYPE llamacpp:requests_processing gauge llamacpp:requests_processing 0 \# HELP llamacpp:requests_deferred Number of requests deferred. \# TYPE llamacpp:requests_deferred gauge llamacpp:requests_deferred 0 ``` This is also consistent with the description of the metrics in the server examples [README.md](https://github.com/ggerganov/llama.cpp/tree/master/examples/server#get-metrics-prometheus-compatible-metrics-exporter).	2025-01-31 06:04:53 +01:00
Olivier Chafik	553f1e46e9	`ci`: ccache for all github worfklows (#11516 )	2025-01-30 22:01:06 +00:00
Olivier Chafik	8b576b6c55	Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639 ) --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-30 19:13:58 +00:00
uvos	27d135c970	HIP: require at least HIP 5.5	2025-01-30 16:25:44 +01:00
uvos	6af1ca48cb	HIP: Prepare reduction operators for wave 64	2025-01-30 16:25:44 +01:00
uvos	c300e68ef4	CUDA/HIP: add warp_size to cuda_device_info	2025-01-30 16:25:44 +01:00
Olivier Chafik	3d804dec76	sync: minja (#11499 )	2025-01-30 10:30:27 +00:00
mgroeber9110	ffd0821c57	vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496 )	2025-01-30 12:10:59 +02:00

1 2 3 4 5 ...

4643 commits