llama.cpp

Author	SHA1	Message	Date
Eric Curtin	05f63cc9ee	Update documentation (#11373) To show -n, -ngl, --ngl is acceptable. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-23 20:04:31 +00:00
Eric Curtin	f7fb43cd0b	Add -ngl (#11372) Most other llama.cpp cli tools accept -ngl with a single dash. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-23 16:16:18 +00:00
Xuan Son Nguyen	5845661640	server : add more clean up when cancel_tasks is called (#11340) * server : add more clean up when cancel_tasks is called * fix recv_with_timeout * std::remove_if * fix std::remove_if	2025-01-23 13:56:05 +01:00
Eric Curtin	f211d1dc10	Treat hf.co/ prefix the same as hf:// (#11350) ollama uses hf.co/ to specify huggingface prefix, like RamaLama uses hf:// Treat them similarly. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-23 10:38:20 +00:00
amd-dwang	955a6c2d91	Vulkan-run-test: fix mmq_wg_denoms (#11343) There should be a copy-and-paste error here. mmq_wg_denoms should be used together with warptile_mmq, instead of wg_denoms.	2025-01-23 08:14:28 +01:00
Jeff Bolz	1971adf55e	vulkan: sort shaders for more deterministic binary (#11315) Fixes #11306.	2025-01-23 08:07:50 +01:00
Jeff Bolz	5245729e33	vulkan: fix diag_mask_inf (#11323) With robustbufferaccess disabled, this shader was showing OOB stores. There is a bounds check in the code, but the workgroup dimensions were reversed vs CUDA and it was running the wrong number of threads. So fix the workgroup dimensions and disable robustness for this pipeline.	2025-01-23 08:01:17 +01:00
Olivier Chafik	46415d7a51	Fix lazy trigger handling	2025-01-22 19:08:19 +00:00
Olivier Chafik	c2d836f9d0	Update real tool call tests (use less models)	2025-01-22 18:47:32 +00:00
Olivier Chafik	a46de6a03a	Add grammar options + rename builder to common_grammar_builder	2025-01-22 18:36:04 +00:00
Olivier Chafik	cdfa8b9d4f	Update chat-template.hpp	2025-01-22 18:35:24 +00:00
Olivier Chafik	5e358ade59	fix msg init warning	2025-01-22 18:35:20 +00:00
Diego Devesa	6152129d05	main : update README documentation for batch size (#11353) * main : update README documentation for batch size * fix formatting * minor	2025-01-22 19:22:20 +01:00
Georgi Gerganov	16d3df7ab0	readme : add plugin links (#11355)	2025-01-22 19:44:26 +02:00
Diego Devesa	12c2bdf2de	server : fix draft context not being released (#11354)	2025-01-22 17:44:40 +01:00
Olivier Chafik	f0231a586e	fix common_chat_msg invocations	2025-01-22 16:25:51 +00:00
Olivier Chafik	d186721e41	Merge remote-tracking branch 'origin/master' into tool-call	2025-01-22 16:22:16 +00:00
Olivier Chafik	c64d2becb1	`minja`: sync at `0f5f7f2b37` (#11352)	2025-01-22 16:16:27 +00:00
Olivier Chafik	9ccc62b3c9	Sync minja after https://github.com/google/minja/pull/29	2025-01-22 14:32:18 +00:00
Jiří Podivín	96f4053934	Adding logprobs to /v1/completions (#11344) Signed-off-by: Jiri Podivin <jpodivin@redhat.com>	2025-01-22 12:51:32 +01:00
Olivier Chafik	30d33d9f68	Update test_chat_completion.py	2025-01-22 11:42:36 +00:00
Olivier Chafik	c6a22edc57	Greedy sampling in tool call tests	2025-01-22 11:41:43 +00:00
Olivier Chafik	cce1166b37	Update tool-call.cpp	2025-01-22 11:25:26 +00:00
Olivier Chafik	a4226365bf	nits	2025-01-22 11:23:37 +00:00
Olivier Chafik	63387c6dca	smaller diff	2025-01-22 11:14:25 +00:00
Olivier Chafik	82b6e9a5c3	merge common_tool_calls into common_chat_msg	2025-01-22 11:05:05 +00:00
Olivier Chafik	01b345be0f	Merge remote-tracking branch 'origin/master' into tool-call	2025-01-22 10:02:23 +00:00
Olivier Chafik	a94f3b2727	`common`: utils to split / join / repeat strings (from json converter) (#11342) * Factor string_join, string_split, string_repeat into common * json: refactor to surface a versatile builder * Update common.cpp	2025-01-22 09:51:44 +00:00
tc-mb	3e3357fd77	llava : support Minicpm-omni (#11289) * init * add readme * update readme * no use make * update readme * update fix code * fix editorconfig-checker * no change convert py * use clip_image_u8_free	2025-01-22 09:35:48 +02:00
Olivier Chafik	2dd09c792f	more cleanups	2025-01-22 03:20:47 +00:00
Olivier Chafik	28cac497a6	drop llama_sampler_accept_str	2025-01-22 02:38:04 +00:00
Olivier Chafik	e211629b89	Merge branch 'string_utils' into tool-call	2025-01-22 02:27:10 +00:00
Olivier Chafik	5140d7a00b	Update common.cpp	2025-01-22 02:25:09 +00:00
Olivier Chafik	41a613bbd3	Merge branch 'string_utils' into tool-call	2025-01-22 02:22:20 +00:00
Olivier Chafik	03fe80f1bb	drop unused fs_list_files	2025-01-22 02:22:03 +00:00
Olivier Chafik	4de5cf8a10	json: refactor to surface a versatile builder	2025-01-22 02:19:23 +00:00
Olivier Chafik	9a5acbb4a3	Factor string_join, string_split, string_repeat into common	2025-01-22 02:17:34 +00:00
Olivier Chafik	9e8b43f993	follow enum naming style for tool call styles	2025-01-22 02:13:02 +00:00
Olivier Chafik	5268ec8947	Refactor string helpers into common	2025-01-22 02:08:18 +00:00
Olivier Chafik	d77fecc3dc	shrink diff in json conversion code	2025-01-22 01:54:17 +00:00
Olivier Chafik	3972945798	common_tool_call rename	2025-01-22 01:54:08 +00:00
Olivier Chafik	ef61a4c79e	minimize diffs	2025-01-22 01:46:51 +00:00
Olivier Chafik	dbf841b0d2	Push laziness down to grammar impl	2025-01-22 01:25:54 +00:00
Olivier Chafik	77f4098c83	Delete update_jinja_goldens.py	2025-01-21 14:41:59 +00:00
Olivier Chafik	f6e73dac43	Remove examples/agent (moved to https://gist.github.com/ochafik/9246d289b7d38d49e1ee2755698d6c79)	2025-01-21 14:41:56 +00:00
Olivier Chafik	b49d0521e9	rm tests/test-minja from makefile	2025-01-21 14:12:38 +00:00
Olivier Chafik	fec0260366	Merge remote-tracking branch 'origin/master' into tool-call	2025-01-21 13:44:58 +00:00
Olivier Chafik	6171c9d258	Add Jinja template support (#11016) * Copy minja from `58f0ca6dd7` * Add --jinja and --chat-template-file flags * Add missing <optional> include * Avoid print in get_hf_chat_template.py * No designated initializers yet * Try and work around msvc++ non-macro max resolution quirk * Update test_chat_completion.py * Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template * Refactor test-chat-template * Test templates w/ minja * Fix deprecation * Add --jinja to llama-run * Update common_chat_format_example to use minja template wrapper * Test chat_template in e2e test * Update utils.py * Update test_chat_completion.py * Update run.cpp * Update arg.cpp * Refactor common_chat_* functions to accept minja template + use_jinja option * Attempt to fix linkage of LLAMA_CHATML_TEMPLATE * Revert LLAMA_CHATML_TEMPLATE refactor * Normalize newlines in test-chat-templates for windows tests * Forward decl minja::chat_template to avoid eager json dep * Flush stdout in chat template before potential crash * Fix copy elision warning * Rm unused optional include * Add missing optional include to server.cpp * Disable jinja test that has a cryptic windows failure * minja: fix vigogne (https://github.com/google/minja/pull/22) * Apply suggestions from code review Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Finish suggested renamings * Move chat_templates inside server_context + remove mutex * Update --chat-template-file w/ recent change to --chat-template * Refactor chat template validation * Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr) * Warn against missing eos / bos tokens when jinja template references them * rename: common_chat_template[s] * reinstate assert on chat_templates.template_default * Update minja to `b8437df626` * Update minja to https://github.com/google/minja/pull/25 * Update minja from https://github.com/google/minja/pull/27 * rm unused optional header --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-21 13:18:51 +00:00
Xuan Son Nguyen	e28245f35f	export-lora : fix tok_embd tensor (#11330)	2025-01-21 14:07:12 +01:00
Radoslav Gerganov	6da5bec81c	rpc : better caching of the base buffer pointer (#11331) There is no need to use map, just store the base pointer in the buffer context.	2025-01-21 15:06:41 +02:00

4970 commits