Commit graph

4158 commits

Author SHA1 Message Date
ochafik
adc673c355 agent: add --think "tool", default to local tools endpoint, support --temperature, fix --seed 2024-12-05 21:32:08 +00:00
ochafik
f9b1969097 Update README.md 2024-11-09 19:00:53 +00:00
ochafik
5789f69d2d minja: don't explode upon referencing a field on an array (fixes Hermes tool use template) 2024-11-09 18:57:09 +00:00
ochafik
c059aecd37 agent: memorize, search_memory (sqlite-vec + sqlite-lembed), fetch + docling (pdf -> markdown), sparql for dbpedia and wikidata 2024-11-09 18:25:34 +00:00
ochafik
bc52c0a4f0 agent: add missing tool name in response! 2024-10-31 15:01:17 +00:00
ochafik
479c1520b1 tool-call: fix qwen template test 2024-10-31 14:49:59 +00:00
ochafik
fe967b61a1 Update README.md 2024-10-31 14:37:55 +00:00
ochafik
f5f74751b9 nits 2024-10-31 14:28:52 +00:00
ochafik
c4a8050120 Update README.md 2024-10-31 14:27:40 +00:00
ochafik
9477c54676 tool-call: functionary-small-v3.2 test now green 2024-10-31 14:11:34 +00:00
ochafik
b35aa4ae1c tool-call: add LLAMA_UPDATE_GOLDENS env for test-chat-template 2024-10-31 13:53:33 +00:00
ochafik
c773516d57 tool-call: don't use -fa w/ Mistral-Nemo (hard crashes?) 2024-10-31 13:53:11 +00:00
ochafik
f5b7825595 tool-call: code_interpreter & system + tool call support for all jinja templates! 2024-10-31 13:52:46 +00:00
ochafik
c395d4804f tool-call: behaviour-based detection of template features 2024-10-31 13:45:10 +00:00
ochafik
e8d9d711f6 Update tool_call.feature 2024-10-31 04:50:38 +00:00
ochafik
7d9c90f46b tool-call: nemo tweak (accept raw sql again) 2024-10-31 04:39:40 +00:00
ochafik
542853b34b tool-call: greedy sampling in server tests + tweak prompt 2024-10-31 04:38:22 +00:00
ochafik
be9de3ed8a Update llama-sampling.cpp 2024-10-31 03:58:15 +00:00
ochafik
61655b9cdd Merge remote-tracking branch 'origin/master' into tool-call 2024-10-31 01:45:07 +00:00
Olivier Chafik
e4d5449638 tool-calls: test Qwen2.5-7B-Instruct-Q4_K_M.gguf 2024-10-30 21:40:15 +00:00
Sergio López
61408e7fad kompute: add backend registry / device interfaces (#10045)
Get in line with the other backends by supporting the newer
backend/device registry interfaces.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-10-30 17:01:52 +01:00
Diego Devesa
b9e02e8184 ggml : fix memory leaks when loading invalid gguf files (#10094)
* ggml : fix gguf string leak when reading kv pairs fails

* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type

* ggml : avoid crashing on failed memory allocations when loading a gguf file
2024-10-30 14:51:21 +01:00
ochafik
5227321dfd tool-call: when slow server tests fail, hint to run python scripts/fetch_server_test_models.py 2024-10-30 12:40:22 +00:00
ochafik
35ac17f3f1 tool-call: fix missing initializer errors 2024-10-30 12:38:34 +00:00
Rich Dougherty
6763f713bb readme : more lora detail in main example readme (#10064) 2024-10-30 13:22:39 +01:00
Rich Dougherty
79a2bc042d convert : more detailed convert lora usage docs (#10065) 2024-10-30 13:22:21 +01:00
ochafik
3ebdb2b805 tool-call: support tool_use variant in llama_chat_template_from_model + drop llama_get_chat_template 2024-10-30 10:07:10 +00:00
xctan
fc83a9e584 ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)
* ggml : RISC-V vector gemv for q4_0_8x8

* ggml : Added WIP rvv q4_0_8x8 gemm

* ggml : Added initial implementation of rvv gemm

* ggml : optimize gemm to avoid register spillover

* ggml : Fix GCC rvv load alignment issue

* ggml : Format gemm rvv code

* ggml : Fix a typo in RVV q4_0_8_8 GEMM
2024-10-30 09:00:40 +02:00
Diego Devesa
c5b0f4b5d9 llama : refactor model loader with backend registry (#10026) 2024-10-30 02:01:23 +01:00
Olivier Chafik
92c384a5e8 nits 2024-10-29 17:24:59 +00:00
Olivier Chafik
773ff91b7a tool-call: force printing of lazy grammar trigger tokens to regularize function call parsing 2024-10-29 15:26:51 +00:00
Olivier Chafik
fa4c1119c9 tool-call: use functionary-small-v3.2-Q8_0.gguf in test (Q4_K_M too dumb for function call) 2024-10-29 15:25:37 +00:00
Olivier Chafik
64287a328d tool-call: test Hermes-3-Llama-3.1-8B 2024-10-29 14:52:25 +00:00
Changyeon Kim
8f275a7c45 ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763)
* ggml: Add POOL2D OP for GPU ACC to the Vulkan.

- The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
- A GGML_OP_POOL_2D shader has been added. (Pooling)
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* [fix] Correct the incorrect order of the parameters.

fix casting to int.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

---------

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
2024-10-29 09:52:56 +01:00
Georgi Gerganov
8d8ff71536 llama : remove Tail-Free sampling (#10071)
ggml-ci
2024-10-29 10:42:05 +02:00
ochafik
aefac1e5cb tool-call: update scripts/fetch_server_test_models.py 2024-10-28 23:57:23 +00:00
ochafik
b825440c81 tool-call: use Q4_K_M models 2024-10-28 23:56:40 +00:00
ochafik
74d71a673e agent: simplify syntax (default tools to local w/ default port) 2024-10-28 23:54:01 +00:00
ochafik
b51c71c734 tool-call: remove duplicate script to fetch templates 2024-10-28 21:35:18 +00:00
arch-btw
61715d5cc8 llama : Add IBM granite template (#10013)
* Add granite template to llama.cpp

* Add granite template to test-chat-template.cpp

* Update src/llama.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Update tests/test-chat-template.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Added proper template and expected output

* Small change to \n

* Add code space &

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Fix spacing

* Apply suggestions from code review

* Update src/llama.cpp

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-10-28 18:45:33 +01:00
Georgi Gerganov
07028f9d74 flake.lock: Update (#10063)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/4c2fcb090b1f3e5b47eaa7bd33913b574a11e0a0?narHash=sha256-/uilDXvCIEs3C9l73JTACm4quuHUsIHcns1c+cHUJwA=' (2024-10-18)
  → 'github:NixOS/nixpkgs/2768c7d042a37de65bb1b5b3268fc987e534c49d?narHash=sha256-AlcmCXJZPIlO5dmFzV3V2XF6x/OpNWUV8Y/FMPGd8Z4=' (2024-10-23)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-10-28 08:41:24 -07:00
ochafik
ec547e4137 tool-call: add tests: tool_call=none, parallel_tool_calls=true 2024-10-28 10:04:00 +00:00
R0CKSTAR
524afeec9d musa: workaround for Guilty Lockup in cleaning src0 (#10042)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-28 10:02:48 +01:00
Georgi Gerganov
8125e6cbfc server : don't overfill the batch during infill (#10018)
ggml-ci
2024-10-28 08:49:32 +02:00
ochafik
168add7ec8 Update tool_call.feature 2024-10-28 02:06:00 +00:00
ochafik
dd6d0241a7 tool-call: script to prefetch models used in server tests 2024-10-28 02:01:00 +00:00
ochafik
7fde6d0091 tool_call: test no tool call on a real model + rename scenarios 2024-10-28 02:00:09 +00:00
ochafik
c88095e3fc space nits 2024-10-28 00:27:04 +00:00
ochafik
9a86ea79a2 tool-call: slow tool call integration tests 2024-10-28 00:26:40 +00:00
Georgi Gerganov
8841ce3f43 llama : switch KQ multiplication to F32 precision by default (#10015)
ggml-ci
2024-10-27 20:59:58 +02:00