llama.cpp

Author	SHA1	Message	Date
ochafik	18d5a1b2ca	nits	2025-01-29 02:15:34 +00:00
ochafik	47be437356	Text fireworks v2 template	2025-01-29 01:51:07 +00:00
ochafik	4cdbb8c53f	Revert breaking minja change	2025-01-29 01:50:49 +00:00
ochafik	64263910d8	Fix firefunction w/ jinja: requires two variables, use the chat handlers everywhere templates are used	2025-01-29 01:15:44 +00:00
ochafik	d603d067d5	sync: minja	2025-01-28 23:49:04 +00:00
ochafik	4f257550a2	minja: sync on https://github.com/google/minja/pull/33	2025-01-28 23:46:51 +00:00
Emreerdog	794fe23f29	cmake: add hints for locating ggml on Windows using Llama find-package (#11466)	2025-01-28 19:22:06 -04:00
peidaqi	cf8cc856d7	server : Fixed wrong function name in llamacpp server unit test (#11473) The test_completion_stream_with_openai_library() function is actually with stream=False by default, and test_completion_with_openai_library() with stream=True	2025-01-29 00:03:42 +01:00
Xuan-Son Nguyen	d0c08040b6	ci : fix build CPU arm64 (#11472) * ci : fix build CPU arm64 * failed, trying ubuntu 22 * vulkan: ubuntu 24 * vulkan : jammy --> noble	2025-01-29 00:02:56 +01:00
uvos	be5ef7963f	HIP: Supress transformation warning in softmax.cu loops with bounds not known at compile time can not be unrolled. when ncols_template == 0, the bounds of the loop are not constexpr, thus llvm cant unroll the loops here.	2025-01-28 23:06:32 +01:00
Nikita Sarychev	cae9fb4361	HIP: Only call rocblas_initialize on rocblas versions with the multiple instantation bug (#11080) This disables the workaround on rocblas fixed versions (>=4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all tensile objects.	2025-01-28 16:42:20 +01:00
ochafik	cad1448ac7	Disable test-chat-handler on win32 like the other grammar-related tests	2025-01-28 14:46:37 +00:00
Eric Curtin	7fee2889e6	Add github protocol pulling and http:// (#11465) As pulling protocols to llama-run Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-28 14:45:41 +00:00
ochafik	cd63ba435e	beef up test-chat-handler w/ delta expectations	2025-01-28 14:40:23 +00:00
Nuno	d7d1eccacc	docker: allow installing pip packages system-wide (#11437) Signed-off-by: rare-magma <rare-magma@posteo.eu>	2025-01-28 14:17:25 +00:00
someone13574	4bf3119d61	cmake : don't fail on `GGML_CPU=OFF` (#11457)	2025-01-28 15:15:34 +01:00
ochafik	ba10b47ae5	Add missing link dep for windows build	2025-01-28 10:52:14 +00:00
ochafik	b5a74d1a24	Simplify parser defs (incremental parsing for streaming will need more thinking)	2025-01-28 10:48:11 +00:00
Nuno	f643120bad	docker: add perplexity and bench commands to full image (#11438) Signed-off-by: rare-magma <rare-magma@posteo.eu>	2025-01-28 10:42:32 +00:00
ochafik	ec4aeaf18a	Revert "Allow tool use + streaming" This reverts commit `62717145f7`.	2025-01-28 10:29:17 +00:00
Akarshan Biswas	6e84b0ab8e	SYCL : SOFTMAX F16 mask support and other fixes (#11261) Implemented ggml_sycl_op_soft_max() F16 src1(mask) support for which a pragma deprecation warning was added during #5021. To do this, had to decouple it from ggml_sycl_op_flatten which always considered src1 to be of fp32 type(many OP functions are dependent on it). * SYCL: SOFTMAX F16 mask support and other fixes * test-backend-ops: Add F16 mask test cases	2025-01-28 09:56:58 +00:00
ochafik	62d45a552f	Disable slow tests where appropriate, + nits	2025-01-28 09:47:41 +00:00
ochafik	d274ffcc95	build: Add missing optional include for gcc	2025-01-28 09:29:31 +00:00
ochafik	0a51e514f6	Update test-chat-handler.cpp	2025-01-28 09:24:35 +00:00
Olivier Chafik	2f99236f77	Tool-call: do last partial parse upon limit stop	2025-01-28 09:23:19 +00:00
Olivier Chafik	6d5682909f	Cleanup dead code in llama_3_1 tool call code	2025-01-28 09:22:26 +00:00
Olivier Chafik	62717145f7	Allow tool use + streaming	2025-01-28 09:22:03 +00:00
Michael Engel	2b8525d5c8	Handle missing model in CLI parameters for llama-run (#11399) The HTTP client in llama-run only prints an error in case the download of a resource failed. If the model name in the CLI parameter list is missing, this causes the application to crash. In order to prevent this, a check for the required model parameter has been added and errors for resource downloads get propagated to the caller. Signed-off-by: Michael Engel <mengel@redhat.com>	2025-01-28 08:32:40 +00:00
ochafik	ef9efc9ed3	Fix Llama 3.1 (incl. constrained builtin tools e.g. `<|python_tag|>foo.call(arg=vallue)`)	2025-01-28 01:04:06 +00:00
ochafik	2d607f1a68	Update test-chat-handler.cpp	2025-01-27 23:29:28 +00:00
ochafik	b565ab2ab1	comment out broken tests in test_tool_call.py	2025-01-27 23:02:15 +00:00
ochafik	cafea60922	Split e2e test_tool_call from test_chat_completion	2025-01-27 22:46:33 +00:00
ochafik	90effb845f	Pass grammar laziness all the way down to sampler (need to print special trigger tokens e.g. for Nemo even w/ tool_choice=required)	2025-01-27 22:46:17 +00:00
ochafik	ad229783c5	updated tool call example to be less ambiguous (deepseek likes to rant about hello world)	2025-01-27 22:44:44 +00:00
ochafik	fa065eb095	Rehabilitate test_format_detection	2025-01-27 20:46:03 +00:00
ochafik	add9124115	fix test-chat-handler grammar tests	2025-01-27 20:13:09 +00:00
Eric Curtin	a4417ddda9	Add new hf protocol for ollama (#11449) https://huggingface.co/docs/hub/en/ollama Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-27 19:36:10 +01:00
ochafik	118f799ae4	DeepSeek-R1: implement grammar constraints	2025-01-27 17:52:46 +00:00
ochafik	92ac336dfa	Prepare DeepSeek-R1-Distill-Llama-8B support	2025-01-27 17:26:43 +00:00
ochafik	09971e626c	Update test_chat_completion.py	2025-01-27 15:43:03 +00:00
ochafik	67709552ad	tool-call: compact json output to cap # tokens generated	2025-01-27 15:42:27 +00:00
ochafik	57f40e366b	tool-call: fix lazy grammar & mixed content + tool calls parsing	2025-01-27 15:41:54 +00:00
ochafik	2efa0c27bf	tool-call: add weather tool e2e tests	2025-01-27 15:02:09 +00:00
ochafik	15ec01e896	jinja: only add special tokens if template doesn't seem to handle them	2025-01-27 14:28:11 +00:00
ochafik	da606d8d41	tool-call: remove nonsensical code_interpreter code	2025-01-27 14:19:20 +00:00
Haus1	d6d24cd9ed	AMD: parse the architecture as supplied by gcnArchName (#11244) The value provided by minor doesn't include stepping for AMD, parse the value returned by gcnArchName instead to retrieve an accurate ID.	2025-01-27 14:58:17 +01:00
lexasub	a5203b4465	llama : minor fixes for up llama load model speed (#11448) * impl::load change map bpe_ranks to onordered map for reduce time of impl::load on 30% * llama_model_loader::init_mapping - replace new llama_mmap to std::make_unique<llama_mmap> for clean code & reduce (/2) time of running init_mappings * Update src/llama-vocab.cpp --------- Co-authored-by: lexasub <empty@empty.ru> Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-27 14:42:09 +01:00
ochafik	bddc1bebcc	tool-call: fix special handling of special trigger tokens (Nemo)	2025-01-27 11:37:41 +00:00
Johannes Gäßler	df984e0147	llama: refactor llama_decode_impl (#11381)	2025-01-27 12:07:12 +01:00
Ihar Hrachyshka	acd38efee3	metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441) This fixes segmentation fault error when running tests when no metal devices are available (for example, when not linked with Core Graphics framework or otherwise).	2025-01-27 09:41:59 +02:00

4956 commits