ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								a29dc921ec 
								
							 
						 
						
							
							
								
								fix server test_tool_calls.py  
							
							
							
						 
						
							2025-02-09 21:01:35 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								c0f972bb45 
								
							 
						 
						
							
							
								
								Use --reasoning-format, remove forced thinking for now  
							
							
							
						 
						
							2025-02-08 17:58:33 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
							
							
								
							
							
								098629df15 
								
							 
						 
						
							
							
								
								disable some failing chatml tests  
							
							
							
						 
						
							2025-02-05 16:15:19 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
							
							
								
							
							
								e6d9b52480 
								
							 
						 
						
							
							
								
								align Command R7B w/ --think / reasoning_content behaviour  
							
							
							
						 
						
							2025-02-05 15:47:37 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								f3e9f8b62a 
								
							 
						 
						
							
							
								
								fix test_thoughts  
							
							
							
						 
						
							2025-02-05 12:34:27 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								9d7c3cc51b 
								
							 
						 
						
							
							
								
								--think to force any model to return reasoning_content (or just parse <think> for deepseek r1)  
							
							
							
						 
						
							2025-02-05 12:16:37 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
							
							
								
							
							
								5d60cebbcc 
								
							 
						 
						
							
							
								
								Update test_tool_call.py  
							
							
							
						 
						
							2025-02-04 17:48:29 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
							
							
								
							
							
								933f7a186e 
								
							 
						 
						
							
							
								
								Merge branch 'master' into r1-toolcall  
							
							
							
						 
						
							2025-02-04 15:56:25 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								db288b60cb 
								
							 
						 
						
							
							
								
								tool-call: command r7b fix for normal responses (#11608)
							
							
							
							* fix command r7b normal response regex + add to server test
* test multiline non-tool-call responses in test-chat 
							
						 
						
							2025-02-04 15:48:53 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
							
							
								
							
							
								39c1d8163b 
								
							 
						 
						
							
							
								
								return thoughts in reasoning_content field  
							
							
							
						 
						
							2025-02-04 11:37:09 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								1f5ec59809 
								
							 
						 
						
							
							
								
								ensure deepseek r1 thoughts parsed even w/o tool calls  
							
							
							
						 
						
							2025-02-04 04:48:08 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								b6e14a4101 
								
							 
						 
						
							
							
								
								fix mistral expectation  
							
							
							
						 
						
							2025-02-04 04:26:49 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								812544ab8b 
								
							 
						 
						
							
							
								
								server: check that content is null when we get tool_calls  
							
							
							
						 
						
							2025-02-04 04:14:15 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								86994db697 
								
							 
						 
						
							
							
								
								fix spaces  
							
							
							
						 
						
							2025-02-04 03:47:52 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								78b47bb0e9 
								
							 
						 
						
							
							
								
								fix test_calc_result  
							
							
							
						 
						
							2025-02-04 03:46:26 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								326e7002b3 
								
							 
						 
						
							
							
								
								update test_calc_result  
							
							
							
						 
						
							2025-02-04 03:13:13 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
							
							
								
							
							
								30ea3591c9 
								
							 
						 
						
							
							
								
								update to minja's new api  
							
							
							
						 
						
							2025-02-03 23:53:27 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cde3833239 
								
							 
						 
						
							
							
								
								tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616)
							
							
							
							* tool-call: allow `--jinja --chat-template chatml`
* fix double bos issue (drop bos/eos tokens from jinja template)
* add missing try catch around jinja parsing to default to chatml
* Simplify default chatml logic 
							
						 
						
							2025-02-03 23:49:27 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
							
							
								
							
							
								4cb0e1d873 
								
							 
						 
						
							
							
								
								Merge branch 'jinja-chatml' into r1-toolcall  
							
							
							
						 
						
							2025-02-03 17:15:14 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
							
							
								
							
							
								5d18d76b69 
								
							 
						 
						
							
							
								
								fix double bos issue (drop bos/eos tokens from jinja template)  
							
							
							
						 
						
							2025-02-03 13:59:16 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								a76073cf88 
								
							 
						 
						
							
							
								
								minimize diffs  
							
							
							
						 
						
							2025-02-03 10:58:52 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								77ae97e7d6 
								
							 
						 
						
							
							
								
								Update test_tool_call.py  
							
							
							
						 
						
							2025-02-03 10:28:30 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								1e9acd2d31 
								
							 
						 
						
							
							
								
								tool-call: allow --jinja --chat-template chatml  
							
							
							
						 
						
							2025-02-03 04:07:11 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								19bea4ecc3 
								
							 
						 
						
							
							
								
								tell DS R1 not to overthink (weather test)  
							
							
							
						 
						
							2025-02-03 02:24:30 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								ae9d5812a7 
								
							 
						 
						
							
							
								
								tool-calls: add DeepSeek R1 Qwen 7B to server test_hello_world  
							
							
							
						 
						
							2025-02-03 02:24:30 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								04be723b33 
								
							 
						 
						
							
							
								
								tool-call: fix command-r7b parsing when response is multiline  
							
							
							
						 
						
							2025-02-03 02:24:30 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								08716281f2 
								
							 
						 
						
							
							
								
								rename tests  
							
							
							
						 
						
							2025-02-03 02:24:30 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4a2b196d03 
								
							 
						 
						
							
							
								
								server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531)
							
							
							
						 
						
							2025-01-31 10:12:40 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8b576b6c55 
								
							 
						 
						
							
							
								
								Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
							
							
							
							---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2025-01-30 19:13:58 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Nigel Bosch 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								eb7cf15a80 
								
							 
						 
						
							
							
								
								server : add /apply-template endpoint for additional use cases of Minja functionality (#11489)
							
							
							
							* add /apply-template endpoint to server
* remove unnecessary line
* add /apply-template documentation
* return only "prompt" field in /apply-template
* use suggested idea instead of my overly verbose way 
							
						 
						
							2025-01-29 19:45:44 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									peidaqi 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cf8cc856d7 
								
							 
						 
						
							
							
								
								server : Fixed wrong function name in llamacpp server unit test (#11473)
							
							
							
							The test_completion_stream_with_openai_library() function actually ran with stream=False (the default), while test_completion_with_openai_library() ran with stream=True
							
						 
						
							2025-01-29 00:03:42 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6171c9d258 
								
							 
						 
						
							
							
								
								Add Jinja template support (#11016)
							
							
							
							* Copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626 https://github.com/google/minja/pull/25
* Update minja from https://github.com/google/minja/pull/27 
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2025-01-21 13:18:51 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								92bc493917 
								
							 
						 
						
							
							
								
								tests : increase timeout when sanitizers are enabled (#11300)
							
							
							
							* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT 
							
						 
						
							2025-01-19 20:22:30 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f30f099228 
								
							 
						 
						
							
							
								
								server : implement cancellable request (#11285)
							
							
							
							* server : implement cancellable request
* fix typo
* httplib 0.18.5
* fix i underflow 
							
						 
						
							2025-01-18 14:12:05 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e6e7c75d94 
								
							 
						 
						
							
							
								
								server : fix extra BOS in infill endpoint (#11106)
							
							
							
							* server : fix extra BOS in infill endpoint
ggml-ci
* server : update infill tests 
							
						 
						
							2025-01-06 15:36:08 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0da5d86026 
								
							 
						 
						
							
							
								
								server : allow using LoRA adapters per-request (#10994)
							
							
							
							* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* lora_base
* remove redundant check
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2025-01-02 15:05:18 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								45095a61bf 
								
							 
						 
						
							
							
								
								server : clean up built-in template detection (#11026)
							
							
							
							* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition 
							
						 
						
							2024-12-31 15:22:01 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5896c65232 
								
							 
						 
						
							
							
								
								server : add OAI compat for /v1/completions (#10974)
							
							
							
							* server : add OAI compat for /v1/completions
* add test
* add docs
* better docs 
							
						 
						
							2024-12-31 12:34:13 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Reza Kakhki 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9ba399dfa7 
								
							 
						 
						
							
							
								
								server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967)
							
							
							
							* add support for base64
* fix base64 test
* improve test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2024-12-24 21:33:04 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Djip007 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2cd43f4900 
								
							 
						 
						
							
							
								
								ggml : more perfo with llamafile tinyblas on x86_64 (#10714)
							
							
							
							* more perfo with llamafile tinyblas on x86_64.
- add bf16 support
- change dispatch strategy (thanks:
https://github.com/ikawrakow/ik_llama.cpp/pull/71)
- reduce memory bandwidth
simple tinyblas dispatch and more cache friendly
* tinyblas dynamic dispatching
* sgemm: add M blocks.
* - git 2.47 use short id of len 9.
- show-progress is not part of GNU Wget2
* remove not stable test 
							
						 
						
							2024-12-24 18:54:49 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									NeverLucky 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								09fe2e7613 
								
							 
						 
						
							
							
								
								server: allow filtering llama server response fields (#10940)
							
							
							
							* llama_server_response_fields
* llama_server_response_fields_fix_issues
* params fixes
* fix
* clarify docs
* change to "response_fields"
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2024-12-24 17:39:49 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								485dc01214 
								
							 
						 
						
							
							
								
								server : add system_fingerprint to chat/completion (#10917)
							
							
							
							* server : add system_fingerprint to chat/completion
* update README 
							
						 
						
							2024-12-23 12:02:44 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								57bb2c40cd 
								
							 
						 
						
							
							
								
								server : fix logprobs, make it OAI-compatible (#10783)
							
							
							
							* server : fix logprobs, make it openai-compatible
* update docs
* add std::log
* return pre-sampling p
* sort before apply softmax
* add comment
* fix test
* set p for sampled token
* update docs
* add --multi-token-probs
* update docs
* add `post_sampling_probs` option
* update docs [no ci]
* remove --multi-token-probs
* "top_probs" with "post_sampling_probs"
* resolve review comments
* rename struct token_prob to prob_info
* correct comment placement
* fix setting prob for sampled token 
							
						 
						
							2024-12-19 15:40:08 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								152610eda9 
								
							 
						 
						
							
							
								
								server : output embeddings for all tokens when pooling = none (#10861)
							
							
							
							* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : update readme [no ci]
* server : fix spacing [no ci]
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* server : be explicit about the pooling type in the tests
ggml-ci
* server : update /embeddings and /v1/embeddings endpoints
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* server : update readme
ggml-ci
* server : fixes
* tests : update server tests
ggml-ci
* server : update readme [no ci]
* server : remove rebase artifact
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> 
							
						 
						
							2024-12-18 13:01:41 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0e70ba686e 
								
							 
						 
						
							
							
								
								server : add "tokens" output (#10853)
							
							
							
							* server : add "tokens" output
ggml-ci
* server : update readme
ggml-ci
* server : return tokens ids only if requested
ggml-ci
* tests : improve "tokens" type check
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* server : remove "tokens" from the OAI endpoint
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> 
							
						 
						
							2024-12-18 11:05:29 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								46828872c3 
								
							 
						 
						
							
							
								
								server : (embeddings) using same format for "input" and "content" (#10872)
							
							
							
							* server : (embeddings) using same format for "input" and "content"
* fix test case
* handle empty input case
* fix test 
							
						 
						
							2024-12-18 10:55:09 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									krystiancha 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								05c3a444b8 
								
							 
						 
						
							
							
								
								server : fill usage info in embeddings and rerank responses (#10852)
							
							
							
							* server : fill usage info in embeddings response
* server : fill usage info in reranking response 
							
						 
						
							2024-12-17 18:00:24 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Michelle Tan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								89d604f2c8 
								
							 
						 
						
							
							
								
								server: Fix has_next_line in JSON response (#10818)
							
							
							
							* Update server JSON response.
* Add unit test to check `has_new_line` JSON response
* Remove `has_new_line` unit test changes.
* Address code review comment: type check for `has_new_line` in unit test 
							
						 
						
							2024-12-14 23:29:45 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Yüg 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a86ad841f1 
								
							 
						 
						
							
							
								
								server : add flag to disable the web-ui (#10762) (#10751)
							
							
							
							Co-authored-by: eugenio.segala <esegala@deloitte.co.uk> 
							
						 
						
							2024-12-10 18:22:34 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ce8784bdb1 
								
							 
						 
						
							
							
								
								server : fix format_infill (#10724)
							
							
							
							* server : fix format_infill
* fix
* rename
* update test
* use another model
* update test
* update test
* test_invalid_input_extra_req 
							
						 
						
							2024-12-08 23:04:29 +01:00