Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								afa8a9ec9b 
								
							 
						 
						
							
							
								
								llama : add llama_vocab, functions -> methods, naming ( #11110 )  
							
							
							
							
							* llama : functions -> methods (#11110 )
* llama : add struct llama_vocab to the API (#11156 )
ggml-ci
* hparams : move vocab params to llama_vocab (#11159 )
ggml-ci
* vocab : more pimpl (#11165 )
ggml-ci
* vocab : minor tokenization optimizations (#11160 )
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* lora : update API names (#11167 )
ggml-ci
* llama : update API names to use correct prefix (#11174 )
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174 )
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174 )
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com> 
							
						 
						
							2025-01-12 11:32:42 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								727368c60f 
								
							 
						 
						
							
							
								
								llama : use LLAMA_TOKEN_NULL ( #11062 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2025-01-06 10:52:15 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f66f582927 
								
							 
						 
						
							
							
								
								llama : refactor src/llama.cpp ( #10902 )  
							
							
							
							
							* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci] 
							
						 
						
							2025-01-03 10:18:53 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0da5d86026 
								
							 
						 
						
							
							
								
								server : allow using LoRA adapters per-request ( #10994 )  
							
							
							
							
							* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* lora_base
* remove redundant check
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2025-01-02 15:05:18 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								45095a61bf 
								
							 
						 
						
							
							
								
								server : clean up built-in template detection ( #11026 )  
							
							
							
							
							* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition 
							
						 
						
							2024-12-31 15:22:01 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5896c65232 
								
							 
						 
						
							
							
								
								server : add OAI compat for /v1/completions ( #10974 )  
							
							
							
							
							* server : add OAI compat for /v1/completions
* add test
* add docs
* better docs 
							
						 
						
							2024-12-31 12:34:13 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Reza Kakhki 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9ba399dfa7 
								
							 
						 
						
							
							
								
								server : add support for "encoding_format": "base64" to the */embeddings endpoints ( #10967 )  
							
							
							
							
							* add support for base64
* fix base64 test
* improve test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2024-12-24 21:33:04 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									NeverLucky 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								09fe2e7613 
								
							 
						 
						
							
							
								
								server:  allow filtering llama server response fields ( #10940 )  
							
							
							
							
							* llama_server_response_fields
* llama_server_response_fields_fix_issues
* params fixes
* fix
* clarify docs
* change to "response_fields"
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2024-12-24 17:39:49 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								485dc01214 
								
							 
						 
						
							
							
								
								server : add system_fingerprint to chat/completion ( #10917 )  
							
							
							
							
							* server : add system_fingerprint to chat/completion
* update README 
							
						 
						
							2024-12-23 12:02:44 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								57bb2c40cd 
								
							 
						 
						
							
							
								
								server : fix logprobs, make it OAI-compatible ( #10783 )  
							
							
							
							
							* server : fix logprobs, make it openai-compatible
* update docs
* add std::log
* return pre-sampling p
* sort before apply softmax
* add comment
* fix test
* set p for sampled token
* update docs
* add --multi-token-probs
* update docs
* add `post_sampling_probs` option
* update docs [no ci]
* remove --multi-token-probs
* "top_probs" with "post_sampling_probs"
* resolve review comments
* rename struct token_prob to prob_info
* correct comment placement
* fix setting prob for sampled token 
							
						 
						
							2024-12-19 15:40:08 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								46828872c3 
								
							 
						 
						
							
							
								
								server : (embeddings) using same format for "input" and "content" ( #10872 )  
							
							
							
							
							* server : (embeddings) using same format for "input" and "content"
* fix test case
* handle empty input case
* fix test 
							
						 
						
							2024-12-18 10:55:09 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									krystiancha 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								05c3a444b8 
								
							 
						 
						
							
							
								
								server : fill usage info in embeddings and rerank responses ( #10852 )  
							
							
							
							
							* server : fill usage info in embeddings response
* server : fill usage info in reranking response 
							
						 
						
							2024-12-17 18:00:24 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Michelle Tan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								89d604f2c8 
								
							 
						 
						
							
							
								
								server: Fix has_next_line in JSON response ( #10818 )  
							
							
							
							
							* Update server JSON response.
* Add unit test to check `has_new_line` JSON response
* Remove `has_new_line` unit test changes.
* Address code review comment: type check for `has_new_line` in unit test 
							
						 
						
							2024-12-14 23:29:45 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									kallewoof 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								484d2f31ae 
								
							 
						 
						
							
							
								
								bug-fix: snprintf prints NULL in place of the last character ( #10419 )  
							
							
							
							
							* bug-fix: snprintf prints NULL in place of the last character
We need to give snprintf enough space to print the last character and the null character, thus we allocate one extra byte and then ignore it when converting to std::string.
* add comment about extra null-term byte requirement 
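
The sizing rule described in this commit can be sketched as follows. This is a minimal illustration and not the code from the commit: the helper name `format_id` and the format string are hypothetical. It assumes only the standard behaviour of `snprintf`, which writes at most `size - 1` characters followed by a null terminator and returns the length it would have written without the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not taken from the commit): formats an int as "id=<n>".
std::string format_id(int id) {
    // First pass: a null buffer of size 0 makes snprintf report how many
    // characters it would write, excluding the terminating '\0'.
    const int n = std::snprintf(nullptr, 0, "id=%d", id);
    if (n < 0) {
        return std::string();  // encoding error
    }
    // Allocate n + 1 bytes so the last character and the '\0' both fit.
    // With only n bytes, snprintf would replace the last character with '\0'.
    std::vector<char> buf(static_cast<std::size_t>(n) + 1);
    const int written = std::snprintf(buf.data(), buf.size(), "id=%d", id);
    assert(written == n);
    (void) written;
    // Copy only the n payload characters, ignoring the extra null byte.
    return std::string(buf.data(), static_cast<std::size_t>(n));
}
```

Sizing the buffer at exactly `n` bytes would reproduce the bug named in the title: the output would be truncated by one character.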
							
						 
						
							2024-12-11 14:48:04 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3573fa8e7b 
								
							 
						 
						
							
							
								
								server : (refactor) no more json in server_task input ( #10691 )  
							
							
							
							
							* server : (refactor) no more json in server_task input
* add test for slots endpoint
* add tests for /props and /slots
* remove task inf_type
* fix CI by adding safe_json_to_str
* add "model_path" to /props
* update readme 
							
						 
						
							2024-12-07 20:21:09 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ce4a7b8493 
								
							 
						 
						
							
							
								
								server : various fixes ( #10704 )  
							
							
							
							
							* server : various fixes
ggml-ci
* server : show current seed in slot_params
ggml-ci
* fix /slots endpoint
* Update examples/server/server.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : reflect endpoint response changes in the readme
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> 
							
						 
						
							2024-12-07 18:02:05 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6c5bc0625f 
								
							 
						 
						
							
							
								
								server : (refactoring) do not rely on JSON internally ( #10643 )  
							
							
							
							
							* server : (refactoring) reduce usage of json internally
* move all response types to struct
* wip [no ci]
* many fixes
* add virtual function
* fix index
* minor style fix
* add std::move
* refactor handle_completions_generic
* add virtual functions
* remove server.hpp
* clarify server_sent_event RFC specs
* apply review comments
* fix model_alias and completion_probabilities
* small clean up
* remove virtual for to_json_oai_compat()
* naming oai_compat --> oaicompat
* fix unwanted recursive call
* update docs 
							
						 
						
							2024-12-06 11:14:32 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									haopeng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								64ed2091b2 
								
							 
						 
						
							
							
								
								server: Add "tokens per second" information in the backend ( #10548 )  
							
							
							
							
							* add cmake rvv support
* add timings
* remove space
* update readme
* fix
* fix code
* remove empty line
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2024-12-02 14:45:54 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d9d54e498d 
								
							 
						 
						
							
							
								
								speculative : refactor and add a simpler example ( #10362 )  
							
							
							
							
							* speculative : refactor and add a simpler example
ggml-ci
* speculative : clean-up and add comments and TODOs [no ci]
* speculative : manage context in common_speculative
ggml-ci
* speculative : simplify
ggml-ci
* speculative : simplify (cont)
ggml-ci
* speculative : add --draft-min CLI arg
* speculative : minor fixup
* make : build fixes
* speculative : do not redraft previous drafts
ggml-ci
* speculative : fix the draft sampling
ggml-ci
* speculative : fix compile warning
* common : refactor args
ggml-ci
* common : change defaults [no ci]
* common : final touches
ggml-ci 
							
						 
						
							2024-11-25 09:58:41 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									sasha0552 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								42cadc74bd 
								
							 
						 
						
							
							
								
								server : fix slot selection by lru ( #10126 )  
							
							
							
							
							* server : fix slot selection by lru, migrate lcs to `size_t`
* minor debug log fix 
							
						 
						
							2024-11-02 18:34:56 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									sasha0552 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d865d1478c 
								
							 
						 
						
							
							
								
								server : fix smart selection of available slot ( #10120 )  
							
							
							
							
							* Fix smart selection of available slot
* minor fix
* replace vectors of tokens with shorthands 
							
						 
						
							2024-11-01 14:33:14 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8d8ff71536 
								
							 
						 
						
							
							
								
								llama : remove Tail-Free sampling ( #10071 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-10-29 10:42:05 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8125e6cbfc 
								
							 
						 
						
							
							
								
								server : don't overfill the batch during infill ( #10018 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-10-28 08:49:32 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								958367bf53 
								
							 
						 
						
							
							
								
								server : refactor slot input data, move tokenizer to HTTP thread ( #10023 )  
							
							
							
							
							* server : refactor slot input data, move tokenizer to HTTP thread
* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere 
							
						 
						
							2024-10-24 21:51:22 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									VoidIsVoid 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a89f75e1b7 
								
							 
						 
						
							
							
								
								server : handle "logprobs" field with false value ( #9871 )  
							
							
							
							
							Co-authored-by: Gimling <huangjl@ruyi.ai> 
							
						 
						
							2024-10-14 10:04:36 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c7181bd294 
								
							 
						 
						
							
							
								
								server : reuse cached context chunks ( #9866 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-10-13 18:52:48 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7eee341bee 
								
							 
						 
						
							
							
								
								common : use common_ prefix for common library functions ( #9805 )  
							
							
							
							
							* common : use common_ prefix for common library functions
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2024-10-10 22:57:42 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								458367a906 
								
							 
						 
						
							
							
								
								server : better security control for public deployments ( #9776 )  
							
							
							
							
							* server : more explicit endpoint access settings
* protect /props endpoint
* fix tests
* update server docs
* fix typo
* fix tests 
							
						 
						
							2024-10-08 13:27:04 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f4d2b8846a 
								
							 
						 
						
							
							
								
								llama : add reranking support ( #9510 )  
							
							
							
							
							* py : add XLMRobertaForSequenceClassification [no ci]
* py : fix scalar-tensor conversion [no ci]
* py : fix position embeddings chop [no ci]
* llama : read new cls tensors [no ci]
* llama : add classification head (wip) [no ci]
* llama : add "rank" pooling type
ggml-ci
* server : add rerank endpoint
ggml-ci
* llama : avoid ggml_repeat during classification
* rerank : cleanup + comments
* server : accept /rerank endpoint in addition to /v1/rerank [no ci]
* embedding : parse special tokens
* jina : support v1 reranker
* vocab : minor style
ggml-ci
* server : initiate tests for later
ggml-ci
* server : add docs
* llama : add comment [no ci]
* llama : fix uninitialized tensors
* ci : add rerank tests
ggml-ci
* add reranking test
* change test data
* Update examples/server/server.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* add `--reranking` argument
* update server docs
* llama : fix comment [no ci]
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> 
							
						 
						
							2024-09-28 17:42:03 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Vinesh Janarthanan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8a308354f6 
								
							 
						 
						
							
							
								
								server : match OAI structured output response ( #9527 )  
							
							
							
						 
						
							2024-09-18 09:50:34 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6262d13e0b 
								
							 
						 
						
							
							
								
								common : reimplement logging ( #9418 )  
							
							
							
							
							https://github.com/ggerganov/llama.cpp/pull/9418  
						
							2024-09-15 20:46:12 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Mathijs Henquet 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								78203641fe 
								
							 
						 
						
							
							
								
								server : Add option to return token pieces in /tokenize endpoint ( #9108 )  
							
							
							
							
							* server : added with_pieces functionality to /tokenize endpoint
* server : Add tokenize with pieces tests to server.feature
* Handle case if tokenizer splits along utf8 continuation bytes
* Add example of token splitting
* Remove trailing ws
* Fix trailing ws
* Maybe fix ci
* maybe this fix windows ci?
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2024-09-12 22:30:11 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6e7d133a5f 
								
							 
						 
						
							
							
								
								server : refactor multitask handling ( #9274 )  
							
							
							
							
							* server : remove multitask from server_task
* refactor completions handler
* fix embeddings
* use res_ok everywhere
* small change for handle_slots_action
* use unordered_set everywhere
* (try) fix test
* no more "mutable" lambda
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* use deque
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2024-09-02 17:11:51 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ardfork 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								978ba3d83d 
								
							 
						 
						
							
							
								
								Server: Don't ignore llama.cpp params ( #8754 )  
							
							
							
							
							* Don't ignore llama.cpp params
* Add fallback for max_tokens 
							
						 
						
							2024-08-04 20:16:23 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4e24cffd8c 
								
							 
						 
						
							
							
								
								server : handle content array in chat API ( #8449 )  
							
							
							
							
							* server : handle content array in chat API
* Update examples/server/utils.hpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> 
							
						 
						
							2024-07-12 14:48:15 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								48e6b92cc3 
								
							 
						 
						
							
							
								
								Add chat template support for llama-cli ( #8068 )  
							
							
							
							
							* add chat template support for llama-cli
* add help message
* server: simplify format_chat
* more consistent naming
* improve
* add llama_chat_format_example
* fix server
* code style
* code style
* Update examples/main/main.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2024-06-25 21:56:49 +10:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									sasha0552 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7a16ce7db2 
								
							 
						 
						
							
							
								
								server : smart slot selection using Longest Common Prefix ( #7728 )  
							
							
							
							
							* server : Smart selection of available slot using Longest Common Substring
* add usage
* remove trailing whitespaces
* Use Longest Common Prefix (LCP) instead of LCS
* Rename argument 
							
						 
						
							2024-06-08 10:50:31 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1442677f92 
								
							 
						 
						
							
							
								
								common : refactor cli arg parsing ( #7675 )  
							
							
							
							
							* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters
ggml-ci
* common : add missing header
ggml-ci
* common : remove --random-prompt usages
ggml-ci
* examples : migrate to gpt_params
ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params
ggml-ci
* common : passkey use gpt_params 
							
						 
						
							2024-06-04 21:23:39 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Findley 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e586ee4259 
								
							 
						 
						
							
							
								
								change default temperature of OAI compat API from 0 to 1 ( #7226 )  
							
							
							
							
							* change default temperature of OAI compat API from 0 to 1
* make tests explicitly send temperature to OAI API 
							
						 
						
							2024-05-13 12:40:08 +10:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c12452c7ae 
								
							 
						 
						
							
							
								
								JSON: [key] -> .at(key), assert() -> GGML_ASSERT ( #7143 )  
							
							
							
						 
						
							2024-05-08 21:53:08 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1fd9c1741d 
								
							 
						 
						
							
							
								
								clean up json_value & server_log ( #7142 )  
							
							
							
						 
						
							2024-05-08 13:24:14 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Pedro Cuenca 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b97bc3966e 
								
							 
						 
						
							
							
								
								llama : support Llama 3 HF conversion ( #6745 )  
							
							
							
							
							* Support Llama 3 conversion
The tokenizer is BPE.
* style
* Accept suggestion
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
* llama : add llama_token_is_eog()
ggml-ci
* llama : auto-detect more EOT tokens when missing in KV data
* convert : replacing EOS token is a hack
* llama : fix codegemma EOT token + add TODOs
* llama : fix model type string for 8B model
---------
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2024-04-21 14:50:41 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Pierrick Hymbert 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								75cd4c7729 
								
							 
						 
						
							
							
								
								ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response ( #6495 )  
							
							
							
							
							* ci: bench: support sse and fix prompt processing time
server: add tokens usage in stream mode
* ci: bench: README.md EOL
* ci: bench: remove total pp and tg as it is not accurate
* ci: bench: fix case when there is no token generated
* ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics
* ci: bench: fix finish reason rate 
							
						 
						
							2024-04-06 05:40:47 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									JH23X 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								60cdf40cc3 
								
							 
						 
						
							
							
								
								server : handle exception on wrong type in request ( #6452 )  
							
							
							
							
							Co-authored-by: Jonas Holzner <jonas.holzner.external@hensoldt.net> 
							
						 
						
							2024-04-03 21:09:52 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ad3a0505e3 
								
							 
						 
						
							
							
								
								Server: clean up OAI params parsing function ( #6284 )  
							
							
							
							
							* server: clean up oai parsing function
* fix response_format
* fix empty response_format
* minor fixes
* add TODO for logprobs
* update docs 
							
						 
						
							2024-03-25 09:42:17 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Pierrick Hymbert 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1b26aebe4d 
								
							 
						 
						
							
							
								
								server: flush stdout after logging in both text and json layout ( #6253 )  
							
							
							
						 
						
							2024-03-23 13:18:45 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								72114edf06 
								
							 
						 
						
							
							
								
								json-schema-to-grammar : fix order of props + non-str const/enum ( #6232 )  
							
							
							
							
							* json: ordered json in server/schema converter to respect orig order
* json: ws nits
* json: support non-string const / enums 
							
						 
						
							2024-03-22 15:07:44 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5b7b0ac8df 
								
							 
						 
						
							
							
								
								json-schema-to-grammar improvements (+ added to server) ( #5978 )  
							
							
							
							
							* json: fix arrays (disallow `[,1]`)
* json: support tuple types (`[number, string]`)
* json: support additionalProperties (`{[k: string]: [string,number][]}`)
* json: support required / optional properties
* json: add support for pattern
* json: resolve $ref (and support https schema urls)
* json: fix $ref resolution
* json: support union types (mostly for nullable types I think)
* json: support allOf + nested anyOf
* json: support any (`{}` or `{type: object}`)
* json: fix merge
* json: temp fix for escapes
* json: spaces in output and unrestricted output spaces
* json: add typings
* json: fix typo
* Create ts-type-to-grammar.sh
* json: fix _format_literal (json.dumps already escapes quotes)
* json: merge lit sequences and handle negatives
{"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"}
* json: handle pattern repetitions
* Update json-schema-to-grammar.mjs
* Create regex-to-grammar.py
* json: extract repeated regexp patterns to subrule
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* json: handle schema from pydantic Optional fields
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update ts-type-to-grammar.sh
* Update ts-type-to-grammar.sh
* json: simplify nullable fields handling
* json: accept duplicate identical rules
* json: revert space to 1 at most
* json: reuse regexp pattern subrules
* json: handle uuid string format
* json: fix literal escapes
* json: add --allow-fetch
* json: simplify range escapes
* json: support negative ranges in patterns
* Delete commit.txt
* json: custom regex parser, adds dot support & JS-portable
* json: rm trailing spaces
* Update json-schema-to-grammar.mjs
* json: updated server & chat `( cd examples/server && ./deps.sh )`
* json: port fixes from mjs to python
* Update ts-type-to-grammar.sh
* json: support prefixItems alongside array items
* json: add date format + fix uuid
* json: add date, time, date-time formats
* json: preserve order of props from TS defs
* json: port schema converter to C++, wire in ./server
* json: nits
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* json: fix mjs implementation + align outputs
* Update json-schema-to-grammar.mjs.hpp
* json: test C++, JS & Python versions
* json: nits + regen deps
* json: cleanup test
* json: revert from c++17 to 11
* json: nit fixes
* json: dirty include for test
* json: fix zig build
* json: pass static command to std::system in tests (fixed temp files)
* json: fix top-level $refs
* json: don't use c++20 designated initializers
* nit
* json: basic support for reserved names `{number:{number:{root:number}}}`
* Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test)
* json: re-ran server deps.sh
* json: simplify test
* json: support mix of additional props & required/optional
* json: add tests for some expected failures
* json: fix type=const in c++, add failure expectations for non-str const&enum
* json: test (& simplify output of) empty schema
* json: check parsing in test + fix value & string refs
* json: add server tests for OAI JSON response_format
* json: test/fix top-level anyOf
* json: improve grammar parsing failures
* json: test/fix additional props corner cases
* json: fix string patterns (was missing quotes)
* json: ws nit
* json: fix json handling in server when there's no response_format
* json: catch schema conversion errors in server
* json: don't complain about unknown format type in server if unset
* json: cleaner build of test
* json: create examples/json-schema-pydantic-example.py
* json: fix date pattern
* json: move json.hpp & json-schema-to-grammar.{cpp,h} to common
* json: indent 4 spaces
* json: fix naming of top-level c++ function (+ drop unused one)
* json: avoid using namespace std
* json: fix zig build
* Update server.feature
* json: iostream -> fprintf
* json: space before & refs for consistency
* json: nits 
							
						 
						
							2024-03-21 11:50:43 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Karthick 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								47cc7a7bf9 
								
							 
						 
						
							
							
								
								Server: Handle n_keep parameter in the request ( #6174 )  
							
							
							
						 
						
							2024-03-20 12:02:34 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								99b71c068f 
								
							 
						 
						
							
							
								
								Server: Use multi-task for embeddings endpoint ( #6001 )  
							
							
							
							
							* use multitask for embd endpoint
* specify types
* remove redundant {"n_predict", 0} 
							
						 
						
							2024-03-13 11:39:11 +01:00