Xuan Son Nguyen
ce8784bdb1
server : fix format_infill (#10724)

* server : fix format_infill
* fix
* rename
* update test
* use another model
* update test
* update test
* test_invalid_input_extra_req

2024-12-08 23:04:29 +01:00

Xuan Son Nguyen
e52522b869
server : bring back info of final chunk in stream mode (#10722)

* server : bring back info to final chunk in stream mode
* clarify a bit
* trailing space

2024-12-08 20:38:51 +01:00

Xuan Son Nguyen
3573fa8e7b
server : (refactor) no more json in server_task input (#10691)

* server : (refactor) no more json in server_task input
* add test for slots endpoint
* add tests for /props and /slots
* remove task inf_type
* fix CI by adding safe_json_to_str
* add "model_path" to /props
* update readme

2024-12-07 20:21:09 +01:00

Xuan Son Nguyen
6c5bc0625f
server : (refactoring) do not rely on JSON internally (#10643)

* server : (refactoring) reduce usage of json internally
* move all response types to struct
* wip [no ci]
* many fixes
* add virtual function
* fix index
* minor style fix
* add std::move
* refactor handle_completions_generic
* add virtual functions
* remove server.hpp
* clarify server_sent_event RFC specs
* apply review comments
* fix model_alias and completion_probabilities
* small clean up
* remove virtual for to_json_oai_compat()
* naming oai_compat --> oaicompat
* fix unwanted recursive call
* update docs

2024-12-06 11:14:32 +01:00

Georgi Gerganov
1da7b76569
server : fix speculative decoding with context shift (#10641)

* server : fix speculative decoding with context shift
ggml-ci
* server : take into account speculative limits
ggml-ci
* server : add tests

2024-12-04 22:38:20 +02:00

haopeng
64ed2091b2
server: Add "tokens per second" information in the backend (#10548)

* add cmake rvv support
* add timings
* remove space
* update readme
* fix
* fix code
* remove empty line
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

2024-12-02 14:45:54 +01:00

Xuan Son Nguyen
b782e5c7d4
server : add more test cases (#10569)

* server : add split model test
* add test speculative
* add invalid cases

2024-11-29 21:48:56 +01:00

Xuan Son Nguyen
45abe0f74e
server : replace behave with pytest (#10416)

* server : replace behave with pytest
* fix test on windows
* misc
* add more tests
* more tests
* styling
* log less, fix embd test
* added all sequential tests
* fix coding style
* fix save slot test
* add parallel completion test
* fix parallel test
* remove feature files
* update test docs
* no cache_prompt for some tests
* add test_cache_vs_nocache_prompt

2024-11-26 16:20:18 +01:00