Xuan Son Nguyen 
								
							 
						 
						
							
							
							
							
								
							
							
								c9e7cbb08b 
								
							 
						 
						
							
							
								
								safer jinja llama_chat_templates struct  
							
							
							
						 
						
							2025-01-20 16:58:29 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								cc50356470 
								
							 
						 
						
							
							
								
								minja: fix vigogne (https://github.com/google/minja/pull/22)
							
							
							
						 
						
							2025-01-18 17:55:04 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								fc60802b6e 
								
							 
						 
						
							
							
								
								Rm unused optional include  
							
							
							
						 
						
							2025-01-18 11:35:54 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								5074e6fecd 
								
							 
						 
						
							
							
								
								Fix copy elision warning  
							
							
							
						 
						
							2025-01-18 10:48:03 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								e63520f37a 
								
							 
						 
						
							
							
								
								Forward decl minja::chat_template to avoid eager json dep  
							
							
							
						 
						
							2025-01-18 10:37:56 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								d5fa351a24 
								
							 
						 
						
							
							
								
								Revert LLAMA_CHATML_TEMPLATE refactor  
							
							
							
						 
						
							2025-01-18 01:04:12 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								81c0d437a5 
								
							 
						 
						
							
							
								
								Attempt to fix linkage of LLAMA_CHATML_TEMPLATE  
							
							
							
						 
						
							2025-01-18 00:56:19 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								40db78963b 
								
							 
						 
						
							
							
								
								Merge remote-tracking branch 'origin/master' into jinja  
							
							
							
						 
						
							2025-01-18 00:44:37 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								b75d0622e4 
								
							 
						 
						
							
							
								
								Refactor common_chat_* functions to accept minja template + use_jinja option  
							
							
							
						 
						
							2025-01-18 00:43:38 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Radoslav Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								667d72846c 
								
							 
						 
						
							
							
								
								rpc : early register backend devices (#11262)
							
							
							
							
							Early register RPC devices and do not propagate RPC specifics in the
llama model structures.
ref: #10609  
							
						 
						
							2025-01-17 10:57:09 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1b3bb7eeb9 
								
							 
						 
						
							
							
								
								Update arg.cpp  
							
							
							
						 
						
							2025-01-14 00:07:18 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								a6afb2735f 
								
							 
						 
						
							
							
								
								Update common_chat_format_example to use minja template wrapper  
							
							
							
						 
						
							2025-01-13 22:57:35 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								c04c50e40c 
								
							 
						 
						
							
							
								
								Merge remote-tracking branch 'origin/master' into jinja  
							
							
							
						 
						
							2025-01-13 22:26:13 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								8dd4f334a4 
								
							 
						 
						
							
							
								
								Add --jinja to llama-run  
							
							
							
						 
						
							2025-01-13 22:07:49 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								18f257bf1a 
								
							 
						 
						
							
							
								
								Fix deprecation  
							
							
							
						 
						
							2025-01-13 21:30:48 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								78861a3eb2 
								
							 
						 
						
							
							
								
								Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template  
							
							
							
						 
						
							2025-01-13 19:58:15 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								cb72cf1fc3 
								
							 
						 
						
							
							
								
								Merge remote-tracking branch 'origin/master' into jinja  
							
							
							
						 
						
							2025-01-13 19:56:27 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								84a44815f7 
								
							 
						 
						
							
							
								
								cli : auto activate conversation mode if chat template is available (#11214)
							
							
							
							
							* cli : auto activate conversation mode if chat template is detected
* add warn on bad template
* update readme (writing with the help of chatgpt)
* update readme (2)
* do not activate -cnv for non-instruct models 
							
						 
						
							2025-01-13 20:18:12 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								00b4c3da62 
								
							 
						 
						
							
							
								
								common : support tag-based --hf-repo like on ollama (#11195)
							
							
							
							
							* common : support tag-based hf_repo like on ollama
* fix build
* various fixes
* small fixes
* fix style
* fix windows build?
* move common_get_hf_file to common.cpp
* fix complain with noreturn 
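
The tag-based `--hf-repo` form named in this commit can be pictured as a `repo:tag` split. The following is only a minimal sketch: the helper name `split_hf_repo_tag` and the sample value `user/model:Q4_K_M` are assumptions for illustration, not code from the commit, and the commit message does not say what happens when the tag is omitted, so the sketch returns an empty tag in that case.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not the llama.cpp implementation): split a value of
// the form "user/model[:tag]" into its repo and tag parts.
// An empty tag means none was given; no default tag is assumed here.
std::pair<std::string, std::string> split_hf_repo_tag(const std::string & value) {
    const std::string::size_type colon = value.rfind(':');
    if (colon == std::string::npos) {
        return {value, ""};
    }
    return {value.substr(0, colon), value.substr(colon + 1)};
}
```

With this sketch, `split_hf_repo_tag("user/model:Q4_K_M")` yields the pair `("user/model", "Q4_K_M")`, and a value without a colon comes back unchanged with an empty tag.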
							
						 
						
							2025-01-13 13:56:23 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9a483999a6 
								
							 
						 
						
							
							
								
								llama : fix chat template gguf key (#11201)
							
							
							
						 
						
							2025-01-12 13:45:14 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								afa8a9ec9b 
								
							 
						 
						
							
							
								
								llama : add llama_vocab, functions -> methods, naming (#11110)
							
							
							
							
							* llama : functions -> methods (#11110)
* llama : add struct llama_vocab to the API (#11156)
ggml-ci
* hparams : move vocab params to llama_vocab (#11159)
ggml-ci
* vocab : more pimpl (#11165)
ggml-ci
* vocab : minor tokenization optimizations (#11160)
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* lora : update API names (#11167)
ggml-ci
* llama : update API names to use correct prefix (#11174)
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com> 
							
						 
						
							2025-01-12 11:32:42 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a3c1232c3f 
								
							 
						 
						
							
							
								
								arg : option to exclude arguments from specific examples (#11136)
							
							
							
							
							* arg : option to exclude arguments from specific examples
ggml-ci
* readme : remove old args [no ci] 
							
						 
						
							2025-01-08 12:55:36 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								53ff6b9b9f 
								
							 
						 
						
							
							
								
								GGUF: C++ refactor, backend support, misc fixes (#11030)
							
							
							
							
							* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types 
							
						 
						
							2025-01-07 18:01:58 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								47182dd03f 
								
							 
						 
						
							
							
								
								llama : update llama_model API names (#11063)
							
							
							
							
							* llama : deprecate llama_free_model, add llama_model_free
ggml-ci
* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`
ggml-ci 
							
						 
						
							2025-01-06 10:55:18 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								727368c60f 
								
							 
						 
						
							
							
								
								llama : use LLAMA_TOKEN_NULL (#11062)
							
							
							
							
							ggml-ci 
							
						 
						
							2025-01-06 10:52:15 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Molly Sophia 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4b0c638b9a 
								
							 
						 
						
							
							
								
								common : disable KV cache shifting automatically for unsupported models (#11053)
							
							
							
							
							* Disable KV cache shifting automatically for unsupported models
instead of exiting directly
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2025-01-03 14:13:18 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f66f582927 
								
							 
						 
						
							
							
								
								llama : refactor src/llama.cpp (#10902)
							
							
							
							
							* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci] 
							
						 
						
							2025-01-03 10:18:53 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								45095a61bf 
								
							 
						 
						
							
							
								
								server : clean up built-in template detection (#11026)
							
							
							
							
							* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition 
							
						 
						
							2024-12-31 15:22:01 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Peter 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6e1531aca5 
								
							 
						 
						
							
							
								
								common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013)
							
							
							
							
							In common/common.cpp:
* Convert usage of stat() function call to check if file exists to standard library function std::filesystem::exists (error unable to match to correct function signature)
* Additional conditions to check if PATH_MAX is already defined in WIN32 environment (warning it is already defined in MSYS2)
In examples/run/run.cpp:
* Add io.h header inclusion (error cannot find function _get_osfhandle)
* Change initialisers for OVERLAPPED to empty struct (warning about uninitialised members)
* Add initialiser for hFile (warning it may be uninitialised)
* Add cast for curl_off_t percentage value to long int in generate_progress_prefix function (warning that curl_off_t is long long int)
In ggml/src/ggml-opencl/ggml-opencl.cpp:
* Initialise certain declared cl_mem variables to nullptr for greater safety (warning about B_d variable possibly used unassigned) 
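
The first `common/common.cpp` item above replaces a `stat()` call with `std::filesystem::exists`. A minimal sketch of that kind of check follows, assuming a C++17 compiler; the function name `file_exists` is illustrative and is not taken from the commit.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Illustrative existence check using the C++17 standard library instead of
// the platform-specific stat() call. The name file_exists is hypothetical.
bool file_exists(const std::string & path) {
    std::error_code ec;  // non-throwing overload: failures are reported via ec
    return std::filesystem::exists(std::filesystem::path(path), ec);
}
```

The `std::error_code` overload is used so that an unreadable or malformed path results in `false` rather than an exception.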
							
						 
						
							2024-12-31 01:46:06 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								389d79b6b4 
								
							 
						 
						
							
							
								
								Try and work around msvc++ non-macro max resolution quirk  
							
							
							
						 
						
							2024-12-30 04:50:20 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								ce48584f7d 
								
							 
						 
						
							
							
								
								No designated initializers yet  
							
							
							
						 
						
							2024-12-30 04:19:33 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								80138d9007 
								
							 
						 
						
							
							
								
								Add missing <optional> include  
							
							
							
						 
						
							2024-12-30 04:10:20 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								e5113e8d74 
								
							 
						 
						
							
							
								
								Add --jinja and --chat-template-file flags  
							
							
							
						 
						
							2024-12-30 03:50:51 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ochafik 
								
							 
						 
						
							
							
							
							
								
							
							
								abd274a48f 
								
							 
						 
						
							
							
								
								Copy minja from 58f0ca6dd7
							
							
							
						 
						
							2024-12-30 03:50:51 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Molly Sophia 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0a11f8b7b5 
								
							 
						 
						
							
							
								
								convert : fix RWKV v6 model conversion (#10913)
							
							
							
							
							* Enable --no-context-shift for llama-perplexity example
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV 6: Fix error in ggml_cuda_op_bin_bcast
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com> 
							
						 
						
							2024-12-20 11:44:58 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								36319dec5d 
								
							 
						 
						
							
							
								
								tts : small QoL for easy model fetch (#10903)
							
							
							
						 
						
							2024-12-19 17:35:15 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0bf2d10c55 
								
							 
						 
						
							
							
								
								tts : add OuteTTS support (#10784)
							
							
							
							
							* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add mathematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder 
							
						 
						
							2024-12-18 19:27:21 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								152610eda9 
								
							 
						 
						
							
							
								
								server : output embeddings for all tokens when pooling = none (#10861)
							
							
							
							
							* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : update readme [no ci]
* server : fix spacing [no ci]
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* server : be explicit about the pooling type in the tests
ggml-ci
* server : update /embeddings and /v1/embeddings endpoints
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* server : update readme
ggml-ci
* server : fixes
* tests : update server tests
ggml-ci
* server : update readme [no ci]
* server : remove rebase artifact
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> 
							
						 
						
							2024-12-18 13:01:41 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								644fd71b44 
								
							 
						 
						
							
							
								
								sampling : refactor + optimize penalties sampler (#10803)
							
							
							
							
							* sampling : refactor + optimize penalties sampler
ggml-ci
* common : apply ignore_eos as logit bias
ggml-ci
* batched : remove penalties sampler
* params : allow penalty_last_n == -1 to be equal to context size
ggml-ci
* common : by default, move the penalties at the end of the sampling chain
ggml-ci
* common : ignore all EOG tokens
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* common : move back the penalties at the front of the sampling chain
ggml-ci
* readme : restore hint about --ignore-eos flag [no ci]
* llama : minor
ggml-ci
* webui : update
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com> 
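
One item in this commit lets `penalty_last_n == -1` stand for the context size. A minimal sketch of that sentinel handling is given here; the helper name `resolve_penalty_last_n` and the parameter `n_ctx` are assumptions for illustration, and the real code may resolve the value elsewhere.

```cpp
// Hypothetical helper (not the llama.cpp implementation): map the sentinel
// value -1 to the context size and pass any other value through unchanged.
int resolve_penalty_last_n(int penalty_last_n, int n_ctx) {
    return penalty_last_n == -1 ? n_ctx : penalty_last_n;
}
```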
							
						 
						
							2024-12-16 12:31:14 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c27ac678dd 
								
							 
						 
						
							
							
								
								Opt class for positional argument handling (#10508)
							
							
							
							
							Added support for positional arguments `model` and `prompt`. Added
functionality to download via strings like:
  llama-run llama3
  llama-run ollama://granite-code
  llama-run ollama://granite-code:8b
  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
  llama-run https://example.com/some-file1.gguf 
  llama-run some-file2.gguf
  llama-run file://some-file3.gguf
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
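
The model strings listed in this commit differ by their scheme prefix. The sketch below classifies such a string by prefix only. It is an illustration under assumptions: the enum `model_source` and the functions `starts_with` and `classify_model_arg` are hypothetical names, and the commit message does not say how unprefixed values such as `llama3` and `some-file2.gguf` are told apart, so both fall into one `unprefixed` bucket here.

```cpp
#include <string>

// Hypothetical classification of a llama-run style model argument.
enum class model_source { ollama, hugging_face, https, file, unprefixed };

// True when s begins with prefix.
static bool starts_with(const std::string & s, const std::string & prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Classify by scheme prefix only; nothing is downloaded or opened.
model_source classify_model_arg(const std::string & arg) {
    if (starts_with(arg, "ollama://")) {
        return model_source::ollama;
    }
    if (starts_with(arg, "hf://") || starts_with(arg, "huggingface://")) {
        return model_source::hugging_face;
    }
    if (starts_with(arg, "https://")) {
        return model_source::https;
    }
    if (starts_with(arg, "file://")) {
        return model_source::file;
    }
    return model_source::unprefixed;  // bare name or local path
}
```

Keeping the classification separate from any download logic makes the prefix rules easy to test on their own.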
							
						 
						
							2024-12-13 19:34:25 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								adffa6ffd5 
								
							 
						 
						
							
							
								
								common : improve -ctv -ctk CLI arguments (#10806)
							
							
							
							
							* common : improve ctv ctk cli argument
* regenerate docs
* even better approach
* use std::vector 
							
						 
						
							2024-12-12 22:53:05 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9fdb124304 
								
							 
						 
						
							
							
								
								common : add missing env var for speculative (#10801)
							
							
							
						 
						
							2024-12-12 16:57:32 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Bartowski 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ae4b922614 
								
							 
						 
						
							
							
								
								imatrix : Add imatrix to --no-context-shift (#10766)
							
							
							
							
							This allows for setting the --no-context-shift value in llama-imatrix which is required for models like DeepSeek 
							
						 
						
							2024-12-10 18:23:50 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Yüg 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a86ad841f1 
								
							 
						 
						
							
							
								
								server : add flag to disable the web-ui (#10762) (#10751)
							
							
							
							
							Co-authored-by: eugenio.segala <esegala@deloitte.co.uk> 
							
						 
						
							2024-12-10 18:22:34 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c2a16c0bdb 
								
							 
						 
						
							
							
								
								server : fix free of spec context and batch (#10651)
							
							
							
							
							ggml-ci 
							
						 
						
							2024-12-07 11:52:44 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f162d45a21 
								
							 
						 
						
							
							
								
								common : bring back --no-warmup to server (#10686)
							
							
							
						 
						
							2024-12-06 13:29:05 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6c5bc0625f 
								
							 
						 
						
							
							
								
								server : (refactoring) do not rely on JSON internally (#10643)
							
							
							
							
							* server : (refactoring) reduce usage of json internally
* move all response types to struct
* wip [no ci]
* many fixes
* add virtual function
* fix index
* minor style fix
* add std::move
* refactor handle_completions_generic
* add virtual functions
* remove server.hpp
* clarify server_sent_event RFC specs
* apply review comments
* fix model_alias and completion_probabilities
* small clean up
* remove virtual for to_json_oai_compat()
* naming oai_compat --> oaicompat
* fix unwanted recursive call
* update docs 
							
						 
						
							2024-12-06 11:14:32 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								642330ac7c 
								
							 
						 
						
							
							
								
								llama : add enum for built-in chat templates (#10623)
							
							
							
							
							* llama : add enum for supported chat templates
* use "built-in" instead of "supported"
* arg: print list of built-in templates
* fix test
* update server README 
							
						 
						
							2024-12-02 22:10:19 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									haopeng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								64ed2091b2 
								
							 
						 
						
							
							
								
								server: Add "tokens per second" information in the backend (#10548)
							
							
							
							
							* add cmake rvv support
* add timings
* remove space
* update readme
* fix
* fix code
* remove empty line
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2024-12-02 14:45:54 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7cc2d2c889 
								
							 
						 
						
							
							
								
								ggml : move AMX to the CPU backend (#10570)
							
							
							
							
							* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2024-11-29 21:54:58 +01:00