b636228c0a  Daniel Bevenius  2025-01-29 10:38:54 +02:00
embedding : enable --no-warmup option (#11475)

  This commit enables the `--no-warmup` option for llama-embedding. The
  motivation for this change is to allow the user to disable the warmup
  when running the program.

c64d2becb1  Olivier Chafik  2025-01-22 16:16:27 +00:00
minja: sync at 0f5f7f2b37 (#11352)

a94f3b2727  Olivier Chafik  2025-01-22 09:51:44 +00:00
common: utils to split / join / repeat strings (from json converter) (#11342)

  * Factor string_join, string_split, string_repeat into common
  * json: refactor to surface a versatile builder
  * Update common.cpp

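For illustration, a minimal sketch of what two of these helpers do. The signatures are an assumption based on the commit summary above, not copied from common.h:

  #include <sstream>
  #include <string>
  #include <vector>

  // Join a list of strings with a separator: {"a", "b"} + ", " -> "a, b".
  static std::string string_join(const std::vector<std::string> & values,
                                 const std::string & separator) {
      std::ostringstream out;
      for (size_t i = 0; i < values.size(); ++i) {
          if (i > 0) out << separator;
          out << values[i];
      }
      return out.str();
  }

  // Repeat a string n times: ("ab", 3) -> "ababab".
  static std::string string_repeat(const std::string & str, size_t n) {
      std::string out;
      out.reserve(str.size() * n);
      for (size_t i = 0; i < n; ++i) {
          out += str;
      }
      return out;
  }
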
6171c9d258  Olivier Chafik  2025-01-21 13:18:51 +00:00
Add Jinja template support (#11016)

  * Copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22)
  * Apply suggestions from code review
  * Finish suggested renamings
  * Move chat_templates inside server_context + remove mutex
  * Update --chat-template-file w/ recent change to --chat-template
  * Refactor chat template validation
  * Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
  * Warn against missing eos / bos tokens when jinja template references them
  * rename: common_chat_template[s]
  * reinstate assert on chat_templates.template_default
  * Update minja to b8437df626 (https://github.com/google/minja/pull/25)
  * Update minja from https://github.com/google/minja/pull/27
  * rm unused optional header

  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

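Chat templates (whether Jinja-rendered or legacy built-ins) are applied to a list of role/content messages. A minimal sketch of that call path, assuming the current llama.h signature of `llama_chat_apply_template` (which takes the template string directly):

  #include <string>
  #include <vector>
  #include "llama.h"

  int main() {
      // Messages to render; "chatml" names one of the built-in templates.
      std::vector<llama_chat_message> chat = {
          { "system", "You are a helpful assistant." },
          { "user",   "Hello!" },
      };

      std::vector<char> buf(4096);
      const int32_t n = llama_chat_apply_template(
          "chatml", chat.data(), chat.size(),
          /*add_ass=*/true, buf.data(), (int32_t) buf.size());
      if (n < 0 || n > (int32_t) buf.size()) {
          return 1; // failed, or buffer too small (retry with n bytes)
      }
      const std::string prompt(buf.data(), n);
      return prompt.empty(); // prompt holds the rendered conversation
  }
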
80d0d6b4b7  Georgi Gerganov  2025-01-20 22:29:43 +02:00
common : add -hfd option for the draft model (#11318)

  * common : add -hfd option for the draft model
  * cont : fix env var
  * cont : more fixes

6390a998bf  LostRuins Concedo  2025-01-18 12:20:57 +02:00
tts : add guide tokens support (#11186)

  * Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences.
  * Applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start.

667d72846c  Radoslav Gerganov  2025-01-17 10:57:09 +02:00
rpc : early register backend devices (#11262)

  Register RPC devices early and do not propagate RPC specifics into the
  llama model structures.
  ref: #10609

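Once registered, RPC devices show up in ggml's device registry like any other backend, so callers need no RPC-specific handling. A minimal enumeration sketch using the public ggml-backend.h registry API:

  #include <cstdio>
  #include "ggml-backend.h"

  int main() {
      // Devices registered with the backend registry (CPU, GPU, RPC, ...)
      // are enumerated uniformly.
      for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
          ggml_backend_dev_t dev = ggml_backend_dev_get(i);
          printf("device %zu: %s (%s)\n", i,
                 ggml_backend_dev_name(dev),
                 ggml_backend_dev_description(dev));
      }
  }
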
84a44815f7  Xuan Son Nguyen  2025-01-13 20:18:12 +01:00
cli : auto activate conversation mode if chat template is available (#11214)

  * cli : auto activate conversation mode if chat template is detected
  * add warn on bad template
  * update readme (writing with the help of chatgpt)
  * update readme (2)
  * do not activate -cnv for non-instruct models

00b4c3da62  Xuan Son Nguyen  2025-01-13 13:56:23 +01:00
common : support tag-based --hf-repo like on ollama (#11195)

  * common : support tag-based hf_repo like on ollama
  * fix build
  * various fixes
  * fix windows build?
  * move common_get_hf_file to common.cpp
  * fix complaint with noreturn

9a483999a6  Xuan Son Nguyen  2025-01-12 13:45:14 +01:00
llama : fix chat template gguf key (#11201)

afa8a9ec9b  Georgi Gerganov  2025-01-12 11:32:42 +02:00
llama : add llama_vocab, functions -> methods, naming (#11110)

  * llama : functions -> methods (#11110)
  * llama : add struct llama_vocab to the API (#11156)
  * hparams : move vocab params to llama_vocab (#11159)
  * vocab : more pimpl (#11165)
  * vocab : minor tokenization optimizations (#11160)
  * lora : update API names (#11167)
  * llama : update API names to use correct prefix (#11174)
  * vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
  * vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)

  Co-authored-by: Diego Devesa <slarengh@gmail.com>

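A minimal sketch of the post-refactor call pattern, assuming the current llama.h names: vocabulary queries now go through a `llama_vocab` handle obtained from the model.

  #include <cstdio>
  #include "llama.h"

  void print_vocab_info(const llama_model * model) {
      const llama_vocab * vocab = llama_model_get_vocab(model);

      // formerly llama_n_vocab(model); renamed in the llama_vocab refactor
      printf("n_tokens: %d\n", llama_vocab_n_tokens(vocab));
      printf("bos: %d, eos: %d\n",
             llama_vocab_bos(vocab), llama_vocab_eos(vocab));
  }
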
a3c1232c3f  Georgi Gerganov  2025-01-08 12:55:36 +02:00
arg : option to exclude arguments from specific examples (#11136)

  * arg : option to exclude arguments from specific examples
  * readme : remove old args [no ci]

53ff6b9b9f  Johannes Gäßler  2025-01-07 18:01:58 +01:00
GGUF: C++ refactor, backend support, misc fixes (#11030)

  * remove ggml_tensor.backend
  * update CODEOWNERS [no ci]
  * remove gguf_get_data from API
  * revise GGUF API data types

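A minimal sketch of reading GGUF metadata through the refactored C API, assuming the post-refactor gguf.h where tensor counts and indices are int64_t:

  #include <cstdio>
  #include "gguf.h"

  int main(int argc, char ** argv) {
      if (argc < 2) return 1;

      // no_alloc: read metadata only, do not allocate tensor data
      gguf_init_params params = { /*no_alloc=*/true, /*ctx=*/nullptr };
      gguf_context * ctx = gguf_init_from_file(argv[1], params);
      if (!ctx) return 1;

      for (int64_t i = 0; i < gguf_get_n_tensors(ctx); ++i) {
          printf("tensor %lld: %s\n",
                 (long long) i, gguf_get_tensor_name(ctx, i));
      }
      gguf_free(ctx);
  }
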
47182dd03f  Georgi Gerganov  2025-01-06 10:55:18 +02:00
llama : update llama_model API names (#11063)

  * llama : deprecate llama_free_model, add llama_model_free
  * llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`

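A sketch of the renamed model lifecycle calls (the deprecated names remain as wrappers at the time of this commit):

  #include "llama.h"

  int main(int argc, char ** argv) {
      if (argc < 2) return 1;

      llama_model_params mparams = llama_model_default_params();

      // was: llama_load_model_from_file(...)
      llama_model * model = llama_model_load_from_file(argv[1], mparams);
      if (!model) return 1;

      // was: llama_free_model(...)
      llama_model_free(model);
  }
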
727368c60f  Georgi Gerganov  2025-01-06 10:52:15 +02:00
llama : use LLAMA_TOKEN_NULL (#11062)

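`LLAMA_TOKEN_NULL` (-1 in llama.h) marks "no token"; the commit replaces scattered hardcoded -1 checks with the named constant. A hypothetical usage sketch, written against today's vocab API for illustration:

  #include "llama.h"

  // Returns true if the vocabulary defines a BOS token at all.
  bool has_bos(const llama_vocab * vocab) {
      return llama_vocab_bos(vocab) != LLAMA_TOKEN_NULL;
  }
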
4b0c638b9a  Molly Sophia  2025-01-03 14:13:18 +02:00
common : disable KV cache shifting automatically for unsupported models (#11053)

  * Disable KV cache shifting automatically for unsupported models instead of exiting directly
  * Update common/common.cpp

  Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

f66f582927  Georgi Gerganov  2025-01-03 10:18:53 +02:00
llama : refactor src/llama.cpp (#10902)

  * llama : scatter llama.cpp into multiple modules (wip)
  * llama : control-vector -> adapter
  * llama : arch
  * llama : mmap
  * ci : remove BUILD_SHARED_LIBS=OFF
  * llama : chat
  * llama : model
  * llama : hparams
  * llama : adapter
  * examples : fix
  * llama : kv cache
  * llama : impl
  * llama : batch
  * llama : context
  * llama : model loader
  * common : update lora
  * llama : quant

45095a61bf  Xuan Son Nguyen  2024-12-31 15:22:01 +01:00
server : clean up built-in template detection (#11026)

  * server : clean up built-in template detection
  * fix compilation
  * add chat template test
  * fix condition

6e1531aca5  Peter  2024-12-31 01:46:06 +01:00
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013)

  In common/common.cpp:
  * Convert the stat() call used to check whether a file exists to std::filesystem::exists (error: unable to match the correct function signature)
  * Add conditions to check whether PATH_MAX is already defined in the WIN32 environment (warning: it is already defined in MSYS2)

  In examples/run/run.cpp:
  * Include the io.h header (error: cannot find function _get_osfhandle)
  * Change initialisers for OVERLAPPED to an empty struct (warning about uninitialised members)
  * Add an initialiser for hFile (warning: it may be uninitialised)
  * Cast the curl_off_t percentage value to long int in generate_progress_prefix (warning: curl_off_t is long long int)

  In ggml/src/ggml-opencl/ggml-opencl.cpp:
  * Initialise certain declared cl_mem variables to nullptr for greater safety (warning: B_d may be used unassigned)

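The portability fix in common/common.cpp boils down to replacing a POSIX stat() existence test with the standard library, which MSYS2 GCC resolves cleanly. A minimal sketch:

  #include <filesystem>
  #include <string>

  // Instead of: struct stat st; stat(path.c_str(), &st) == 0
  static bool file_exists(const std::string & path) {
      return std::filesystem::exists(path);
  }
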
0a11f8b7b5  Molly Sophia  2024-12-20 11:44:58 +02:00
convert : fix RWKV v6 model conversion (#10913)

  * Enable --no-context-shift for llama-perplexity example
  * RWKV 6: Fix error in ggml_cuda_op_bin_bcast

  Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

36319dec5d  Georgi Gerganov  2024-12-19 17:35:15 +02:00
tts : small QoL for easy model fetch (#10903)

0bf2d10c55  Georgi Gerganov  2024-12-18 19:27:21 +02:00
tts : add OuteTTS support (#10784)

  * server : add "tokens" output
  * server : output embeddings for all tokens when pooling = none
  * server : be explicit about the pooling type in the tests
  * server : do not normalize embeddings when there is no pooling
  * llama : add OuteTTS support (wip)
  * vocoder graph: extract features, first conv, group norm, resnet conv, resnet, attn, pos net, layer norm, convnext, head
  * hann window, fft, spectrum processing
  * fix n_embd + remove llama.cpp hacks
  * tts : receive input text and generate codes
  * clip : fix new conv name
  * tts : add header + minor fixes
  * tts : add mathematical constant
  * tts : fix sampling + cut initial noise
  * tts : update default samplers
  * tts : text pre-processing
  * tts : outetts-voc -> wavtokenizer-dec
  * tts : remove hardcoded constants
  * tts : fix tensor shapes
  * llama : refactor wavtokenizer tensors
  * llama : update WavTokenizer to non-causal attn
  * llama : handle no-vocab detokenization
  * tts : add Python example for OuteTTS (wip)
  * tts : extend python example to generate spectrogram
  * server : fix rebase artifacts
  * tts : enable "return_tokens" in Python example
  * common : support HF download for vocoder

152610eda9  Georgi Gerganov  2024-12-18 13:01:41 +02:00
server : output embeddings for all tokens when pooling = none (#10861)

  * server : add "tokens" output
  * server : output embeddings for all tokens when pooling = none
  * server : update readme [no ci]
  * server : be explicit about the pooling type in the tests
  * server : update /embeddings and /v1/embeddings endpoints
  * server : do not normalize embeddings when there is no pooling
  * tests : update server tests
  * server : remove rebase artifact

  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

644fd71b44  Georgi Gerganov  2024-12-16 12:31:14 +02:00
sampling : refactor + optimize penalties sampler (#10803)

  * sampling : refactor + optimize penalties sampler
  * common : apply ignore_eos as logit bias
  * batched : remove penalties sampler
  * params : allow penalty_last_n == -1 to be equal to context size
  * common : by default, move the penalties to the end of the sampling chain
  * common : ignore all EOG tokens
  * common : move the penalties back to the front of the sampling chain
  * readme : restore hint about --ignore-eos flag [no ci]

  Co-authored-by: Diego Devesa <slarengh@gmail.com>

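A sketch of building a sampling chain with the refactored penalties sampler. The simplified parameter list (last_n, repeat, freq, present) is an assumption based on this commit's summary:

  #include "llama.h"

  llama_sampler * make_chain() {
      llama_sampler_chain_params cparams = llama_sampler_chain_default_params();
      llama_sampler * chain = llama_sampler_chain_init(cparams);

      // penalty_last_n = 64; -1 would mean "use the whole context"
      llama_sampler_chain_add(chain, llama_sampler_init_penalties(
          /*penalty_last_n=*/64,
          /*penalty_repeat=*/1.1f,
          /*penalty_freq=*/0.0f,
          /*penalty_present=*/0.0f));
      llama_sampler_chain_add(chain, llama_sampler_init_greedy());
      return chain;
  }
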
c27ac678dd  Eric Curtin  2024-12-13 19:34:25 +01:00
Opt class for positional argument handling (#10508)

  Added support for positional arguments `model` and `prompt`. Added
  functionality to download via strings like:

    llama-run llama3
    llama-run ollama://granite-code
    llama-run ollama://granite-code:8b
    llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
    llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
    llama-run https://example.com/some-file1.gguf
    llama-run some-file2.gguf
    llama-run file://some-file3.gguf

  Signed-off-by: Eric Curtin <ecurtin@redhat.com>

adffa6ffd5  Xuan Son Nguyen  2024-12-12 22:53:05 +01:00
common : improve -ctv -ctk CLI arguments (#10806)

  * common : improve ctv ctk cli argument
  * regenerate docs
  * even better approach
  * use std::vector

9fdb124304  Xuan Son Nguyen  2024-12-12 16:57:32 +01:00
common : add missing env var for speculative (#10801)

ae4b922614  Bartowski  2024-12-10 18:23:50 +01:00
imatrix : Add imatrix to --no-context-shift (#10766)

  This allows setting --no-context-shift in llama-imatrix, which is
  required for models like DeepSeek.

a86ad841f1  Yüg  2024-12-10 18:22:34 +01:00
server : add flag to disable the web-ui (#10762) (#10751)

  Co-authored-by: eugenio.segala <esegala@deloitte.co.uk>

c2a16c0bdb  Georgi Gerganov  2024-12-07 11:52:44 +02:00
server : fix free of spec context and batch (#10651)

f162d45a21  Xuan Son Nguyen  2024-12-06 13:29:05 +01:00
common : bring back --no-warmup to server (#10686)

6c5bc0625f  Xuan Son Nguyen  2024-12-06 11:14:32 +01:00
server : (refactoring) do not rely on JSON internally (#10643)

  * server : (refactoring) reduce usage of json internally
  * move all response types to struct
  * add virtual functions
  * refactor handle_completions_generic
  * remove server.hpp
  * clarify server_sent_event RFC specs
  * apply review comments
  * fix model_alias and completion_probabilities
  * naming oai_compat --> oaicompat
  * fix unwanted recursive call
  * update docs

642330ac7c  Xuan Son Nguyen  2024-12-02 22:10:19 +01:00
llama : add enum for built-in chat templates (#10623)

  * llama : add enum for supported chat templates
  * use "built-in" instead of "supported"
  * arg: print list of built-in templates
  * fix test
  * update server README

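A sketch of listing the built-in templates through the API this commit exposes, assuming the llama.h declaration of `llama_chat_builtin_templates` (caller-provided array, returns the count):

  #include <cstdio>
  #include "llama.h"

  int main() {
      const char * names[64];
      const int32_t n = llama_chat_builtin_templates(names, 64);
      for (int32_t i = 0; i < n; ++i) {
          printf("%s\n", names[i]); // e.g. "chatml", "llama3", ...
      }
  }
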
64ed2091b2  haopeng  2024-12-02 14:45:54 +01:00
server: Add "tokens per second" information in the backend (#10548)

  * add cmake rvv support
  * add timings
  * update readme
  * add test

  Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

7cc2d2c889  Diego Devesa  2024-11-29 21:54:58 +01:00
ggml : move AMX to the CPU backend (#10570)

  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

890719311b  Johannes Gäßler  2024-11-28 18:15:25 +01:00
common: fix warning message when no GPU found (#10564)

9f912511bc  Xuan Son Nguyen  2024-11-27 22:30:52 +01:00
common : fix duplicated file name with hf_repo and hf_file (#10550)

ab96610b1e  Georgi Gerganov  2024-11-26 14:18:08 +02:00
cmake : enable warnings in llama (#10474)

  * cmake : enable warnings in llama
  * cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS
  * cmake : get_flags -> ggml_get_flags
  * speculative-simple : fix warnings
  * cmake : reuse ggml_get_flags

9fd8c2687f  Georgi Gerganov  2024-11-25 22:28:59 +02:00
server : add more information about error (#10455)

10bce0450f  Diego Devesa  2024-11-25 19:30:06 +01:00
llama : accept a list of devices to use to offload a model (#10497)

  * llama : accept a list of devices to use to offload a model
  * accept `--dev none` to completely disable offloading
  * fix dev list with dl backends
  * rename env parameter to LLAMA_ARG_DEVICE for consistency

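A sketch of the new device-list model parameter: a NULL-terminated array of backend devices restricts where the model may be offloaded. Field and function names follow the current headers, which postdate this commit:

  #include "llama.h"
  #include "ggml-backend.h"

  llama_model * load_on_first_device(const char * path) {
      // NULL-terminated device list; here: only the first registered device
      ggml_backend_dev_t devs[2] = { ggml_backend_dev_get(0), nullptr };

      llama_model_params mparams = llama_model_default_params();
      mparams.devices = devs;

      return llama_model_load_from_file(path, mparams);
  }
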
5931c1f233  Diego Devesa  2024-11-25 15:13:39 +01:00
ggml : add support for dynamic loading of backends (#10469)

  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

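A sketch of dynamic backend loading via ggml-backend.h; the shared-object name below is a hypothetical example:

  #include "ggml-backend.h"

  int main() {
      // Load one backend from a shared library at runtime...
      ggml_backend_reg_t reg = ggml_backend_load("./libggml-cuda.so");

      // ...or scan for and load all available backend libraries.
      ggml_backend_load_all();

      return reg != nullptr ? 0 : 1;
  }
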
d9d54e498d  Georgi Gerganov  2024-11-25 09:58:41 +02:00
speculative : refactor and add a simpler example (#10362)

  * speculative : refactor and add a simpler example
  * speculative : clean-up and add comments and TODOs [no ci]
  * speculative : manage context in common_speculative
  * speculative : simplify
  * speculative : add --draft-min CLI arg
  * speculative : do not redraft previous drafts
  * speculative : fix the draft sampling
  * common : refactor args
  * common : change defaults [no ci]

8e752a777b  Georgi Gerganov  2024-11-19 13:29:26 +02:00
llama : add check for KV cache shifts (#10401)

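A sketch of the capability check this enables: callers verify that the context's KV cache supports shifting before attempting a context shift. The function name follows the public API added around this time and is an assumption here:

  #include "llama.h"

  bool try_context_shift(llama_context * ctx) {
      if (!llama_kv_cache_can_shift(ctx)) {
          return false; // some models cannot shift their KV cache
      }
      // ... perform llama_kv_cache_seq_add(...) based shifting here ...
      return true;
  }
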
4e54be0ec6  Johannes Gäßler  2024-11-16 23:00:41 +01:00
llama/ex: remove --logdir argument (#10339)

ae8de6d50a  Diego Devesa  2024-11-14 18:04:35 +01:00
ggml : build backends as libraries (#10256)

  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
  Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>

b141e5f6ef  Georgi Gerganov  2024-11-11 08:38:43 +02:00
server : enable KV cache defrag by default (#10233)

5c333e0140  Georgi Gerganov  2024-11-06 19:53:51 +02:00
metal : add BF16 support (#8439)

  * ggml : add initial BF16 support
  * metal : add mul_mat_id BF16 support
  * metal : check for bfloat support on the Metal device
  * metal : better var names [no ci]
  * metal : do not build bfloat kernels when not supported
  * metal : try to fix BF16 support check
  * metal : this should correctly check bfloat support

9f40989351  Diego Devesa  2024-11-03 19:34:08 +01:00
ggml : move CPU backend to a separate file (#10144)

1926d6e39d  Georgi Gerganov  2024-11-02 15:18:56 +02:00
llama : adjust default context size + print warnings (#10136)

  * llama : adjust default context size + print warnings
  * ggml-ci : add missing gpu-layers + adjust context sizes

8d8ff71536  Georgi Gerganov  2024-10-29 10:42:05 +02:00
llama : remove Tail-Free sampling (#10071)