Diego Devesa
7cc2d2c889
ggml : move AMX to the CPU backend (#10570)

* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2024-11-29 21:54:58 +01:00

Georgi Gerganov
ab96610b1e
cmake : enable warnings in llama (#10474)

* cmake : enable warnings in llama
ggml-ci
* cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS
* cmake : get_flags -> ggml_get_flags
* speculative-simple : fix warnings
* cmake : reuse ggml_get_flags
ggml-ci
* speculative-simple : fix compile warning
ggml-ci

2024-11-26 14:18:08 +02:00

Georgi Gerganov
811872a59d
speculative : simplify the implementation (#10504)

ggml-ci

2024-11-26 12:29:38 +02:00

Diego Devesa
10bce0450f
llama : accept a list of devices to use to offload a model (#10497)

* llama : accept a list of devices to use to offload a model
* accept `--dev none` to completely disable offloading
* fix dev list with dl backends
* rename env parameter to LLAMA_ARG_DEVICE for consistency

2024-11-25 19:30:06 +01:00

Georgi Gerganov
d9d54e498d
speculative : refactor and add a simpler example (#10362)

* speculative : refactor and add a simpler example
ggml-ci
* speculative : clean-up and add comments and TODOs [no ci]
* speculative : manage context in common_speculative
ggml-ci
* speculative : simplify
ggml-ci
* speculative : simplify (cont)
ggml-ci
* speculative : add --draft-min CLI arg
* speculative : minor fixup
* make : build fixes
* speculative : do not redraft previous drafts
ggml-ci
* speculative : fix the draft sampling
ggml-ci
* speculative : fix compile warning
* common : refactor args
ggml-ci
* common : change defaults [no ci]
* common : final touches
ggml-ci

2024-11-25 09:58:41 +02:00