Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b11f9ba9b8 
								
							 
						 
						
							
							
								
								server : remove hack for extra parallel slot ( #10187 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-11-06 13:29:01 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								94d8cb8be1 
								
							 
						 
						
							
							
								
								metal : fix from ptr buffer name ( #10189 )  
							
							
							
						 
						
							2024-11-06 12:10:07 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1dc04b2dee 
								
							 
						 
						
							
							
								
								ggml : adjust is_first_call init value ( #10193 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-11-06 11:20:10 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a1eaf6a960 
								
							 
						 
						
							
							
								
								metal : add quantized FA support ( #10149 )  
							
							
							
							
							* metal : add quantized FA (vec) support
ggml-ci
* metal : add quantized FA (non-vec) support
* metal : fix support check
ggml-ci
* metal : clean-up
* metal : clean-up (cont)
* metal : fix shared memory calc + reduce smem + comments
* metal : float-correctness
* metal : minor [no ci] 
							
						 
						
							2024-11-06 10:24:23 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Gabe Goodhart 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b8deef0ec0 
								
							 
						 
						
							
							
								
								llama : add <|tool_call|> formatting to Granite template ( #10177 )  
							
							
							
							
							Branch: GraniteToolCallTemplate
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> 
							
						 
						
							2024-11-05 14:23:04 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a9e8a9a030 
								
							 
						 
						
							
							
								
								ggml : fix arch check in bf16_to_fp32 ( #10164 )  
							
							
							
						 
						
							2024-11-04 23:17:01 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eve 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3407364776 
								
							 
						 
						
							
							
								
								Q6_K AVX improvements ( #10118 )  
							
							
							
							
							* q6_k instruction reordering attempt
* better subtract method
* should be theoretically faster
small improvement with shuffle lut, likely because all loads are already done at that stage
* optimize bit fiddling
* handle -32 offset separately. bsums exists for a reason!
* use shift
* Update ggml-quants.c
* have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86 
							
						 
						
							2024-11-04 23:06:31 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d5a409e57f 
								
							 
						 
						
							
							
								
								ggml : fix gelu tables initialization ( #10172 )  
							
							
							
						 
						
							2024-11-04 20:06:58 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								401558b7ba 
								
							 
						 
						
							
							
								
								ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment ( #10167 )  
							
							
							
						 
						
							2024-11-04 17:34:08 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9e0ecfb697 
								
							 
						 
						
							
							
								
								server : clarify /slots endpoint, add is_processing ( #10162 )  
							
							
							
							
							* server : clarify /slots endpoint, add is_processing
* fix tests 
							
						 
						
							2024-11-04 16:33:29 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									snadampal 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6a066b9978 
								
							 
						 
						
							
							
								
								fix build break on arm64 linux ( #10166 )  
							
							
							
							
							This fixes the build break from the recent changes
to move the CPU backend to separate files
https://github.com/ggerganov/llama.cpp/pull/10144  
							
						 
						
							2024-11-04 16:08:33 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ea02c753eb 
								
							 
						 
						
							
							
								
								cuda : clear error after changing peer access ( #10153 )  
							
							
							
						 
						
							2024-11-04 13:10:23 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								05697f670b 
								
							 
						 
						
							
							
								
								metal : simplify f16 and f32 dequant kernels ( #0 )  
							
							
							
						 
						
							2024-11-04 13:49:34 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f8e58135cf 
								
							 
						 
						
							
							
								
								metal : move dequantize templates to beginning of MSL source ( #0 )  
							
							
							
						 
						
							2024-11-04 13:44:06 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									leo-pony 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								329ed914c9 
								
							 
						 
						
							
							
								
								CANN: adjust backend registry refactor. ( #10158 )  
							
							
							
							
							remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR. 
							
						 
						
							2024-11-04 19:08:22 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ce027adfb3 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-11-04 10:33:37 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Yuri Khrustalev 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								284e5b0275 
								
							 
						 
						
							
							
								
								cmake : make it possible linking ggml as external lib (ggml/1003)  
							
							
							
						 
						
							2024-11-04 10:33:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Plamen Minev 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e2292aaa17 
								
							 
						 
						
							
							
								
								metal : fix minor string leaks (ggml/1004)  
							
							
							
						 
						
							2024-11-04 10:33:10 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9f40989351 
								
							 
						 
						
							
							
								
								ggml : move CPU backend to a separate file ( #10144 )  
							
							
							
						 
						
							2024-11-03 19:34:08 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								08828a6d7d 
								
							 
						 
						
							
							
								
								metal : minor fixup in FA kernel ( #10143 )  
							
							
							
							
							* metal : minor fixup in FA kernel
ggml-ci
* metal : use the unrolled loop variable
* metal : remove unused var 
							
						 
						
							2024-11-03 15:18:40 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1839f69130 
								
							 
						 
						
							
							
								
								flake.lock: Update ( #10146 )  
							
							
							
						 
						
							2024-11-03 05:14:15 -08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Christian Köhnenkamp 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9830b6923b 
								
							 
						 
						
							
							
								
								Add apple arm to presets ( #10134 )  
							
							
							
							
							* Add apple arm to presets
* Add final new line 
							
						 
						
							2024-11-02 15:35:31 -07:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									sasha0552 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								42cadc74bd 
								
							 
						 
						
							
							
								
								server : fix slot selection by lru ( #10126 )  
							
							
							
							
							* server : fix slot selection by lru, migrate lcs to `size_t`
* minor debug log fix 
							
						 
						
							2024-11-02 18:34:56 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								45950415ed 
								
							 
						 
						
							
							
								
								server : fix endpoint checks ( #10135 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-11-02 18:34:00 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1926d6e39d 
								
							 
						 
						
							
							
								
								llama : adjust default context size + print warnings ( #10136 )  
							
							
							
							
							* llama : adjust default context size + print warnings
ggml-ci
* ggml-ci : add missing gpu-layers + adjust context sizes 
							
						 
						
							2024-11-02 15:18:56 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b634f8a26f 
								
							 
						 
						
							
							
								
								simple-chat : only add bos on first prompt ( #10129 )  
							
							
							
						 
						
							2024-11-02 13:08:53 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7554aa4655 
								
							 
						 
						
							
							
								
								convert-lora : make --base optional ( #10110 )  
							
							
							
							
							* convert-lora : make `--base` optional
* lint
* handle case where base_model_name_or_path is invalid
* do not include metadata from base model
* clarify unspecified --base
* add small comment [no ci]
* trigger ci 
							
						 
						
							2024-11-02 12:53:17 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a6744e43e8 
								
							 
						 
						
							
							
								
								llama : add simple-chat example ( #10124 )  
							
							
							
							
							* llama : add simple-chat example
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> 
							
						 
						
							2024-11-01 23:50:59 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e991e3127f 
								
							 
						 
						
							
							
								
								llama : use smart pointers for ggml resources ( #10117 )  
							
							
							
						 
						
							2024-11-01 23:48:26 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Shupei Fan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								418f5eef26 
								
							 
						 
						
							
							
								
								vulkan : improve ggml_vk_create_buffer error handling ( #9898 )  
							
							
							
						 
						
							2024-11-01 19:33:14 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ba6f62eb79 
								
							 
						 
						
							
							
								
								readme : update hot topics  
							
							
							
						 
						
							2024-11-01 17:31:51 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									sasha0552 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d865d1478c 
								
							 
						 
						
							
							
								
								server : fix smart selection of available slot ( #10120 )  
							
							
							
							
							* Fix smart selection of available slot
* minor fix
* replace vectors of tokens with shorthands 
							
						 
						
							2024-11-01 14:33:14 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1804adb0cf 
								
							 
						 
						
							
							
								
								ggml : remove ggml_scratch ( #10121 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-11-01 12:58:45 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								815fe72adc 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-11-01 10:28:24 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f221d56220 
								
							 
						 
						
							
							
								
								ggml : alloc ggml_contexts on the heap (whisper/2525)  
							
							
							
						 
						
							2024-11-01 10:24:50 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Zhenwei Jin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e597e50794 
								
							 
						 
						
							
							
								
								build: fix build error in Windows env with OneAPI setup ( #10107 )  
							
							
							
						 
						
							2024-11-01 11:09:59 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								85679d37f3 
								
							 
						 
						
							
							
								
								llama : improve output buffer type selection ( #10098 )  
							
							
							
						 
						
							2024-11-01 00:49:53 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1e9f94994e 
								
							 
						 
						
							
							
								
								quantize : fix --keep-split ( #10114 )  
							
							
							
						 
						
							2024-11-01 00:45:34 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c02e5ab2a6 
								
							 
						 
						
							
							
								
								llama : fix buffer checks for mamba and rwk ( #10111 )  
							
							
							
							
							* llama : fix buffer checks for mamba and rwk
* llama : fix missing worst case flag during reserve
* cuda : fix supports_op for norm
* disable sched SET_CAUSE 
							
						 
						
							2024-10-31 22:54:23 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Zhenwei Jin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ab3d71f97f 
								
							 
						 
						
							
							
								
								loader: refactor tensor weights storage ( #9935 )  
							
							
							
							
							* loader: refactor tensor weights storage
* use sorted map, sort weights by layer
---------
Co-authored-by: slaren <slarengh@gmail.com> 
							
						 
						
							2024-10-31 19:50:39 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Kevin Gibbons 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0a683e8088 
								
							 
						 
						
							
							
								
								server : include scheme when printing URL ( #10106 )  
							
							
							
						 
						
							2024-10-31 14:02:35 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								dea5e86051 
								
							 
						 
						
							
							
								
								ggml : check tensor name lengths in gguf files ( #10100 )  
							
							
							
						 
						
							2024-10-31 11:40:59 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Sergio López 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1329c0a75e 
								
							 
						 
						
							
							
								
								kompute: add mul_mat_q4_k shader ( #10097 )  
							
							
							
							
							This is a more or less direct translation from the Metal implementation
to GLSL.
Signed-off-by: Sergio Lopez <slp@redhat.com> 
							
						 
						
							2024-10-31 11:09:52 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Sergio López 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								61408e7fad 
								
							 
						 
						
							
							
								
								kompute: add backend registry / device interfaces ( #10045 )  
							
							
							
							
							Get in line with the other backends by supporting the newer
backend/device registry interfaces.
Signed-off-by: Sergio Lopez <slp@redhat.com> 
							
						 
						
							2024-10-30 17:01:52 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b9e02e8184 
								
							 
						 
						
							
							
								
								ggml : fix memory leaks when loading invalid gguf files ( #10094 )  
							
							
							
							
							* ggml : fix gguf string leak when reading kv pairs fails
* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type
* ggml : avoid crashing on failed memory allocations when loading a gguf file 
							
						 
						
							2024-10-30 14:51:21 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Rich Dougherty 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6763f713bb 
								
							 
						 
						
							
							
								
								readme : more lora detail in main example readme ( #10064 )  
							
							
							
						 
						
							2024-10-30 13:22:39 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Rich Dougherty 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								79a2bc042d 
								
							 
						 
						
							
							
								
								convert : more detailed convert lora usage docs ( #10065 )  
							
							
							
						 
						
							2024-10-30 13:22:21 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									xctan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fc83a9e584 
								
							 
						 
						
							
							
								
								ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels ( #10029 )  
							
							
							
							
							* ggml : RISC-V vector gemv for q4_0_8x8
* ggml : Added WIP rvv q4_0_8x8 gemm
* ggml : Added initial implementation of rvv gemm
* ggml : optimize gemm to avoid register spillover
* ggml : Fix GCC rvv load alignment issue
* ggml : Format gemm rvv code
* ggml : Fix a typo in RVV q4_0_8_8 GEMM 
							
						 
						
							2024-10-30 09:00:40 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c5b0f4b5d9 
								
							 
						 
						
							
							
								
								llama : refactor model loader with backend registry ( #10026 )  
							
							
							
						 
						
							2024-10-30 02:01:23 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Changyeon Kim 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8f275a7c45 
								
							 
						 
						
							
							
								
								ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. ( #9763 )  
							
							
							
							
							* ggml: Add POOL2D OP for GPU ACC to the Vulkan.
- The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
- A GGML_OP_POOL_2D shader has been added. (Pooling)
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* [fix] Correct the incorrect order of the parameters.
fix casting to int.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
---------
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> 
							
						 
						
							2024-10-29 09:52:56 +01:00