9f40989351  ggml : move CPU backend to a separate file (#10144)
    Diego Devesa, 2024-11-03 19:34:08 +01:00

08828a6d7d  metal : minor fixup in FA kernel (#10143)
    Georgi Gerganov, 2024-11-03 15:18:40 +02:00
    * metal : minor fixup in FA kernel
    * metal : use the unrolled loop variable
    * metal : remove unused var

1839f69130  flake.lock: Update (#10146)
    Georgi Gerganov, 2024-11-03 05:14:15 -08:00

9830b6923b  Add apple arm to presets (#10134)
    Christian Köhnenkamp, 2024-11-02 15:35:31 -07:00
    * Add apple arm to presets
    * Add final new line

42cadc74bd  server : fix slot selection by lru (#10126)
    sasha0552, 2024-11-02 18:34:56 +02:00
    * server : fix slot selection by lru, migrate lcs to `size_t`
    * minor debug log fix

45950415ed  server : fix endpoint checks (#10135)
    Georgi Gerganov, 2024-11-02 18:34:00 +02:00

1926d6e39d  llama : adjust default context size + print warnings (#10136)
    Georgi Gerganov, 2024-11-02 15:18:56 +02:00
    * llama : adjust default context size + print warnings
    * ggml-ci : add missing gpu-layers + adjust context sizes

b634f8a26f  simple-chat : only add bos on first prompt (#10129)
    Diego Devesa, 2024-11-02 13:08:53 +01:00

7554aa4655  convert-lora : make --base optional (#10110)
    Xuan Son Nguyen, 2024-11-02 12:53:17 +01:00
    * convert-lora : make `--base` optional
    * lint
    * handle case where base_model_name_or_path is invalid
    * do not include metadata from base model
    * clarify unspecified --base
    * add small comment [no ci]

a6744e43e8  llama : add simple-chat example (#10124)
    Diego Devesa, 2024-11-01 23:50:59 +01:00
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

e991e3127f  llama : use smart pointers for ggml resources (#10117)
    Diego Devesa, 2024-11-01 23:48:26 +01:00

418f5eef26  vulkan : improve ggml_vk_create_buffer error handling (#9898)
    Shupei Fan, 2024-11-01 19:33:14 +01:00

ba6f62eb79  readme : update hot topics
    Georgi Gerganov, 2024-11-01 17:31:51 +02:00

d865d1478c  server : fix smart selection of available slot (#10120)
    sasha0552, 2024-11-01 14:33:14 +01:00
    * Fix smart selection of available slot
    * minor fix
    * replace vectors of tokens with shorthands

1804adb0cf  ggml : remove ggml_scratch (#10121)
    Georgi Gerganov, 2024-11-01 12:58:45 +02:00

815fe72adc  sync : ggml
    Georgi Gerganov, 2024-11-01 10:28:24 +02:00

f221d56220  ggml : alloc ggml_contexts on the heap (whisper/2525)
    Georgi Gerganov, 2024-11-01 10:24:50 +02:00

e597e50794  build: fix build error in Windows env with OneAPI setup (#10107)
    Zhenwei Jin, 2024-11-01 11:09:59 +08:00

85679d37f3  llama : improve output buffer type selection (#10098)
    Diego Devesa, 2024-11-01 00:49:53 +01:00

1e9f94994e  quantize : fix --keep-split (#10114)
    Diego Devesa, 2024-11-01 00:45:34 +01:00

c02e5ab2a6  llama : fix buffer checks for mamba and rwk (#10111)
    Diego Devesa, 2024-10-31 22:54:23 +01:00
    * llama : fix buffer checks for mamba and rwk
    * llama : fix missing worst case flag during reserve
    * cuda : fix supports_op for norm
    * disable sched SET_CAUSE

ab3d71f97f  loader : refactor tensor weights storage (#9935)
    Zhenwei Jin, 2024-10-31 19:50:39 +01:00
    * loader: refactor tensor weights storage
    * use sorted map, sort weights by layer
    Co-authored-by: slaren <slarengh@gmail.com>

0a683e8088  server : include scheme when printing URL (#10106)
    Kevin Gibbons, 2024-10-31 14:02:35 +01:00

dea5e86051  ggml : check tensor name lengths in gguf files (#10100)
    Diego Devesa, 2024-10-31 11:40:59 +01:00

1329c0a75e  kompute : add mul_mat_q4_k shader (#10097)
    Sergio López, 2024-10-31 11:09:52 +02:00
    A more or less direct translation from the Metal implementation to GLSL.
    Signed-off-by: Sergio Lopez <slp@redhat.com>

61408e7fad  kompute : add backend registry / device interfaces (#10045)
    Sergio López, 2024-10-30 17:01:52 +01:00
    Get in line with the other backends by supporting the newer
    backend/device registry interfaces.
    Signed-off-by: Sergio Lopez <slp@redhat.com>

b9e02e8184  ggml : fix memory leaks when loading invalid gguf files (#10094)
    Diego Devesa, 2024-10-30 14:51:21 +01:00
    * ggml : fix gguf string leak when reading kv pairs fails
    * ggml : avoid crashing with GGML_ABORT when the KV has an invalid type
    * ggml : avoid crashing on failed memory allocations when loading a gguf file

6763f713bb  readme : more lora detail in main example readme (#10064)
    Rich Dougherty, 2024-10-30 13:22:39 +01:00

79a2bc042d  convert : more detailed convert lora usage docs (#10065)
    Rich Dougherty, 2024-10-30 13:22:21 +01:00

fc83a9e584  ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)
    xctan, 2024-10-30 09:00:40 +02:00
    * ggml : RISC-V vector gemv for q4_0_8x8
    * ggml : Added WIP rvv q4_0_8x8 gemm
    * ggml : Added initial implementation of rvv gemm
    * ggml : optimize gemm to avoid register spillover
    * ggml : Fix GCC rvv load alignment issue
    * ggml : Format gemm rvv code
    * ggml : Fix a typo in RVV q4_0_8_8 GEMM

c5b0f4b5d9  llama : refactor model loader with backend registry (#10026)
    Diego Devesa, 2024-10-30 02:01:23 +01:00

8f275a7c45  ggml : add POOL2D op for GPU acceleration in the Vulkan backend for the MobileVLM model (#9763)
    Changyeon Kim, 2024-10-29 09:52:56 +01:00
    * The MobileVLM model now supports inference acceleration through GPU by
      utilizing the Vulkan backend; a GGML_OP_POOL_2D shader has been added.
      The encoding performance of the CLIP model improved from 2.8s on the CPU
      to 0.7s on the GPU.
    * Correct the incorrect order of the parameters; fix casting to int.
    Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

8d8ff71536  llama : remove Tail-Free sampling (#10071)
    Georgi Gerganov, 2024-10-29 10:42:05 +02:00

61715d5cc8  llama : add IBM granite template (#10013)
    arch-btw, 2024-10-28 18:45:33 +01:00
    * Add granite template to llama.cpp
    * Add granite template to test-chat-template.cpp
    * Added proper template and expected output
    * Fix spacing; apply suggestions from code review
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

07028f9d74  flake.lock: Update (#10063)
    Georgi Gerganov, 2024-10-28 08:41:24 -07:00
    Flake lock file updates:
    * Updated input 'nixpkgs':
      'github:NixOS/nixpkgs/4c2fcb090b1f3e5b47eaa7bd33913b574a11e0a0' (2024-10-18)
      -> 'github:NixOS/nixpkgs/2768c7d042a37de65bb1b5b3268fc987e534c49d' (2024-10-23)
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

524afeec9d  musa : workaround for Guilty Lockup in cleaning src0 (#10042)
    R0CKSTAR, 2024-10-28 10:02:48 +01:00
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

8125e6cbfc  server : don't overfill the batch during infill (#10018)
    Georgi Gerganov, 2024-10-28 08:49:32 +02:00

8841ce3f43  llama : switch KQ multiplication to F32 precision by default (#10015)
    Georgi Gerganov, 2024-10-27 20:59:58 +02:00

cc2983d375  sync : ggml
    Georgi Gerganov, 2024-10-26 10:34:08 +03:00

8c60a8a462  increase cuda_cpy block size (ggml/996)
    bssrdf, 2024-10-26 10:33:56 +03:00
    Co-authored-by: bssrdf <bssrdf@gmail.com>

9e4a2563ea  scripts : fix amx sync [no ci]
    Georgi Gerganov, 2024-10-26 10:33:31 +03:00

668750357e  metal : support permuted matrix multiplications (#10033)
    Georgi Gerganov, 2024-10-25 22:26:15 +03:00
    * metal : support permuted matrix multiplications
    * cont : use nb01 directly for row steps
    * cont : add comments [no ci]
    * metal : minor refactor

ff252ea48e  llama : add DRY sampler (#9702)
    wwoodsTM, 2024-10-25 19:07:34 +03:00
    * sampling : add DRY sampler (post-refactor)
    * DRY: Trying to fix coauthors, removed unneeded line
    * DRY: Fixed redundant code
    * DRY: Fixed crash issue due to DRY being in chain but uninitialized
    Co-authored-by: l3utterfly <gc.pthzfoldr@gmail.com>
    Co-authored-by: pi6am <34464159+pi6am@users.noreply.github.com>

d80fb71f8b  llama : string_split fix (#10022)
    Michael Podvitskiy, 2024-10-25 17:57:54 +02:00
    * llama: Refactor string_split to use template specialization, fixes
      parsing strings with spaces
    * llama: Add static_assert in the string_split template to ensure the
      correct template specialization is used for std::string

2f8bd2b901  llamafile : extend sgemm.cpp support for Q5_0 models (#10010)
    Srihari-mcw, 2024-10-25 10:27:41 +03:00

bc5ba007b2  server : check that the prompt fits in the slot's context (#10030)
    Georgi Gerganov, 2024-10-25 10:13:46 +03:00

958367bf53  server : refactor slot input data, move tokenizer to HTTP thread (#10023)
    Xuan Son Nguyen, 2024-10-24 21:51:22 +02:00
    * server : refactor slot input data, move tokenizer to HTTP thread
    * move prompt_tokens.empty() check
    * fix incorrect if branch
    * fix infinite generation loop
    * bring back infill validation
    * add infill test
    * try fixing format_infill
    * fix test
    * remove redundant code
    * rename completion to inference
    * update docs
    * use llama_tokens everywhere

40f2555797  ci : fix cmake flags for SYCL
    Georgi Gerganov, 2024-10-24 21:23:33 +03:00

167a515651  CUDA : fix insufficient buffer clearing for MMQ (#10032)
    Johannes Gäßler, 2024-10-24 14:40:23 +02:00

c39665f589  CUDA : fix MMQ for non-contiguous src0, add tests (#10021)
    Johannes Gäßler, 2024-10-24 11:09:36 +02:00
    * CUDA: fix MMQ for non-contiguous src0, add tests
    * revise test code