Georgi Gerganov
a9cae48003
tests : add non-cont unary tests (#7857)
* tests : add non-cont unary tests
* ggml : update unary asserts and "supports_op"
ggml-ci
2024-06-12 16:00:22 +03:00

Georgi Gerganov
bfaa676b08
ggml : improve ggml_is_contiguous logic (#7856)
* ggml : improve ggml_is_contiguous logic
ggml-ci
* ggml : support more contiguous cases
ggml-ci
2024-06-12 15:24:20 +03:00
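
For readers unfamiliar with the check: a ggml tensor carries per-dimension element counts `ne` and byte strides `nb`, innermost dimension first. The sketch below is a simplified Python illustration of what "contiguous" means in those terms, not the ggml implementation. It assumes a non-quantized element type of `type_size` bytes and treats size-1 dimensions as irrelevant, since their stride is never used to address data. The helper name `is_contiguous` is hypothetical.

```python
def is_contiguous(ne: list[int], nb: list[int], type_size: int) -> bool:
    """Return True if a tensor with element counts `ne` and byte strides `nb`
    (innermost dimension first) is densely packed in row-major order.

    Simplified illustration: assumes a non-quantized type of `type_size`
    bytes per element and ignores size-1 dimensions, whose stride is never
    used to address any element.
    """
    expected = type_size  # stride the next non-trivial dimension must have
    for n, stride in zip(ne, nb):
        if n == 1:
            continue  # a single index: the stride value does not matter
        if stride != expected:
            return False
        expected *= n  # a dense layout makes the next stride n times larger
    return True


# A 4x3 float32 matrix (4 bytes per element), stored densely:
print(is_contiguous([4, 3, 1, 1], [4, 16, 48, 48], 4))   # True
# The same data viewed as its transpose (strides swapped):
print(is_contiguous([3, 4, 1, 1], [16, 4, 48, 48], 4))   # False
```

Skipping size-1 dimensions is what lets views such as `[4, 1, 3, 1]` with an arbitrary stride in the second slot still count as contiguous in this sketch.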

Georgi Gerganov
704a35b183
server : restore numeric prompts (#7883)
2024-06-12 14:42:29 +03:00

Meng, Hengyu
dcf752707d
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894)
In addition, this reverts the workaround we needed for the upstream issue with expired Intel GPG package keys in 2024.0.1-devel-ubuntu22.04.
2024-06-12 19:05:35 +10:00

Patrice Ferlet
f2b5764beb
Fix a typo and add Fedora 40 packages to install for Vulkan (#7794) [no ci]
Fix "appropiate" to "appropriate" and add the Fedora 40 packages needed to compile with Vulkan support
2024-06-12 11:18:16 +10:00

k.h.lai
73bac2b11d
vulkan: select only one device for single gpu with multiple drivers (#7582)
2024-06-11 21:26:05 +02:00

0cc4m
ef52d1d16a
Update Vulkan RoPE implementation (#7818)
* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
Minor fixes
* Fix segfault when running out of VRAM
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-11 21:20:29 +02:00

Deven Mistry
14f83526cd
fix broken link in pr template (#7880) [no ci]
* fix broken link in pr template
* Update pull_request_template.md [no ci]
---------
Co-authored-by: Brian <mofosyne@gmail.com>
2024-06-12 02:18:58 +10:00

Brian
6fe42d073f
github: move PR template to .github/ root (#7868)
2024-06-11 17:43:41 +03:00

Johannes Gäßler
148995e5e5
llama-bench: more compact markdown tables (#7879)
2024-06-11 14:45:40 +02:00

Georgi Gerganov
4bfe50f741
tests : check the Python version (#7872)
ggml-ci
2024-06-11 10:10:20 +03:00

Johannes Gäßler
bdcb8f4222
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)
2024-06-11 08:26:07 +02:00

slaren
c2ce6c47e4
fix CUDA CI by using a windows-2019 image (#7861)
* try to fix CUDA ci with --allow-unsupported-compiler
* trigger when build.yml changes
* another test
* try exllama/bdashore3 method
* install vs build tools before cuda toolkit
* try win-2019
2024-06-11 08:59:20 +03:00

Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866)
2024-06-11 02:22:57 +01:00

Olivier Chafik
396b18dfec
json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00

Jared Van Bortel
864a99e7a0
cmake : fix CMake requirement for CUDA (#7821)
2024-06-10 18:32:10 -04:00

slaren
fd5ea0f897
ci : try win-2019 on server windows test (#7854)
2024-06-10 15:18:41 +03:00

Georgi Gerganov
c28a83902c
examples : remove --instruct remnants (#7846)
2024-06-10 15:00:15 +03:00

Georgi Gerganov
d9da0e4986
server : improve "prompt" handling (#7847)
2024-06-10 14:59:55 +03:00

Johannes Gäßler
1f0dabda8d
CUDA: use tensor cores for MMQ (#7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early
2024-06-10 11:45:13 +02:00

Ben Ashbaugh
af4ae502dd
use the correct SYCL context for host USM allocations (#7777)
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00

Georgi Gerganov
10ceba354a
flake.lock: Update (#7838)
Flake lock file updates:
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
  → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00

Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries (#7833)
2024-06-09 20:19:35 +03:00

Nicolás Pérez
57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700)
This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate the PR complexity level, when to add [no ci], and how to format PR titles and descriptions.
Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00

mgroeber9110
3e2ee44315
server: do not remove whitespace at the start of a completion chunk (#7830)
2024-06-09 20:50:35 +10:00

Johannes Gäßler
42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q (#7824)
2024-06-09 09:42:25 +02:00

sasha0552
2decf57bc6
convert-hf : set the model name based on cli arg, if present (#7693)
The `--model-name` argument was added a while ago but did not do anything.
This commit fixes that and enables the feature.
2024-06-09 16:39:25 +10:00

compilade
5795b94182
convert-hf : match model part name prefix and suffix (#7687)
In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply used the same logic as the part count to get the part names.
But this doesn't always work correctly, for example when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.
Matching both the prefix and the suffix of the model part names, as this commit does, should fix this problem without breaking any previously supported upstream models. According to a report by @teleprint-me, some persistent problem remains, but this will do in the meantime.
2024-06-09 12:47:25 +10:00
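
The filtering described above can be sketched as follows, assuming the part names share the prefix `model` and the suffix `.safetensors` (the pair from the example); the helper name is hypothetical, and the real converter also handles other file formats. Matching on both ends keeps `model.safetensors` and `model-0000N-of-0000M.safetensors` while ignoring an extra `consolidated.safetensors`.

```python
def get_model_part_names(filenames: list[str], prefix: str, suffix: str) -> list[str]:
    """Sorted file names that start with `prefix` and end with `suffix`."""
    return sorted(f for f in filenames if f.startswith(prefix) and f.endswith(suffix))


files = [
    "config.json",
    "consolidated.safetensors",           # extra file, must be skipped
    "model-00002-of-00002.safetensors",
    "model-00001-of-00002.safetensors",
    "model.safetensors.index.json",       # index, not a weight file
]
print(get_model_part_names(files, "model", ".safetensors"))
# ['model-00001-of-00002.safetensors', 'model-00002-of-00002.safetensors']
```

Filtering on the suffix alone would also pick up `consolidated.safetensors`, which is the failure mode the commit message describes.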

compilade
ed9f252118
gguf-py : decouple adding metadata from writing in GGUFWriter (#7827)
The main change of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value.
In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False.
Also, GGUFWriter no longer requires the output file name until it actually writes to the file.
GGUFWriter also does not need to eagerly prepare the data layout of the metadata.
2024-06-09 12:34:29 +10:00

slaren
fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00

Olivier Chafik
d4d915d351
url: save -mu downloads to new cache location (#7826)
* url: save -mu download to new cache location
* url: fs_get_cache_file_path util
* url: tweak sig of fs_get_cache_file
2024-06-08 21:21:08 +02:00

sasha0552
7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring
* add usage
* remove trailing whitespaces
* Use Longest Common Prefix (LCP) instead of LCS
* Rename argument
2024-06-08 10:50:31 +03:00
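
A minimal sketch of the selection idea, assuming each idle slot exposes the token IDs it has cached and a busy slot is represented by `None`. It picks the idle slot whose cache shares the longest common prefix with the incoming prompt, so the most cached tokens can be reused. The function names are hypothetical, and the actual server may apply extra rules (for example a similarity threshold, or a fallback when nothing matches) that are not modeled here.

```python
from typing import Optional, Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens shared by `a` and `b`."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def select_slot(slot_caches: Sequence[Optional[Sequence[int]]], prompt: Sequence[int]) -> int:
    """Index of the idle slot whose cached tokens share the longest prefix
    with `prompt`, or -1 if every slot is busy (represented by None)."""
    best, best_len = -1, -1
    for i, cached in enumerate(slot_caches):
        if cached is None:  # slot is busy
            continue
        lcp = common_prefix_len(cached, prompt)
        if lcp > best_len:  # strict '>' keeps the lowest index on ties
            best, best_len = i, lcp
    return best


slots = [[1, 2, 3], [1, 2, 9, 9], None, [5]]
print(select_slot(slots, [1, 2, 3, 4]))  # 0: three tokens can be reused
```

A prefix (rather than a substring) is the natural measure here because a cached sequence can only be reused from its start; that matches the switch from LCS to LCP noted in the commit.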

slaren
da799b4189
vulkan : reuse parent extra for views (#7806)
* vulkan : reuse parent extra for views
* Fix validation error when multiple compute contexts are used in a graph
---------
Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00

Christian Zhou-Zheng
c00fad71e5
gguf-split : change binary multi-byte units to decimal (#7803)
2024-06-07 15:56:01 +03:00
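
To make the unit change concrete: with binary units a "2G" limit means 2 * 1024^3 = 2,147,483,648 bytes, while with decimal units it means 2 * 1000^3 = 2,000,000,000 bytes. The hypothetical `parse_size` helper below assumes a size string made of an integer with an optional `K`, `M`, or `G` suffix; it illustrates the two conventions and is not gguf-split's own argument parser.

```python
DECIMAL = {"K": 1000, "M": 1000**2, "G": 1000**3}
BINARY = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str, units: dict[str, int] = DECIMAL) -> int:
    """Convert a size such as '500M' or '2G' to bytes.

    Hypothetical helper: accepts an integer with an optional K/M/G suffix
    (case-insensitive) and uses decimal multiples unless told otherwise.
    """
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)  # no suffix: plain byte count


print(parse_size("2G"))          # 2000000000
print(parse_size("2G", BINARY))  # 2147483648
```

The gap grows with the unit: about 2.4% for K, 4.9% for M, and 7.4% for G, which is why split sizes can look noticeably different after such a change.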

intelmatt
27615f5ab2
cmake : fix BUILD_SHARED_LIBS=ON build (#7784)
common depends on pthreads on Linux
2024-06-07 15:15:07 +03:00

Johannes Gäßler
7027b27d76
server: update cache_prompt documentation [no ci] (#7745)
2024-06-07 11:15:49 +02:00

woodx
a5cabd7649
server : do not get prompt in infill mode (#7286)
* avoid getting the prompt in infill mode and embedding mode
* remove embedding mode
* refactor format
---------
Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00

pengxin99
d5c938cd77
[SYCL] fix softmax r2r result wrong issue (#7811)
2024-06-07 14:28:26 +08:00

slaren
c9ee7118d5
check for nans in imatrix and quantize (#7807)
* imatrix : detect nan/inf values
* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
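
A minimal sketch of such a check, assuming the importance matrix is available as a mapping from tensor name to a list of floats (the tensor names below are made up for the example). `math.isfinite` rejects NaN as well as positive and negative infinity, which covers both cases named in the commit.

```python
import math


def find_non_finite_tensors(imatrix: dict[str, list[float]]) -> list[str]:
    """Names of tensors whose importance values contain NaN or +/-inf."""
    return [
        name
        for name, values in imatrix.items()
        if any(not math.isfinite(v) for v in values)
    ]


data = {
    "blk.0.attn_q.weight": [1.0, 2.0],
    "blk.0.ffn_up.weight": [0.5, float("nan")],
    "blk.1.ffn_down.weight": [float("inf")],
}
bad = find_non_finite_tensors(data)
if bad:
    print("non-finite values in:", ", ".join(bad))
```

Checking on both sides (when the matrix is produced and again before quantizing) means a corrupted file is caught even if it was created by an older tool.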

Georgi Gerganov
ee459f40f6
server : fix --threads-http arg (#7801)
2024-06-06 19:19:59 +03:00

Georgi Gerganov
f83351f9a6
imatrix : migrate to gpt_params (#7771)
* imatrix : migrate to gpt_params
ggml-ci
* imatrix : add --save-frequency cli arg
* common : fix --no-ppl
2024-06-06 16:30:58 +03:00

Clint Herron
ad675e1c67
Added support for . (any character) token in grammar engine. (#6467)
* Added support for . (any character) token in grammar engine.
* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00

Mattheus Chediak
a143c04375
README minor fixes (#7798) [no ci]
derievatives --> derivatives
2024-06-06 22:17:54 +10:00

Olivier Chafik
55b2d0849d
grammars: x{min,max} repetition operator (#6640)
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates
* grammars: handle `x{n}` and fix `x{n,n}`
* grammars: document new repetition operators
* grammars: uniform use of int for min & max
* grammars: refactor parser test
* grammar: parsing tests w/ natural pretty print of updated expectations
* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)
* grammars: improve test pretty print again
* grammars: pretty print rules and chars
* grammars: fix copy rule skipping
* grammars: disallow `a{,}` (not allowed in regexps)
* Update common/grammar-parser.cpp
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: fix copy rule skipping (again) & display of expectations
* grammars: more test cases
* grammars: update reps parsing to bring ? / * / + closer to before
* json: use new GBNF repetitions{m,n} syntax
* grammars: update performance gotchas w/ repetition advice
* Update examples/json_schema_to_grammar.py
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: comment on rule repetitions
* grammars: ensure unambiguous number alternatives
* grammar: nit typo switched error msgs
* grammar: nit numbering in comment
* json: update numeric rule to be unambiguous
* Apply suggestions from code review
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* json: fix integral-part
* grammar: add repetition tests
---------
Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
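
One way to see what `x{min,max}` means is to rewrite it using only the older operators. The sketch below assumes `x` is a single symbol or an already parenthesized group and produces nested optionals such as `a a (a a?)?` for `a{2,4}`; nesting the optionals, rather than writing `a? a?`, keeps the expansion unambiguous. The function is a hypothetical illustration, not the rewriting performed by the grammar parser.

```python
from typing import Optional


def expand_repetition(x: str, min_times: int, max_times: Optional[int]) -> str:
    """Rewrite x{min,max} using only sequence, '?' and '*'.

    max_times=None stands for an open upper bound, i.e. x{min,}.
    Assumes `x` is a single symbol or an already parenthesized group.
    """
    if max_times is not None and max_times < min_times:
        raise ValueError("max must be >= min")
    parts = [x] * min_times  # the mandatory copies
    if max_times is None:
        parts.append(f"{x}*")  # any number of further copies
    else:
        optional = ""
        # Build (x (x x?)?)? from the inside out, one level per extra copy.
        for _ in range(max_times - min_times):
            optional = f"({x} {optional})?" if optional else f"{x}?"
        if optional:
            parts.append(optional)
    return " ".join(parts)


print(expand_repetition("a", 2, 4))     # a a (a a?)?
print(expand_repetition("a", 3, 3))     # a a a
print(expand_repetition("a", 1, None))  # a a*
```

With nested optionals each extra copy can only be taken if the previous one was, so every input length has exactly one parse; `a? a?` would let a single `a` match either position.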

Joan Fontanals
f5d7b268ec
llama : add jina v2 base code (#7596)
* feat: add changes to handle jina v2 base code
* fix: do not complicate things
* fix: fix the usage of the code model
* fix: fix comments
* fix: fix linting issues
* fix: remove ollama patches
* style : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-06 10:22:41 +03:00

slaren
2d08b7fbb4
docker : build only main and server in their images (#7782)
* add openmp lib to dockerfiles
* build only main and server in their docker images
2024-06-06 08:19:49 +03:00

slaren
d67caea0d6
docker : add openmp lib (#7780)
2024-06-06 08:17:21 +03:00

Galunid
7672adeec7
Fix encoding in python scripts (#7733)
2024-06-06 03:07:24 +10:00

Johannes Gäßler
7d1a378b8f
CUDA: refactor mmq, dmmv, mmvq (#7716)
* CUDA: refactor mmq, dmmv, mmvq
* fix out-of-bounds write
* struct for qk, qr, qi
* fix cmake build
* mmq_type_traits
2024-06-05 16:53:00 +02:00

Georgi Gerganov
2b3389677a
ggml : refactor rope norm/neox (#7634)
* ggml : unify rope norm/neox (CPU)
* ggml : fix compile warning
* ggml : remove GLM rope mode
ggml-ci
* metal : better rope implementation
ggml-ci
* cuda : better rope implementation
ggml-ci
* naming : n_orig_ctx -> n_ctx_orig
ggml-ci
* dev : add reminders to update backends
ggml-ci
* vulkan : fix ggml_rope_ext() usage
* cuda : fix array size + indents
ggml-ci
2024-06-05 11:29:20 +03:00
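
The two RoPE modes differ only in which elements are rotated together. The "norm" mode rotates adjacent pairs `(x[2i], x[2i+1])`, while the NeoX mode rotates `(x[i], x[i + n/2])`; both use the angle `pos * base^(-2i/n)` for pair `i`. The sketch below is a plain-Python illustration for a single head vector of even length `n`. It assumes the conventional `base = 10000` and leaves out the extra ggml parameters (frequency scaling, YaRN extrapolation, rotating only part of the head); the function name is hypothetical.

```python
import math


def rope(x: list[float], pos: int, neox: bool, base: float = 10000.0) -> list[float]:
    """Rotate one head vector `x` (even length n) for token position `pos`.

    neox=False: rotate adjacent pairs (x[2i], x[2i+1]).
    neox=True:  rotate pairs split across halves (x[i], x[i + n/2]).
    Pair i uses the angle pos * base**(-2i/n) in both modes.
    """
    n = len(x)
    if n % 2 != 0:
        raise ValueError("head size must be even")
    half = n // 2
    out = list(x)
    for i in range(half):
        theta = pos * base ** (-2.0 * i / n)
        c, s = math.cos(theta), math.sin(theta)
        a, b = (i, i + half) if neox else (2 * i, 2 * i + 1)
        x0, x1 = x[a], x[b]
        out[a] = x0 * c - x1 * s  # standard 2-D rotation by theta
        out[b] = x0 * s + x1 * c
    return out


v = [1.0, 0.0, 0.0, 0.0]
print(rope(v, 1, neox=False))  # first pair rotated in place: indices 0 and 1
print(rope(v, 1, neox=True))   # first pair spans halves: indices 0 and 2
```

Since the two modes share the same angles and only permute the pairing, a unified implementation can compute the angles once and pick the index pair per mode, which is the kind of consolidation this refactor describes.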