Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								468ea24fb4 
								
							 
						 
						
							
							
								
								CUDA: faster non k-quant mul_mat_q kernels ( #2483 )  
							
							
							
						 
						
							2023-08-02 18:04:04 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4f6b60c776 
								
							 
						 
						
							
							
								
								CUDA: Fix models with output size != 32000 ( #2480 )  
							
							
							
						 
						
							2023-08-02 16:48:10 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									ldwang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								220d931864 
								
							 
						 
						
							
							
								
								readme : add Aquila-7B model series to supported models ( #2487 )  
							
							
							
							
							* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
* Add Aquila-7B models in README.md
Signed-off-by: ldwang <ftgreat@gmail.com>
* Up Aquila-7B models in README.md
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com> 
							
						 
						
							2023-08-02 11:21:11 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eve 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								81844fbcfd 
								
							 
						 
						
							
							
								
								tests : Fix compilation warnings (Linux/GCC) ( #2451 )  
							
							
							
							
							* fix hellaswag print format, cast away warning in test-double-float
* c++11 cannot use designated initializers
* add static to test-grad0.c internal functions
* use memcpy in test-double-float.c
* port c tests to c++
* use initializer list for ggml_init_params 
							
						 
						
							2023-08-02 11:06:19 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Yiming Cui 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a312193e18 
								
							 
						 
						
							
							
								
								readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models ( #2475 )  
							
							
							
							
							* add support for chinese llama-2 / alpaca-2
* remove white spaces 
							
						 
						
							2023-08-02 09:18:31 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Bono Lv 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c574bddb36 
								
							 
						 
						
							
							
								
								fix a typo in examples/server/README.md ( #2478 )  
							
							
							
						 
						
							2023-08-01 14:54:28 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									ebraminio 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								86aeb27734 
								
							 
						 
						
							
							
								
								server : Support dark mode ( #2414 )  
							
							
							
							
							* server : Support dark mode
So it respects user system light / dark settings.
* Update index.html.hpp by running ./deps.sh 
							
						 
						
							2023-08-01 10:56:23 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Matteo Boschini 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1873ff586b 
								
							 
						 
						
							
							
								
								metal : add gqa8 kernel to allow llama-2-70B on metal ( #2459 )  
							
							
							
							
							* Added gqa8 kernel to allow llama-2-70B on metal
* Update ggml-metal.m
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast
* Added ne03==ne13 assertion
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com> 
							
						 
						
							2023-08-01 10:43:12 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								49e7cb5bb1 
								
							 
						 
						
							
							
								
								CUDA: fixed LLAMA_FAST compilation option ( #2473 )  
							
							
							
						 
						
							2023-07-31 21:02:19 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b772bba42e 
								
							 
						 
						
							
							
								
								CUDA: fixed cmake F16 option ( #2471 )  
							
							
							
						 
						
							2023-07-31 19:52:22 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0728c5a8b9 
								
							 
						 
						
							
							
								
								CUDA: mmq CLI option, fixed mmq build issues ( #2453 )  
							
							
							
						 
						
							2023-07-31 15:44:35 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1215ed7d5c 
								
							 
						 
						
							
							
								
								CUDA: Implemented row flattening for non-glm RoPE ( #2468 )  
							
							
							
						 
						
							2023-07-31 14:32:30 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2dbf518911 
								
							 
						 
						
							
							
								
								CUDA: fewer memory bank conflicts for mul_mat_q ( #2458 )  
							
							
							
						 
						
							2023-07-31 13:18:51 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9d2382b3e4 
								
							 
						 
						
							
							
								
								Fix Metal backend broken from the allocator changes ( #2455 )  
							
							
							
							
							* fix Metal backend broken from the allocator changes 
							
						 
						
							2023-07-31 11:02:53 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a113689571 
								
							 
						 
						
							
							
								
								ggml : add graph tensor allocator ( #2411 )  
							
							
							
							
							* ggml : add graph tensor allocator
* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset
* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset 
							
						 
						
							2023-07-30 15:58:01 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								11f3ca06b8 
								
							 
						 
						
							
							
								
								CUDA: Quantized matrix matrix multiplication ( #2160 )  
							
							
							
							
							* mmq implementation for non k-quants
* q6_K
* q2_K
* q3_k
* q4_K
* vdr
* q5_K
* faster q8_1 loading
* loop unrolling
* add __restrict__
* q2_K sc_high
* GGML_CUDA_MMQ_Y
* Updated Makefile
* Update Makefile
* DMMV_F16 -> F16
* Updated README, CMakeLists
* Fix CMakeLists.txt
* Fix CMakeLists.txt
* Fix multi GPU out-of-bounds 
							
						 
						
							2023-07-29 23:04:44 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9baf9ef304 
								
							 
						 
						
							
							
								
								CUDA: faster multi GPU synchronization ( #2448 )  
							
							
							
						 
						
							2023-07-29 23:04:10 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									klosax 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8a88e5855c 
								
							 
						 
						
							
							
								
								perplexity : add Hellaswag calculation ( #2389 )  
							
							
							
							
							* common.h : add hellaswag / remove perplexity-lines
* common.cpp : add hellaswag / remove perplexity-lines
* perplexity.cpp : add hellaswag scores / remove perplexity-lines
* perplexity.cpp : clean up
* common.h : change default param value
* common.cpp : Change default param
* perplexity.cpp : alter wording
* common.h : alter wording
* common.cpp : alter wording 
							
						 
						
							2023-07-28 21:25:36 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Lee 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a9559bf77b 
								
							 
						 
						
							
							
								
								ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c ( #2405 )  
							
							
							
						 
						
							2023-07-28 21:17:45 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									eric8607242 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ee1b497c98 
								
							 
						 
						
							
							
								
								llama : support more diverse tokenizers? ( #2420 )  
							
							
							
							
							* supporting more diverse tokenizers
* Update llama.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2023-07-28 21:10:05 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d73b8d48b4 
								
							 
						 
						
							
							
								
								examples : fix whitespace  
							
							
							
						 
						
							2023-07-28 21:05:08 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									nhamanasu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								34ae1caf7f 
								
							 
						 
						
							
							
								
								examples : server chat mode with llama2 ( #2400 )  
							
							
							
							
							* add: server chat mode with llama2
* fix: remove the unnecessary last \n 
							
						 
						
							2023-07-28 21:02:10 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Weird Constructor 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d91f3f0c55 
								
							 
						 
						
							
							
								
								readme : fix the description of the Tail free sampling (TFS) method ( #2431 )  
							
							
							
						 
						
							2023-07-28 11:44:43 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Rand Xie 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								65cdf34bdc 
								
							 
						 
						
							
							
								
								llama : use n_embd_gqa instead of n_embd to handle llama-2 70B ( #2433 )  
							
							
							
						 
						
							2023-07-28 11:42:53 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									niansa/tuxifan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								edcc7ae7d2 
								
							 
						 
						
							
							
								
								Obtaining LLaMA 2 instructions ( #2308 )  
							
							
							
							
							* Obtaining LLaMA 2 instructions
* Removed sharing warning for LLaMA 2
* Linked TheBloke's GGML repos
* Add LLaMA 2 to list of supported models
* Added LLaMA 2 usage instructions
* Added links to LLaMA 2 70B models 
							
						 
						
							2023-07-28 03:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									mj-shifu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7c529cede6 
								
							 
						 
						
							
							
								
								convert.py : Update to support 70B HF format model files ( #2427 )  
							
							
							
							
							* convert.py : fix llama 2 70b conversion from Huggingface 
							
						 
						
							2023-07-27 14:39:17 -06:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1a941869cb 
								
							 
						 
						
							
							
								
								metal : disable graph concurrency optimization due to bug ( #2413 )  
							
							
							
						 
						
							2023-07-27 11:00:54 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b5472ea0ad 
								
							 
						 
						
							
							
								
								ggml : fix assert in ggml_set_unary_op ( #2410 )  
							
							
							
						 
						
							2023-07-26 23:57:23 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Cebtenzzre 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6df1f5940f 
								
							 
						 
						
							
							
								
								make : build with -Wmissing-prototypes ( #2394 )  
							
							
							
						 
						
							2023-07-26 21:00:04 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5488fb789e 
								
							 
						 
						
							
							
								
								ggml : allocate graphs in a context ( #2392 )  
							
							
							
							
							* ggml : graph allocation in contexts
* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
* llama.cpp : allocate graph in the context
* add GGML_PAD
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2023-07-26 15:56:53 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Kawrakow 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								eb542d3932 
								
							 
						 
						
							
							
								
								Add LLAMA_DEFAULT_RMS_EPS so we can change the default ( #2384 )  
							
							
							
							
							Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com> 
							
						 
						
							2023-07-25 18:35:53 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								07aaa0f63f 
								
							 
						 
						
							
							
								
								ggml : fix ggml_flash_attn to use op_params ( #2387 )  
							
							
							
							
							* ggml : fix ggml_flash_attn to use op_params 
							
						 
						
							2023-07-25 16:20:12 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									ldwang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fce48caf9a 
								
							 
						 
						
							
							
								
								convert.py : support bpe tokenizer ( #2228 )  
							
							
							
							
							* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com> 
							
						 
						
							2023-07-25 16:22:09 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jiahao Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								875086bdb9 
								
							 
						 
						
							
							
								
								ggml : relax contiguous constraints in activation function ( #2371 )  
							
							
							
						 
						
							2023-07-25 15:58:32 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								da1889834a 
								
							 
						 
						
							
							
								
								ggml : improve graph build time via hash table lookup ( #2329 )  
							
							
							
							
							* improve graph build time
* ggml_tensor : use 1 bit per flag
* use a hash table instead 
							
						 
						
							2023-07-25 15:32:20 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Hesen Peng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								82552b7f54 
								
							 
						 
						
							
							
								
								build : fix line breaking error in build-info.sh ( #2349 )  
							
							
							
							
							* fix line breaking
* build number line break removal 
							
						 
						
							2023-07-25 15:24:09 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xiao-Yong Jin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0c06204fb3 
								
							 
						 
						
							
							
								
								main : add --in-prefix-bos to prefix BOS to user inputs; keep EOS ( #2304 )  
							
							
							
							
							* add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS
The BOS precedes the string specified by `--in-prefix`.
Model generated EOS is now kept in the context.
It provides a way to strictly follow the prompt format used in
Llama-2-chat.
The EOS handling also benefits some existing finetunes that use
EOS to mark the end of turn.
* examples/common: move input_prefix_bos to other bools 
							
						 
						
							2023-07-25 15:19:11 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eve 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1fed755b1f 
								
							 
						 
						
							
							
								
								ci : add non-AVX scalar build/test ( #2356 )  
							
							
							
							
							* noavx build and test
* we don't need to remove f16c in windows 
							
						 
						
							2023-07-25 15:16:13 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									katsu560 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								be2301bcda 
								
							 
						 
						
							
							
								
								k_quants : add AVX support to dot functions with QK_K as 64 ( #2339 )  
							
							
							
							
							* add AVX to ggml_vec_dot_q2_K_q8_K()
* add AVX to ggml_vec_dot_q3_K_q8_K()
* add AVX to ggml_vec_dot_q4_K_q8_K()
* add AVX to ggml_vec_dot_q5_K_q8_K()
* add AVX to ggml_vec_dot_q6_K_q8_K()
* refactor AVX code in ggml_vec_dot_q6_K_q8_K() 
							
						 
						
							2023-07-25 15:13:41 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Shouzheng Liu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1aa18ef994 
								
							 
						 
						
							
							
								
								metal : concurrently dispatch commands ( #2358 )  
							
							
							
							
							* metal: concurrently dispatch commands
Function `ggml_metal_graph_find_concurrency` will run and write
commands that can be issued concurrently to metal context `concur_list`
array, when `ggml_metal_graph_compute` is called for the first time.
* metal: don't call find_concurrency automatically.
* metal : code style changes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2023-07-25 15:00:19 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Kawrakow 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9a08eaf3c4 
								
							 
						 
						
							
							
								
								Another speed gain for Q4_0 and Q4_1 on Metal ( #2375 )  
							
							
							
							
							* Another speed gain for Q4_0 and Q4_1 on Metal
* Have N_DST, etc., be template parameters
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com> 
							
						 
						
							2023-07-25 13:48:29 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Kawrakow 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								129d844c87 
								
							 
						 
						
							
							
								
								Fix Q4_K and Q5_K for QK_K = 64 on CUDA ( #2359 )  
							
							
							
							
							* Fix Q4_K and Q5_K for QK_K = 64
* Very slightly better Q5_K bit fiddling
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com> 
							
						 
						
							2023-07-25 13:48:04 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d5512b782b 
								
							 
						 
						
							
							
								
								server: add rms_norm_eps parameter ( #2380 )  
							
							
							
						 
						
							2023-07-25 12:36:17 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Henri Vasserman 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c798308e3a 
								
							 
						 
						
							
							
								
								[Server] Escape HTML in webchat ( #2368 )  
							
							
							
							
							* escape HTML in webchat
* add amp 
							
						 
						
							2023-07-25 10:27:34 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								41c674161f 
								
							 
						 
						
							
							
								
								make rms_norm_eps a parameter ( #2374 )  
							
							
							
							
							* make rms_norm_eps a parameter
* add rms_norm_eps to command line
* fix baby llama, test-grad0
* use scientific notation for eps param in the help
ggml-ci 
							
						 
						
							2023-07-24 17:57:12 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Aarni Koskela 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b3f138d058 
								
							 
						 
						
							
							
								
								Chat UI extras ( #2366 )  
							
							
							
							
							* makefile: correct deps for server
* server: tighten settings layout a little
* server: expose all currently configured generation params in UI
* server: expose remaining generation params, for the adventurous
* server: embetter mirostat fields 
							
						 
						
							2023-07-24 17:54:22 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5b2b2dc6ae 
								
							 
						 
						
							
							
								
								ggml : sync (unary ops refactor, static-correctness) ( #2370 )  
							
							
							
							
							* ggml : sync (unary ops, tests)
ggml-ci
* tests : remove unnecessary funcs 
							
						 
						
							2023-07-24 14:46:21 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Kawrakow 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								42f70cb2f6 
								
							 
						 
						
							
							
								
								Fix scalar version of Q5_K when QK_K = 64 ( #2362 )  
							
							
							
							
							Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com> 
							
						 
						
							2023-07-24 12:55:02 +03:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Evan Jones 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								84e09a7d8b 
								
							 
						 
						
							
							
								
								llama : add grammar-based sampling ( #1773 )  
							
							
							
							
							* llama, main : constrain sampling to grammar
* allow loading grammar from file
* fix whitespace errors
* handle & print parser errors
* add comments to grammar syntax and allow newlines where unambiguous
* add missing include
* support alternates in root rule
* fix bugs with empty token and EOS
* adjust JSON grammar
* remove swp file
* rewrite ternary expressions
Co-authored-by: Henri Vasserman <henv@hot.ee>
* use struct for grammar elements and add Unicode support
* add unicode escapes
* add inverse char ranges
* only sample full tokens (no peeking or truncation)
* llama : minor style changes
blindly applied in online editor - hopefully I didn't break something
* update help text
* add warning message if EOS is disabled
---------
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2023-07-23 23:58:10 -04:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Kawrakow 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2f9cf974a0 
								
							 
						 
						
							
							
								
								Some more Q4_K and Q5_K speedup on CUDA ( #2346 )  
							
							
							
							
							* Faster Q5_K on CUDA
* Small Q5_K improvement on older GPUs
* Speed up Q4_K on CUDA
GTX1660: 29.5 ms/t -> 25.6 ms/t
RTX4080: 8.40 ms/t -> 8.25 ms/t
* Speed up Q4_K on CUDA
GTX1660: 36.7 ms/t -> 35.6 ms/t
RTX4080:  9.8 ms/t ->  9.5 ms/t
* Address PR comments
* Add some comments to satisfy PR reviewer
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com> 
							
						 
						
							2023-07-24 00:19:47 +03:00