llama.cpp

Author	SHA1	Message	Date
Molly Sophia	18decea3ed	convert_hf_to_gguf: rwkv: Avoid using ``eval`` Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:22:05 +08:00
Molly Sophia	8bc1f9ae80	build_rwkv: Avoid using inplace operations Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:22:05 +08:00
Molly Sophia	6ae2f4866f	Remove trailing whitespaces Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:22:05 +08:00
Molly Sophia	01dcf4bb77	Fix parallel inferencing for RWKV Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:22:04 +08:00
Molly Sophia	98ce5f43f0	Fix offloading layers to CUDA Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:21:21 +08:00
Molly Sophia	903089b5eb	Add ``wkv.head_size`` key for RWKV so it doesn't reuse Mamba ssm parameters Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:21:21 +08:00
Molly Sophia	8d498c7075	Add ``rescale_every_n_layers`` parameter Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:21:21 +08:00
Molly Sophia	0784a0cf26	RWKV v6 graph building Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:21:20 +08:00
Molly Sophia	5732de89b7	ggml: Add unary operator Exp Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:20:24 +08:00
Molly Sophia	0e5ac349f8	Fix rwkv tokenizer Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:20:24 +08:00
Molly Sophia	a180b63b49	Load more tensors for rwkv v6 Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:20:24 +08:00
Molly Sophia	700dad1b86	Fix build Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:20:24 +08:00
Layl Bongers	b3b17e05fe	Add placeholder llm_build_time_mix	2024-08-28 10:20:24 +08:00
Layl Bongers	3cbeffc50f	Add time mix output loading	2024-08-28 10:20:24 +08:00
Layl Bongers	b409fd8e11	Add remaining time mix parameters	2024-08-28 10:20:24 +08:00
Layl Bongers	dd3aa3d40e	Add time mix KVRG & correct merge mistake	2024-08-28 10:20:24 +08:00
Layl Bongers	5479588569	Add rwkv5 layer norms	2024-08-28 10:20:24 +08:00
Layl Bongers	4e23d9715b	Add logits conversion to rwkv5	2024-08-28 10:20:24 +08:00
Layl Bongers	a866789603	Add workaround for kv cache	2024-08-28 10:20:24 +08:00
Layl Bongers	a0aae8d671	Add (broken) placeholder graph builder for RWKV	2024-08-28 10:20:24 +08:00
Layl Bongers	e92c74f4a1	Fix model loading	2024-08-28 10:20:24 +08:00
Layl Bongers	7cac72a80b	Do not use special tokens when matching in RWKV tokenizer	2024-08-28 10:20:24 +08:00
Molly Sophia	865167d01a	Fix build Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:20:24 +08:00
Layl Bongers	dc0767f4b3	Add RWKV tokenization	2024-08-28 10:20:24 +08:00
Molly Sophia	8d2eca3507	convert_hf_to_gguf: Add support for RWKV v6 Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-28 10:20:24 +08:00
Georgi Gerganov	20f1789dfb	vulkan : fix build (#0) ggml-ci	2024-08-27 22:41:27 +03:00
Georgi Gerganov	231cff5f6f	sync : ggml	2024-08-27 22:41:27 +03:00
Xie Yanbo	3246fe84d7	Fix minicpm example directory (#9111)	2024-08-27 14:33:08 +02:00
compilade	78eb487bb0	llama : fix qs.n_attention_wv for DeepSeek-V2 (#9156)	2024-08-27 13:09:23 +03:00
Xuan Son Nguyen	a77feb5d71	server : add some missing env variables (#9116) * server : add some missing env variables * add LLAMA_ARG_HOST to server dockerfile * also add LLAMA_ARG_CONT_BATCHING	2024-08-27 11:07:01 +02:00
CausalLM	2e59d61c1b	llama : fix ChatGLM4 wrong shape (#9194) This should fix THUDM/glm-4-9b-chat-1m and CausalLM/miniG	2024-08-27 09:58:22 +03:00
Carsten Kragelund Jørgensen	75e1dbbaab	llama : fix llama3.1 rope_freqs not respecting custom head_dim (#9141) * fix: llama3.1 rope_freqs not respecting custom head_dim * fix: use potential head_dim for Exaone	2024-08-27 09:53:40 +03:00
arch-btw	ad76569f8e	common : Update stb_image.h to latest version (#9161) * Update stb_image.h to latest version Fixes https://github.com/ggerganov/llama.cpp/issues/7431 * Update .ecrc	2024-08-27 08:58:50 +03:00
slaren	7d787ed96c	ggml : do not crash when quantizing q4_x_x with an imatrix (#9192)	2024-08-26 19:44:43 +02:00
Georgi Gerganov	06658ad7c3	metal : separate scale and mask from QKT in FA kernel (#9189) * metal : separate scale and mask from QKT in FA kernel * metal : ne01 check no longer necessary * metal : keep data in local memory	2024-08-26 18:31:02 +03:00
Georgi Gerganov	fc18425b6a	ggml : add SSM Metal kernels (#8546) * ggml : add ggml_ssm_conv metal impl * ggml : add ssm_scan metal impl ggml-ci	2024-08-26 17:55:36 +03:00
Georgi Gerganov	879275ac98	tests : fix compile warnings for unreachable code (#9185) ggml-ci	2024-08-26 16:30:25 +03:00
Georgi Gerganov	7a3df798fc	ci : add VULKAN support to ggml-ci (#9055)	2024-08-26 12:19:39 +03:00
Georgi Gerganov	e5edb210cd	server : update deps (#9183)	2024-08-26 12:16:57 +03:00
slaren	0c41e03ceb	metal : gemma2 flash attention support (#9159)	2024-08-26 11:08:59 +02:00
slaren	f12ceaca0c	ggml-ci : try to improve build time (#9160)	2024-08-26 11:03:30 +02:00
Justine Tunney	436787f170	llama : fix time complexity of string replacement (#9163) This change fixes a bug where replacing text in a very long string could cause llama.cpp to hang indefinitely. This is because the algorithm used was quadratic, due to memmove() when s.replace() is called in a loop. It seems most search results and LLM responses actually provide the O(n**2) algorithm, which is a great tragedy. Using a builder string fixes things	2024-08-26 09:09:53 +03:00
Herman Semenov	93bc3839f9	common: fixed not working find argument --n-gpu-layers-draft (#9175)	2024-08-26 00:54:37 +02:00
Johannes Gäßler	f91fc5639b	CUDA: fix Gemma 2 numerical issues for FA (#9166)	2024-08-25 22:11:48 +02:00
Johannes Gäßler	e11bd856d5	CPU/CUDA: Gemma 2 FlashAttention support (#8542) * CPU/CUDA: Gemma 2 FlashAttention support * apply logit_softcap to scale in kernel * disable logit softcapping tests on Metal * remove metal check	2024-08-24 21:34:59 +02:00
João Dinis Ferreira	8f824ffe8e	quantize : fix typo in usage help of `quantize.cpp` (#9145)	2024-08-24 09:22:45 +03:00
Xuan Son Nguyen	3ba780e2a8	lora : fix llama conversion script with ROPE_FREQS (#9117)	2024-08-23 12:58:53 +02:00
piDack	a07c32ea54	llama : use F32 precision in GLM4 attention and no FA (#9130)	2024-08-23 10:27:17 +03:00
Akarshan Biswas	11b84eb457	[SYCL] Add a space to suppress a cmake warning (#9133)	2024-08-22 22:09:47 +08:00
luoyu-intel	1731d4238f	[SYCL] Add oneDNN primitive support (#9091) * add onednn * add sycl_f16 * add dnnl stream * add engine map * use dnnl for intel only * use fp16fp16fp16 * update doc	2024-08-22 12:50:10 +08:00

3664 commits