Molly Sophia
276d53b18f
build_rwkv6: Simplify graph
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:22:05 +08:00
Molly Sophia
12fbe1ade2
Use MODEL_ARCH.RWKV6 instead of MODEL_ARCH.RWKV
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:22:05 +08:00
Molly Sophia
5afa3eff3a
Update convert_hf_to_gguf.py
...
Co-authored-by: compilade <git@compilade.net>
2024-08-28 10:22:05 +08:00
Molly Sophia
ae9936a80d
Update convert_hf_to_gguf.py
...
Co-authored-by: compilade <git@compilade.net>
2024-08-28 10:22:05 +08:00
Molly Sophia
8aa711ad98
ggml: Add backward computation for unary op `exp`
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:22:05 +08:00
Molly Sophia
c6955525b4
Update convert_hf_to_gguf.py
...
Co-authored-by: compilade <git@compilade.net>
2024-08-28 10:22:05 +08:00
Molly Sophia
7f2e370fa2
convert_hf_to_gguf: rwkv tokenizer: Don't escape sequences manually
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:22:05 +08:00
Molly Sophia
18decea3ed
convert_hf_to_gguf: rwkv: Avoid using `eval`
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:22:05 +08:00
Molly Sophia
8bc1f9ae80
build_rwkv: Avoid using inplace operations
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:22:05 +08:00
Molly Sophia
6ae2f4866f
Remove trailing whitespaces
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:22:05 +08:00
Molly Sophia
01dcf4bb77
Fix parallel inferencing for RWKV
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:22:04 +08:00
Molly Sophia
98ce5f43f0
Fix offloading layers to CUDA
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:21:21 +08:00
Molly Sophia
903089b5eb
Add `wkv.head_size` key for RWKV
...
so it doesn't reuse Mamba ssm parameters
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:21:21 +08:00
Molly Sophia
8d498c7075
Add `rescale_every_n_layers` parameter
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:21:21 +08:00
Molly Sophia
0784a0cf26
RWKV v6 graph building
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:21:20 +08:00
Molly Sophia
5732de89b7
ggml: Add unary operator Exp
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:20:24 +08:00
Molly Sophia
0e5ac349f8
Fix rwkv tokenizer
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:20:24 +08:00
Molly Sophia
a180b63b49
Load more tensors for rwkv v6
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:20:24 +08:00
Molly Sophia
700dad1b86
Fix build
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:20:24 +08:00
Layl Bongers
b3b17e05fe
Add placeholder llm_build_time_mix
2024-08-28 10:20:24 +08:00
Layl Bongers
3cbeffc50f
Add time mix output loading
2024-08-28 10:20:24 +08:00
Layl Bongers
b409fd8e11
Add remaining time mix parameters
2024-08-28 10:20:24 +08:00
Layl Bongers
dd3aa3d40e
Add time mix KVRG & correct merge mistake
2024-08-28 10:20:24 +08:00
Layl Bongers
5479588569
Add rwkv5 layer norms
2024-08-28 10:20:24 +08:00
Layl Bongers
4e23d9715b
Add logits conversion to rwkv5
2024-08-28 10:20:24 +08:00
Layl Bongers
a866789603
Add workaround for kv cache
2024-08-28 10:20:24 +08:00
Layl Bongers
a0aae8d671
Add (broken) placeholder graph builder for RWKV
2024-08-28 10:20:24 +08:00
Layl Bongers
e92c74f4a1
Fix model loading
2024-08-28 10:20:24 +08:00
Layl Bongers
7cac72a80b
Do not use special tokens when matching in RWKV tokenizer
2024-08-28 10:20:24 +08:00
Molly Sophia
865167d01a
Fix build
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:20:24 +08:00
Layl Bongers
dc0767f4b3
Add RWKV tokenization
2024-08-28 10:20:24 +08:00
Molly Sophia
8d2eca3507
convert_hf_to_gguf: Add support for RWKV v6
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-28 10:20:24 +08:00
Georgi Gerganov
20f1789dfb
vulkan : fix build ( #0 )
...
ggml-ci
2024-08-27 22:41:27 +03:00
Georgi Gerganov
231cff5f6f
sync : ggml
2024-08-27 22:41:27 +03:00
Xie Yanbo
3246fe84d7
Fix minicpm example directory ( #9111 )
2024-08-27 14:33:08 +02:00
compilade
78eb487bb0
llama : fix qs.n_attention_wv for DeepSeek-V2 ( #9156 )
2024-08-27 13:09:23 +03:00
Xuan Son Nguyen
a77feb5d71
server : add some missing env variables ( #9116 )
...
* server : add some missing env variables
* add LLAMA_ARG_HOST to server dockerfile
* also add LLAMA_ARG_CONT_BATCHING
2024-08-27 11:07:01 +02:00
CausalLM
2e59d61c1b
llama : fix ChatGLM4 wrong shape ( #9194 )
...
This should fix THUDM/glm-4-9b-chat-1m and CausalLM/miniG
2024-08-27 09:58:22 +03:00
Carsten Kragelund Jørgensen
75e1dbbaab
llama : fix llama3.1 rope_freqs not respecting custom head_dim ( #9141 )
...
* fix: llama3.1 rope_freqs not respecting custom head_dim
* fix: use potential head_dim for Exaone
2024-08-27 09:53:40 +03:00
arch-btw
ad76569f8e
common : Update stb_image.h to latest version ( #9161 )
...
* Update stb_image.h to latest version
Fixes https://github.com/ggerganov/llama.cpp/issues/7431
* Update .ecrc
2024-08-27 08:58:50 +03:00
slaren
7d787ed96c
ggml : do not crash when quantizing q4_x_x with an imatrix ( #9192 )
2024-08-26 19:44:43 +02:00
Georgi Gerganov
06658ad7c3
metal : separate scale and mask from QKT in FA kernel ( #9189 )
...
* metal : separate scale and mask from QKT in FA kernel
* metal : ne01 check no longer necessary
* metal : keep data in local memory
2024-08-26 18:31:02 +03:00
Georgi Gerganov
fc18425b6a
ggml : add SSM Metal kernels ( #8546 )
...
* ggml : add ggml_ssm_conv metal impl
* ggml : add ssm_scan metal impl
ggml-ci
2024-08-26 17:55:36 +03:00
Georgi Gerganov
879275ac98
tests : fix compile warnings for unreachable code ( #9185 )
...
ggml-ci
2024-08-26 16:30:25 +03:00
Georgi Gerganov
7a3df798fc
ci : add VULKAN support to ggml-ci ( #9055 )
2024-08-26 12:19:39 +03:00
Georgi Gerganov
e5edb210cd
server : update deps ( #9183 )
2024-08-26 12:16:57 +03:00
slaren
0c41e03ceb
metal : gemma2 flash attention support ( #9159 )
2024-08-26 11:08:59 +02:00
slaren
f12ceaca0c
ggml-ci : try to improve build time ( #9160 )
2024-08-26 11:03:30 +02:00
Justine Tunney
436787f170
llama : fix time complexity of string replacement ( #9163 )
...
This change fixes a bug where replacing text in a very long string could
cause llama.cpp to hang indefinitely. This is because the algorithm used
was quadratic, due to memmove() when s.replace() is called in a loop. It
seems most search results and LLM responses actually provide the O(n**2)
algorithm, which is a great tragedy. Using a builder string fixes things
2024-08-26 09:09:53 +03:00
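The builder-string idea described in commit 436787f170 can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name `replace_all_linear` is hypothetical, and the empty-pattern guard is an assumption added for safety. Calling `s.replace()` in a loop shifts the tail of the string on every match, while appending to a separate builder copies each character of the input only once.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (name and signature are assumptions, not llama.cpp's API).
// Replaces every occurrence of `search` in `s` with `replace` in a single pass.
static void replace_all_linear(std::string & s, const std::string & search, const std::string & replace) {
    if (search.empty()) {
        return; // an empty pattern would match at every position; leave s unchanged
    }
    std::string builder;
    builder.reserve(s.length()); // rough size hint; grows if replacements are longer
    size_t pos = 0;
    size_t last_pos = 0;
    while ((pos = s.find(search, last_pos)) != std::string::npos) {
        builder.append(s, last_pos, pos - last_pos); // unchanged text before the match
        builder.append(replace);                     // the replacement itself
        last_pos = pos + search.length();            // resume scanning after the match
    }
    builder.append(s, last_pos, std::string::npos);  // remaining tail after the last match
    s = std::move(builder);
}
```

Because the result is built in a separate string, no part of the input is moved more than once, so the copying cost stays proportional to the input plus the output length rather than growing with the number of matches.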
Herman Semenov
93bc3839f9
common: fixed not working find argument --n-gpu-layers-draft ( #9175 )
2024-08-26 00:54:37 +02:00