llama.cpp/src
Molly Sophia ee7136c6d1
llama: add support for QRWKV6 model architecture (#11001)

* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix some typos

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* code format changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix cuda warning

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update README.md

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2025-01-10 09:58:08 +08:00
CMakeLists.txt llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-adapter.cpp lora : improve compat with mergekit-extract-lora (#11131) 2025-01-08 15:59:53 +01:00
llama-adapter.h lora : improve compat with mergekit-extract-lora (#11131) 2025-01-08 15:59:53 +01:00
llama-arch.cpp llama: add support for QRWKV6 model architecture (#11001) 2025-01-10 09:58:08 +08:00
llama-arch.h llama: add support for QRWKV6 model architecture (#11001) 2025-01-10 09:58:08 +08:00
llama-batch.cpp llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-batch.h llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-chat.cpp llama-chat : add phi 4 template (#11148) 2025-01-09 10:07:33 +01:00
llama-chat.h llama-chat : add phi 4 template (#11148) 2025-01-09 10:07:33 +01:00
llama-context.cpp llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-context.h llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-cparams.cpp llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-cparams.h llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-grammar.cpp llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-grammar.h llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-hparams.cpp llama: add support for QRWKV6 model architecture (#11001) 2025-01-10 09:58:08 +08:00
llama-hparams.h llama: add support for QRWKV6 model architecture (#11001) 2025-01-10 09:58:08 +08:00
llama-impl.cpp GGUF: C++ refactor, backend support, misc fixes (#11030) 2025-01-07 18:01:58 +01:00
llama-impl.h llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-kv-cache.cpp llama : rename missed batch params/vars to ubatch (#10059) 2025-01-06 11:28:17 +02:00
llama-kv-cache.h llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-mmap.cpp mmap : fix fileno macro clash (#11076) 2025-01-06 10:52:38 +02:00
llama-mmap.h mmap : fix fileno macro clash (#11076) 2025-01-06 10:52:38 +02:00
llama-model-loader.cpp GGUF: C++ refactor, backend support, misc fixes (#11030) 2025-01-07 18:01:58 +01:00
llama-model-loader.h llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-model.cpp llama: add support for QRWKV6 model architecture (#11001) 2025-01-10 09:58:08 +08:00
llama-model.h llama: add support for QRWKV6 model architecture (#11001) 2025-01-10 09:58:08 +08:00
llama-quant.cpp llama: add support for QRWKV6 model architecture (#11001) 2025-01-10 09:58:08 +08:00
llama-quant.h llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama-sampling.cpp llama : use LLAMA_TOKEN_NULL (#11062) 2025-01-06 10:52:15 +02:00
llama-sampling.h llama : add DRY sampler (#9702) 2024-10-25 19:07:34 +03:00
llama-vocab.cpp llama : use LLAMA_TOKEN_NULL (#11062) 2025-01-06 10:52:15 +02:00
llama-vocab.h llama : refactor src/llama.cpp (#10902) 2025-01-03 10:18:53 +02:00
llama.cpp llama: add support for QRWKV6 model architecture (#11001) 2025-01-10 09:58:08 +08:00
unicode-data.cpp server : better security control for public deployments (#9776) 2024-10-08 13:27:04 +02:00
unicode-data.h llama : reduce compile time and binary size (#9712) 2024-10-02 15:49:55 +02:00
unicode.cpp llama : Add support for DeepSeek V3 (#11049) 2025-01-04 21:06:11 +01:00
unicode.h unicode : improve naming style (#10838) 2024-12-16 12:31:45 +02:00