llama.cpp/ggml
Eve adc5dd92e8
vulkan: scale caching for k quants + misc fixes (#11081)
* q6_k scale caching (see the shared-memory sketch after this log)

* 16 bit unpack (sketched after this log)

* q4_k test (slow)

* revert it

* q3_k

* q2_k

* little stuff

* try precalculating products of a and q2_k scales

* Revert "try precalculating products of a and q2_k scales"

This reverts commit 65110b81f23f66331a50c6e889a7c1ab9470a86b.

* unpack should be u16; add vim swap files to .gitignore (about time)

* better q4_k scales (packing shown after this log)

* q5_k

* better q6_k with separate paths for all threads and partial threads in use (pattern sketched after this log), plus some more optimizations

* q2_k better dequant (decode rule sketched after this log)

* q3_k optimizations

* q3_k: use hmask SIMD from the CPU AVX version (shown after this log)

* make the caches happy

* q3_k separate out calculation

* q2_k separate out

* little stuff

* use calc_superblock everywhere

* q2_k optimize scale calculation

* more barriers
2025-01-15 19:50:13 +00:00
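
The headline change, scale caching, is a standard GPU trick: the workgroup decodes each superblock's sub-block scales into shared memory once, synchronizes, and every thread then reads them from there instead of re-decoding per element; the closing "more barriers" commit is the synchronization half of the same idea. The actual change lives in GLSL compute shaders, so what follows is only a minimal CUDA sketch under a simplified, hypothetical Q6_K-like layout (block_q6_like, sccache, and the pre-unpacked ql array are illustrative names, not ggml's real symbols).

    #include <cuda_runtime.h>
    #include <stdint.h>

    // Illustrative Q6_K-like superblock: 256 values in 16 sub-blocks of 16,
    // one int8 scale per sub-block, one float super-scale. (The real Q6_K
    // packs 6-bit values across ql/qh; pretend they are pre-unpacked here,
    // stored 0..63 with an offset of 32.)
    struct block_q6_like {
        int8_t ql[256];     // quantized values, 0..63
        int8_t scales[16];  // per-sub-block scales
        float  d;           // super-block scale
    };

    // Launch with 256 threads per block.
    __global__ void dequant_rows(const block_q6_like *x, float *y, int nblocks) {
        __shared__ int8_t sccache[16]; // cached sub-block scales

        for (int ib = blockIdx.x; ib < nblocks; ib += gridDim.x) {
            // Stage the 16 scales once per superblock...
            if (threadIdx.x < 16)
                sccache[threadIdx.x] = x[ib].scales[threadIdx.x];
            __syncthreads(); // barrier: readers must wait for the writers

            // ...then all 256 threads hit shared memory instead of re-decoding.
            const int i  = threadIdx.x;
            const int sb = i / 16; // sub-block index
            y[ib*256 + i] = x[ib].d * (float)sccache[sb]
                          * ((float)x[ib].ql[i] - 32.0f); // undo the +32 offset
            __syncthreads(); // barrier: don't refill the cache while it is in use
        }
    }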
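The two unpack commits ("16 bit unpack", "unpack should be u16") swap byte-wise extraction of packed 4-bit quants for 16-bit loads, halving the number of memory operations since each u16 carries four nibbles. How exactly the commits map onto the shaders is my reading of the log; the arithmetic itself is just this (hypothetical helper name):

    #include <stdint.h>

    // Four 4-bit quants from one 16-bit load instead of two 8-bit loads.
    // Assumes low-nibble-first packing within each little-endian uint16_t.
    __device__ static inline void unpack4_u16(uint16_t v, uint8_t q[4]) {
        q[0] = (v >>  0) & 0xF;
        q[1] = (v >>  4) & 0xF;
        q[2] = (v >>  8) & 0xF;
        q[3] = (v >> 12) & 0xF;
    }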
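"better q4_k scales" (and the q5_k follow-up, which shares the format) concerns Q4_K's packed scale/min pairs: 12 bytes encode eight 6-bit scales and eight 6-bit mins, with the high 2 bits of the later entries scattered into the spare top bits of the earlier bytes. The CPU reference decode from ggml-quants.c shows the packing the shader has to undo (reproduced here as a device function):

    #include <stdint.h>

    // Decode scale d and min m for sub-block j (0..7) from the 12 packed
    // bytes q of a Q4_K superblock. Entries 0..3 sit in the low 6 bits of
    // q[0..7]; entries 4..7 are split between q[8..11] and the spare high
    // bits of q[0..7].
    __device__ static inline void get_scale_min_k4(int j, const uint8_t *q,
                                                   uint8_t *d, uint8_t *m) {
        if (j < 4) {
            *d = q[j] & 63;
            *m = q[j + 4] & 63;
        } else {
            *d = (q[j + 4] & 0xF) | ((q[j - 4] >> 6) << 4);
            *m = (q[j + 4] >>  4) | ((q[j    ] >> 6) << 4);
        }
    }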
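"separate paths for all threads and partial threads in use" is the classic fast-path/tail-path split: when a workgroup's tile of rows fits entirely inside the matrix, run an unconditional, unrollable loop; only the ragged last tile pays per-row bounds checks. A generic CUDA sketch of the pattern (mul_vec, dot_row, and ROWS_PER_BLOCK are illustrative, not the shader's symbols):

    #include <cuda_runtime.h>

    #define ROWS_PER_BLOCK 4

    // One thread block accumulates ROWS_PER_BLOCK rows of A*v into out.
    // out must be zero-initialized; atomicAdd is a crude but valid reduction.
    __device__ void dot_row(const float *A, const float *v, float *out,
                            int row, int ncols) {
        float acc = 0.0f;
        for (int c = threadIdx.x; c < ncols; c += blockDim.x)
            acc += A[(size_t)row * ncols + c] * v[c];
        atomicAdd(&out[row], acc);
    }

    __global__ void mul_vec(const float *A, const float *v, float *out,
                            int nrows, int ncols) {
        const int first_row = blockIdx.x * ROWS_PER_BLOCK;
        if (first_row + ROWS_PER_BLOCK <= nrows) {
            // Fast path: full tile, fixed trip count, no per-row guard.
            #pragma unroll
            for (int r = 0; r < ROWS_PER_BLOCK; ++r)
                dot_row(A, v, out, first_row + r, ncols);
        } else {
            // Tail path: fewer than ROWS_PER_BLOCK rows remain.
            for (int r = 0; r < ROWS_PER_BLOCK; ++r)
                if (first_row + r < nrows)
                    dot_row(A, v, out, first_row + r, ncols);
        }
    }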
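The Q2_K commits ("q2_k better dequant", "q2_k optimize scale calculation") hinge on its decode rule: each scale byte packs a 4-bit scale in the low nibble and a 4-bit min in the high nibble, and a value decodes as d*(sc & 0xF)*q - dmin*(sc >> 4). Optimizing the scale calculation then amounts to hoisting both products out of the inner loop. A sketch per 16-value sub-block (the 2-bit packing here is simplified; real Q2_K interleaves quants across 32-value groups):

    #include <stdint.h>

    // Q2_K-style decode of one 16-value sub-block. sc packs scale (low
    // nibble) and min (high nibble); q holds 2-bit quants, four per byte.
    __device__ void dequant_q2_subblock(float d, float dmin, uint8_t sc,
                                        const uint8_t *q, float *y) {
        // Hoisted once per sub-block instead of recomputed per value.
        const float dl = d    * (float)(sc & 0xF);
        const float ml = dmin * (float)(sc >> 4);
        for (int l = 0; l < 16; ++l)
            y[l] = dl * (float)((q[l / 4] >> (2 * (l % 4))) & 3) - ml;
    }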
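Q3_K stores 3-bit quants as two low bits in qs plus a high bit collected in a separate hmask array; "use hmask SIMD from the CPU AVX version" ports the CPU backend's branch-free way of folding that bit in. The combined 3-bit value (0..7) is stored offset by 4, so the decode is (lo | hi<<2) - 4 with hi taken from the mask. A sketch:

    #include <stdint.h>

    // Decode one Q3_K-style value: two low bits from a qs byte, the high
    // bit from the matching hmask byte. mbit selects which mask bit.
    __device__ static inline int8_t decode_q3(uint8_t qs_byte, int shift,
                                              uint8_t hmask_byte, uint8_t mbit) {
        const int lo = (qs_byte >> shift) & 3;
        const int hi = (hmask_byte & mbit) ? 1 : 0; // compiles to a select, no branch
        return (int8_t)((lo | (hi << 2)) - 4);      // stored range 0..7, offset by 4
    }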
include         RoPE: fix back, CUDA support for back + noncont. (#11240)  2025-01-15 12:51:37 +01:00
src             vulkan: scale caching for k quants + misc fixes (#11081)   2025-01-15 19:50:13 +00:00
.gitignore      vulkan : cmake integration (#8119)                         2024-07-13 18:12:39 +02:00
CMakeLists.txt  fix: ggml: fix vulkan-shaders-gen build (#10448)           2025-01-15 14:17:42 +01:00