Commit graph

  • bc4b5ed1cb Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels. Adam Treat 2023-10-04 14:24:35 -04:00
  • de589ced7c Change this back to be in agreement with metal and our previous softmax kernel. Adam Treat 2023-10-03 13:30:23 -04:00
  • 6ac39752bf Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch. Adam Treat 2023-10-03 12:40:24 -04:00
  • 32289aa447 Fixes for norm. Adam Treat 2023-10-02 21:00:48 -04:00
  • 06d4b21598 Fix offset into the qh and now we have working Vulkan acceleration for GGUF'd llama. Adam Treat 2023-10-02 11:30:10 -04:00
  • f1c9bc1821 Add q6_k getrows and mul*vec kernel. Adam Treat 2023-10-02 09:05:22 -04:00
  • 4b223ec432 Refactor getrows to use common code and get ready for q6_k. Adam Treat 2023-10-02 09:04:02 -04:00
  • 5509f74318 Minor cleanup. Adam Treat 2023-10-02 09:01:45 -04:00
  • 601905e75e Move the subgroups and printf into common. Adam Treat 2023-10-02 09:00:55 -04:00
  • 93306f16d0 Consolidate code for mat x vec kernels and use subgroups more extensively. Adam Treat 2023-09-29 10:02:22 -04:00
  • 77135a3bf5 Add common boilerplate code via include and eliminate copy-paste. Adam Treat 2023-09-21 13:00:10 -04:00
  • 9e4f8b4acc Upload immediately to device. Adam Treat 2023-09-26 11:58:39 -04:00
  • 6b6c73a9e3 kompute : don't fail build because of -Warray-bounds Cebtenzzre 2023-09-26 10:35:05 -04:00
  • 1b1416d7b7 Support for gguf. Adam Treat 2023-09-21 12:39:33 -04:00
  • e50ab5af5b llama : increase inference graph size up to 4096 nodes Georgi Gerganov 2023-11-03 21:59:02 +02:00
  • d9b33fe95b (tag: b1483) metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) Peter Sugihara 2023-11-03 12:18:18 -07:00
  • 8f312d91bb ggml-metal: round up to 16 to fix MTLDebugComputeCommandEncoder assertion psugihara 2023-11-03 12:03:58 -07:00
  • 5ba3746171 ggml-metal: fix yarn rope (#3937) Xiao-Yong Jin 2023-11-03 13:00:31 -05:00
  • d40ab6a116 ggml-metal: fix yarn rope Xiao-Yong Jin 2023-11-03 12:31:31 -05:00
  • 00bea85cf2 Remove cblas dependency 0cc4m 2023-11-03 17:45:28 +01:00
  • 36f43ae834 syntax correction Concedo 2023-11-04 00:03:45 +08:00
  • 9bc2e35b2e Merge branch 'master' into concedo_experimental Concedo 2023-11-03 23:51:32 +08:00
  • 373c20ad51 print error log if tunnel fails Concedo 2023-11-03 23:48:21 +08:00
  • 815bf1a2f6 prop.memoryPoolsSupported can't be found in CUDA 17. Revert to basic error check. Oleksii Maryshchenko 2023-11-03 15:51:53 +01:00
  • c42ca8f1b7 GGML_CUDA_FORCE_CUSTOM_MEMORY_POOL was added to force using only the custom memory pool Oleksii Maryshchenko 2023-11-03 15:06:40 +01:00
  • bd56886fd6 Set memory pool element to nullptr if it failed during initialization. Oleksii Maryshchenko 2023-11-03 13:46:14 +01:00
  • b1592ea054 train : fix context size calculations Georgi Gerganov 2023-11-03 14:40:45 +02:00
  • ce4df17e42 Check CUDA memory support in device properties. Oleksii Maryshchenko 2023-11-03 13:40:28 +01:00
  • ee311f5ca7 Allow common process_escapes to handle \x sequences KerfuffleV2 2023-11-03 05:20:06 -06:00
  • abb77e7319 (tag: b1481) ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) slaren 2023-11-03 12:13:09 +01:00
  • 8002d1014f add comment slaren 2023-11-03 12:12:08 +01:00
  • c794fd5ceb sampler seed added (+1 squashed commit) Concedo 2023-11-03 16:32:55 +08:00
  • dc22db734b scripts : fix header in sync script Georgi Gerganov 2023-11-03 10:14:53 +02:00
  • d7729ac3eb Merge branch 'master' into concedo_experimental Concedo 2023-11-03 16:00:05 +08:00
  • 075ee61191 Merge branch 'master' into sync Georgi Gerganov 2023-11-03 09:49:41 +02:00
  • 66b24d9f31 sft weirui.kwr 2023-11-03 15:44:53 +08:00
  • 8f961abdc4 speculative : change default p_accept to 0.5 + CLI args (#3919) Georgi Gerganov 2023-11-03 09:41:17 +02:00
  • 05816027d6 common : YAYF (yet another YARN fix) (#3925) Georgi Gerganov 2023-11-03 09:24:00 +02:00
  • 88ff0e39a1 common : YAYF (yet another YARN fix) Georgi Gerganov 2023-11-03 09:02:27 +02:00
  • 3fdbe6b66b llama : change yarn_ext_factor placeholder to -1 (#3922) cebtenzzre 2023-11-03 02:31:58 -04:00
  • 8c14c81b33 hopefully this fixes the dotnet nonsense Concedo 2023-11-03 11:23:56 +08:00
  • bc2027b008 Merge remote-tracking branch 'ceb/fix-fast-ext-factor' into concedo_experimental Concedo 2023-11-03 11:21:14 +08:00
  • c3b0af424c Update FindSIMD.cmake Eve 2023-11-03 03:21:13 +00:00
  • c07c9b857d Merge branch 'master' into concedo_experimental Concedo 2023-11-03 11:17:07 +08:00
  • 447e50a96f style netrunnereve 2023-11-02 22:57:49 -04:00
  • 14aded64cf msvc only version netrunnereve 2023-11-02 22:46:46 -04:00
  • 41a98c618e WIP: Initial setup for GGUF writer configuration teleprint-me 2023-11-02 22:40:28 -04:00
  • 5c5281b2f6 cleanup netrunnereve 2023-11-02 22:30:01 -04:00
  • 3cc91b2d66 msvc combines avx2 and fma into /arch:AVX2 so check for both netrunnereve 2023-11-02 22:18:41 -04:00
  • ec40b70ea9 linux/gcc version for testing netrunnereve 2023-11-02 22:06:32 -04:00
  • 4998f59e57 fix merge netrunnereve 2023-11-02 22:01:17 -04:00
  • 25fef506cf llama : change yarn_ext_factor placeholder to -1 cebtenzzre 2023-11-02 21:53:59 -04:00
  • d104725a46 pull in https://github.com/JDunn3/llama.cpp/tree/cmake netrunnereve 2023-11-02 20:57:01 -04:00
  • b7ecd0a781 cleanup netrunnereve 2023-11-02 20:56:31 -04:00
  • 51c99e467e merge in https://github.com/howard0su/llama.cpp/tree/cmake netrunnereve 2023-11-02 20:43:54 -04:00
  • 166e44b739 ggml-cuda : move row numbers to x grid dim in mmv kernels slaren 2023-11-03 00:00:18 +01:00
  • 803703478d build with cmake, not tested (WIP) M. Yusuf Sarıgöz 2023-11-03 01:34:52 +03:00
  • e84003b430 Move llava back to examples M. Yusuf Sarıgöz 2023-11-03 01:10:26 +03:00
  • f30b4e69d1 fix C includes in C++ source files cebtenzzre 2023-11-02 18:01:13 -04:00
  • a9162dd01f make : remove unneeded deps and add test-rope target cebtenzzre 2023-11-02 17:54:57 -04:00
  • 635e9fadfd fix includes with help from include-what-you-use cebtenzzre 2023-11-01 13:09:21 -04:00
  • 2b5136e1c2 cmake : fix joining of REAL_GIT_DIR cebtenzzre 2023-11-02 12:42:36 -04:00
  • ab54e65f02 Merge branch 'master' of github.com:ggerganov/llama.cpp Laura 2023-11-02 21:46:51 +01:00
  • 629f917cd6 (tag: b1477) cuda : add ROCM aliases for CUDA pool stuff (#3918) Kerfuffle 2023-11-02 13:58:22 -06:00
  • 9276bff78e Add ROCM aliases for CUDA pool stuff KerfuffleV2 2023-11-02 13:55:15 -06:00
  • 51b2fc11f7 (tag: b1476) cmake : fix relative path to git submodule index (#3915) Andrei 2023-11-02 15:40:31 -04:00
  • 409abe9fb9 Fix relative path to git submodule index Andrei Betlen 2023-11-02 15:04:26 -04:00
  • 7f8e2a5407 sync : update sync-ggml.sh with new files Georgi Gerganov 2023-11-02 20:52:19 +02:00
  • 224e7d5b14 readme : add notice about #3912 Georgi Gerganov 2023-11-02 20:44:12 +02:00
  • f3fb45b139 Merge branch 'master' into sync Georgi Gerganov 2023-11-02 20:33:09 +02:00
  • c7743fe1c1 (tag: b1474) cuda : fix const ptrs warning causing ROCm build issues (#3913) Georgi Gerganov 2023-11-02 20:32:11 +02:00
  • e2349ec13b sync : update graph copies to new ggml API Georgi Gerganov 2023-11-02 20:29:43 +02:00
  • 16e819d53c sync : pass custom graph sizes in training examples Georgi Gerganov 2023-11-02 19:59:35 +02:00
  • 815f44e5a3 sync : try to fix build on tvOS Georgi Gerganov 2023-11-02 19:22:06 +02:00
  • d1a1678b07 Merge branch 'master' into fix-cuda-warnings Georgi Gerganov 2023-11-02 19:19:21 +02:00
  • 3701a06253 cuda : fix const ptrs warning causing ROCm build issues Georgi Gerganov 2023-11-02 19:13:30 +02:00
  • d6069051de (tag: b1473) cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903) Oleksii Maryshchenko 2023-11-02 18:10:39 +01:00
  • 8401e3ebcd llama : fix save/load state context size Georgi Gerganov 2023-11-02 19:02:35 +02:00
  • 83c96d5809 sync : ggml-cuda Georgi Gerganov 2023-11-02 18:40:33 +02:00
  • 4fe646ffbe sync : update tests + fix max op params to 64 Georgi Gerganov 2023-11-02 18:32:29 +02:00
  • e8190705fb sync : migrate examples and llama.cpp to dynamic graphs (wip) Georgi Gerganov 2023-11-02 18:29:10 +02:00
  • 007be85087 model.py : add missing future import cebtenzzre 2023-11-02 12:08:44 -04:00
  • aa7a2c445d sync : ggml (backend v2) (wip) Georgi Gerganov 2023-11-02 18:06:16 +02:00
  • 879061c5d5 noavx2 clblast selector Concedo 2023-11-02 23:13:16 +08:00
  • c7c3f3d9ab updated lite Concedo 2023-11-02 22:46:54 +08:00
  • b0c7b88eac try to fix cloudflare tunnel (+2 squashed commits) Concedo 2023-11-02 21:16:12 +08:00
  • 4ff1046d75 (tag: b1472) gguf : print error for GGUFv1 files (#3908) Georgi Gerganov 2023-11-02 16:22:30 +02:00
  • 6dbb8d82b0 Merge branch 'master' into concedo_experimental Concedo 2023-11-02 20:51:45 +08:00
  • 42eabf2f2f rope fixes Concedo 2023-11-02 20:41:16 +08:00
  • f3069478ad gguf : print error for GGUFv1 files Georgi Gerganov 2023-11-02 14:31:14 +02:00
  • 21958bb393 (tag: b1471) cmake : disable LLAMA_NATIVE by default (#3906) slaren 2023-11-02 13:10:33 +01:00
  • c217a66bb2 disable LLAMA_NATIVE by default slaren 2023-11-02 13:04:19 +01:00
  • 587ff3bf1a Removed redundant cublasSetStream Oleksii Maryshchenko 2023-11-02 12:32:01 +01:00
  • 7e6f41327a If the CUDA device doesn't support memory pools, then use the old implementation. Oleksii Maryshchenko 2023-11-02 11:26:38 +01:00
  • bc4ff72317 not working merge Concedo 2023-11-02 17:52:40 +08:00
  • 08868a4474 Using CUDA memory pools for async alloc/dealloc. Oleksii Maryshchenko 2023-11-02 10:45:07 +01:00
  • 2756c4fbff (tag: b1470) gguf : remove special-case code for GGUFv1 (#3901) Georgi Gerganov 2023-11-02 11:20:21 +02:00
  • 347d587a6e gguf : remove special-case code for GGUFv1 Georgi Gerganov 2023-11-02 10:18:09 +02:00
  • 1efae9b7dc (tag: b1469) llm : prevent 1-D tensors from being GPU split (#3697) Georgi Gerganov 2023-11-02 09:54:18 +02:00
  • fca7a4c054 added noavx2 model for clblast (+1 squashed commit) Concedo 2023-11-02 14:53:01 +08:00
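
A recurring thread in the log above is the CUDA memory-pool work (08868a4474, ce4df17e42, 7e6f41327a, 815bf1a2f6, bd56886fd6, merged as d6069051de): probe the device for memory-pool support, use stream-ordered async allocation when available, and fall back to the old path otherwise. Below is a minimal sketch of that pattern, assuming CUDA 11.2 or newer; the helper names are illustrative, not the actual ggml-cuda symbols.

    // Sketch only: probe for memory-pool support and fall back to plain
    // cudaMalloc when the device (or toolkit) lacks it. Helper names are
    // hypothetical, not taken from ggml-cuda.
    #include <cuda_runtime.h>
    #include <cstdio>

    static bool device_supports_mempool(int device) {
        int supported = 0;
        // Query the attribute instead of reading cudaDeviceProp::memoryPoolsSupported,
        // which commit 815bf1a2f6 found missing on some toolkit versions.
        if (cudaDeviceGetAttribute(&supported, cudaDevAttrMemoryPoolsSupported, device) != cudaSuccess) {
            return false;
        }
        return supported != 0;
    }

    static void * pool_alloc(size_t size, cudaStream_t stream, bool use_pool) {
        void * ptr = nullptr;
        cudaError_t err = use_pool
            ? cudaMallocAsync(&ptr, size, stream)  // stream-ordered pool allocation
            : cudaMalloc(&ptr, size);              // fallback (the real code uses ggml's custom pool here)
        if (err != cudaSuccess) {
            fprintf(stderr, "alloc of %zu bytes failed: %s\n", size, cudaGetErrorString(err));
            return nullptr; // on failure leave the element nullptr, as in bd56886fd6
        }
        return ptr;
    }

    static void pool_free(void * ptr, cudaStream_t stream, bool use_pool) {
        if (ptr == nullptr) return;
        if (use_pool) { cudaFreeAsync(ptr, stream); } else { cudaFree(ptr); }
    }

In this sketch, the GGML_CUDA_FORCE_CUSTOM_MEMORY_POOL flag added in c42ca8f1b7 would amount to forcing use_pool to false, so only the custom-pool path is ever taken.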