Commit graph

  • c58e809c12 prototyping support for running on CPU with a GGML_USE_CUBLAS=on build Meng Zhang 2023-11-04 15:15:12 -07:00
  • b9bacc78b8 Revert changes in cmake M. Yusuf Sarıgöz 2023-11-06 04:38:45 +03:00
  • 5b8b9ef987 attempt to fix build on Windows+CUDA M. Yusuf Sarıgöz 2023-11-06 04:08:21 +03:00
  • d6be69faff Upd TODOs M. Yusuf Sarıgöz 2023-11-06 03:36:53 +03:00
  • 1f8c866408 attempt to fix build on Windows M. Yusuf Sarıgöz 2023-11-06 03:27:03 +03:00
  • 6c2c67d5d7 server : handle abort case in runCompletion Jhen 2023-11-06 07:58:17 +08:00
  • fefc3db527 address review comments Jared Van Bortel 2023-11-05 16:24:48 -05:00
  • 71ea278ad8 Merge branch 'master' into llava-lib M. Yusuf Sarıgöz 2023-11-05 23:40:51 +03:00
  • ad97e0eda8 attempt to fix build on Windows M. Yusuf Sarıgöz 2023-11-05 23:36:16 +03:00
  • 2f16eccb89 special colab build Concedo 2023-11-06 01:46:58 +08:00
  • 2833a6f63c ggml-cuda : fix f16 mul mat (#3961) b1492 slaren 2023-11-05 18:45:16 +01:00
  • 8bcdb88e5c silence common.cpp warning (bonus) slaren 2023-11-05 18:33:52 +01:00
  • d4d45c79a9 ggml-cuda : fix f16 mul mat slaren 2023-11-05 18:21:33 +01:00
  • d9ccce2e33 Allow common process_escapes to handle \x sequences (#3928) b1491 Kerfuffle 2023-11-05 10:06:06 -07:00
  • bb60fd0bf6 server : fix typo for --alias shortcut from -m to -a (#3958) Thái Hoàng Tâm 2023-11-05 23:15:27 +07:00
  • 132d25b8a6 cuda : fix disabling device with --tensor-split 1,0 (#3951) b1489 Jared Van Bortel 2023-11-05 10:08:57 -05:00
  • 2b32b170a1 clang 15 check for macOS Concedo 2023-11-05 22:57:05 +08:00
  • ea81eae189 cleanup, up ver (+1 squashed commits) Concedo 2023-11-05 22:29:36 +08:00
  • 01f06e26c3 Fix cyclical deps on Windows M. Yusuf Sarıgöz 2023-11-05 17:34:48 +03:00
  • e2e5fe56a8 KCPP Fetches AMD ROCm Memory without a stick, CC_TURING Gets the Boot, koboldcpp_hipblas.dll Talks To The Hand, and hipBLAS Compiler Finds Its Independence! (#517) YellowRoseCx 2023-11-05 08:23:18 -06:00
  • 13f07013ee Add special token type wonjun Jang 2023-11-05 23:18:26 +09:00
  • a62468ec4c Merge branch 'master' into concedo_experimental; should fix multigpu Concedo 2023-11-05 22:14:40 +08:00
  • bdf16d7a3c aria2 needs to show more info Concedo 2023-11-05 22:13:22 +08:00
  • b9277727a6 Build with make M. Yusuf Sarıgöz 2023-11-05 17:10:54 +03:00
  • e151fa6151 Fix typo for --alias shortcut from -m to -a Thái Hoàng Tâm 2023-11-05 21:10:10 +07:00
  • 53dca51fd1 Build with make M. Yusuf Sarıgöz 2023-11-05 17:00:35 +03:00
  • 4adb8b9862 Update convert.py wonjun Jang 2023-11-05 22:41:40 +09:00
  • 3d48f42efc llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) b1488 Meng Zhang 2023-11-05 04:40:08 -08:00
  • 32bf7bf61f Editorconfig M. Yusuf Sarıgöz 2023-11-05 15:33:16 +03:00
  • c6b88446e9 Merge branch 'master' into llava-lib M. Yusuf Sarıgöz 2023-11-05 15:25:31 +03:00
  • 52143f799b Editorconfig M. Yusuf Sarıgöz 2023-11-05 15:22:47 +03:00
  • 47d604fa2d fix issues fix-tensor-split-zero slaren 2023-11-05 13:20:22 +01:00
  • 73c0010e18 Merge remote-tracking branch 'origin/master' into fix-tensor-split-zero slaren 2023-11-05 12:42:43 +01:00
  • 7f05c7f33e Fix q4_k dmmv K_QUANTS_PER_ITERATION 1 shader 0cc4m 2023-11-05 12:40:20 +01:00
  • bd7fa3f9e4 Fix matmul k-split bug 0cc4m 2023-11-05 12:24:09 +01:00
  • 28f09beb60 remove added tokens and check newline token to decide spm or bpe wonjun Jang 2023-11-05 20:22:25 +09:00
  • c41ea36eaa cmake : MSVC instruction detection (fixed up #809) (#3923) b1487 Eve 2023-11-05 08:03:09 +00:00
  • a7fac013cf ci : use intel sde when ci cpu doesn't support avx512 (#3949) b1486 Eve 2023-11-05 07:46:44 +00:00
  • 781bc54986 Move everything to convert-hf-to-gguf.py Galunid 2023-11-05 08:42:11 +01:00
  • 48ade94538 cuda : revert CUDA pool stuff (#3944) b1485 slaren 2023-11-05 08:12:13 +01:00
  • 351dcabd3e lite fix Concedo 2023-11-05 14:47:02 +08:00
  • c95937642a Update after #3382 Galunid 2023-11-05 07:40:12 +01:00
  • 05c51f96fe cuda : fix disabling device with --tensor-split 1,0 Jared Van Bortel 2023-11-05 00:56:32 -04:00
  • f2b31451a5 server : allow continue edit on completion mode Jhen 2023-11-05 10:36:19 +08:00
  • faae84ee1d removed -c flag in wget Concedo 2023-11-05 10:21:28 +08:00
  • 02595f9d21 Colabcpp improvements (#512) henk717 2023-11-05 03:19:09 +01:00
  • 5e5be717c3 fix for removing inaccessible backends in gui Concedo 2023-11-05 10:12:12 +08:00
  • a75ea1bf1b use intel sde when ci cpu doesn't support avx512 Eve 2023-11-05 00:08:01 +00:00
  • 8917767f56 Merge branch 'master' into stablelm-support Galunid 2023-11-05 01:01:48 +01:00
  • f7de892ee5 Move util to gguf-py/gguf Galunid 2023-11-05 00:43:56 +01:00
  • 087f88cc15 Rename convert-generic -> convert-hf-to-gguf Galunid 2023-11-05 00:37:00 +01:00
  • f28af0d81a gguf-py: Support 01.AI Yi models (#3943) Kerfuffle 2023-11-04 16:20:34 -06:00
  • 2120195bb1 Yarn rope for baichuan Galunid 2023-11-04 23:15:41 +01:00
  • f14faacdf2 feat: mark LLM_ARCH_STARCODER as full offload supported Meng Zhang 2023-11-04 15:11:24 -07:00
  • e64f4de189 Revert "Remove 'old' conversion scripts" - needed for testing Galunid 2023-11-04 23:10:39 +01:00
  • fd30850576 Add big endian support Galunid 2023-11-04 23:01:38 +01:00
  • 3ef358fffd Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)" revert-pool slaren 2023-11-04 21:25:43 +01:00
  • 6b10aa9f0e Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)" slaren 2023-11-04 21:23:48 +01:00
  • 03c9683eb7 Restore support for RWForCausalLM Galunid 2023-11-04 20:43:29 +01:00
  • bc068ab275 gguf-py: Support 01.AI Yi models KerfuffleV2 2023-11-04 12:35:21 -06:00
  • 2b0303add6 CUDA pool is optional now. Oleksii Maryshchenko 2023-11-04 18:41:11 +01:00
  • 1e7088a80b autopick cublas in gui if possible, better layer picking logic Concedo 2023-11-05 01:35:27 +08:00
  • 863166b4c3 Skip GPUs without mem pool support. Oleksii Maryshchenko 2023-11-04 17:50:59 +01:00
  • 81931b2ea7 Multi GPU memory pool access + Check memory pool support of multiple GPUs and main GPU. Oleksii Maryshchenko 2023-11-04 17:29:08 +01:00
  • 56e516240a All memory pool operations are checked during the init phase. For CUDA 12+, device properties are checked. Oleksii Maryshchenko 2023-11-04 10:25:51 +01:00
  • 063eb05a29 termux: use clblast in repo rhjdvsgsgks 2023-11-04 04:38:06 +00:00
  • 7a8c0df2e5 Merge branch 'master' into concedo_experimental Concedo 2023-11-04 09:18:28 +08:00
  • 135001abc4 try to make the tunnel more reliable Concedo 2023-11-04 09:18:19 +08:00
  • 38471fbe06 tensor core info better printout (+1 squashed commits) Concedo 2023-11-04 08:37:12 +08:00
  • 05bbadf3a5 Fix edge case when second hex digit is NUL KerfuffleV2 2023-11-03 16:33:39 -06:00
  • f88b198885 llama : fix Vulkan whitelist (#11) cebtenzzre 2023-11-01 09:46:15 -04:00
  • ffd0624be2 Remove this debug code. Adam Treat 2023-10-30 11:38:21 -04:00
  • a5eb001eab Revert the prompt processing on gpu for now. Adam Treat 2023-10-27 18:32:51 -04:00
  • e006d377dd Scale the workgroup count down to allow correct generation for Falcon with AMD Radeon cards with a lower workgroup count limit Adam Treat 2023-10-27 18:32:29 -04:00
  • 89b71278ff llama : decide to disable Vulkan before loading tensors (#7) cebtenzzre 2023-10-27 19:04:26 -04:00
  • 1c17010188 vulkan : fix missing break in matmul selection (#9) cebtenzzre 2023-10-23 12:22:27 -04:00
  • 74ddf0f17d Fix synchronization problem for AMD Radeon with amdvlk driver or Windows drivers. Does not have any performance or fidelity effect on other GPU/driver combos I've tested. Adam Treat 2023-10-27 12:05:24 -04:00
  • 8d9efbf97a Lower the workgroup count for some shaders by providing a loop that processes four floats at a time. Adam Treat 2023-10-26 11:48:36 -04:00
  • 752f7ebd61 Remove unused push constant that was giving validation errors. Adam Treat 2023-10-26 13:01:40 -04:00
  • 8400015337 Don't try an allocation on a heap that is smaller than the size we require. Adam Treat 2023-10-26 13:00:53 -04:00
  • cbc0d1af79 kompute : make scripts executable cebtenzzre 2023-10-23 11:46:26 -04:00
  • 21841d3163 kompute : enable kp_logger and make it static (#8) cebtenzzre 2023-10-16 16:51:41 -04:00
  • cc05a602d6 use mat*vec shaders for mat*mat Aaron Miller 2023-10-16 10:00:25 -07:00
  • c1fd64548d attempted speedups 2 Aaron Miller 2023-10-13 13:14:36 -07:00
  • 9bc52ebae3 attempted speedups Aaron Miller 2023-10-13 11:10:02 -07:00
  • 8dc79ac380 clean up vulkan/cpu switch Aaron Miller 2023-10-12 11:46:30 -07:00
  • cd0257ed0d q4_1 mat*mat Aaron Miller 2023-10-12 11:22:31 -07:00
  • 4809890d80 rm commented dbg print Aaron Miller 2023-10-12 10:23:09 -07:00
  • b78a94bc6d q6k mm works Aaron Miller 2023-10-11 17:10:42 -07:00
  • d5741c07a5 use op param epsilon for norms Aaron Miller 2023-10-11 18:40:07 -07:00
  • 3327d84a7f perf: use bigger threadgroups in mm Aaron Miller 2023-10-11 16:02:53 -07:00
  • 46385ee0d5 misc vulkan cleanup Aaron Miller 2023-10-10 21:38:18 -07:00
  • f0cd38b9ad add mat*mat ops Aaron Miller 2023-10-10 21:37:07 -07:00
  • 09d83f0401 Delete TODO now that we have q8_0. Adam Treat 2023-10-05 10:52:04 -04:00
  • 8564f79036 falcon h2d + reenable vulkan Aaron Miller 2023-10-04 21:03:27 -07:00
  • 020b1745a0 vulkan: implement neox mode for rope Aaron Miller 2023-10-04 23:36:24 -07:00
  • ff4212d20f q8 mat*vec Aaron Miller 2023-10-04 21:02:17 -07:00
  • 9db90cbe12 f16 mv broadcasting fix (gqa fix) Aaron Miller 2023-10-04 21:49:55 -07:00
  • 3d850db767 kompute : remove Q6_K from list of supported quant types Cebtenzzre 2023-10-04 16:19:19 -04:00
  • 24a4a5956a kompute : only try to use Vulkan for LLaMA itself Cebtenzzre 2023-10-04 16:16:04 -04:00