Commit graph

  • 6138963fb2 build : target Windows 8 for standard mingw-w64 (#4405) b1626 Jared Van Bortel 2023-12-12 04:27:26 -05:00
  • 6391817cd1 llama : document logits_all deprecation (#4418) b1625 crasm 2023-12-12 04:25:57 -05:00
  • d9d4cfef64 server : fix local model name in server (#4420) b1624 Vladimir Zorin 2023-12-12 11:25:29 +02:00
  • 41a11aaf99 ggml : increased GGML_MAX_PARAMS to allow finetuning of 70b models (#4424) b1623 Taikono-Himazin 2023-12-12 18:24:32 +09:00
  • e44f6401ec Merge pull request #3 from hodlen/fix/gpu-dependency Jeremy Song 2023-12-12 15:40:29 +08:00
  • 182316ecfd support powerinfer without GPU Yixin Song 2023-12-12 15:40:07 +08:00
  • e4b798a735 Merge pull request #2 from hodlen/fix/axpy_q4 Jeremy Song 2023-12-12 15:05:31 +08:00
  • c796dd4c90 support axpy q4_0 for loop syx 2023-12-12 15:03:10 +08:00
  • 9d475ef2c5 Increased GGML_MAX_PARAMS to allow finetuning of 70b models Taikono-Himazin 2023-12-12 15:44:26 +09:00
  • 9975f4aaa7 Merge pull request #1 from hodlen/fix/axpy Jeremy Song 2023-12-12 13:53:16 +08:00
  • 6f997d299a add fall back for axpy mulmat syx 2023-12-12 13:50:25 +08:00
  • a3c295a2ae merge PowerInfer impl from the internal codebase Holden 2023-12-12 11:05:32 +08:00
  • 74acc5441d Revert "Hide hipBLAS (ROCm) if CuBLAS exists - vice versa" Concedo 2023-12-12 10:53:34 +08:00
  • 1a90d4ef62 Set model field in llama_params Vladimir Zorin 2023-12-12 01:25:33 +02:00
  • a13ba22a3c llama : document logits_all deprecation crasm 2023-12-11 17:53:02 -05:00
  • a81a34add0 cmake : detect host compiler and cuda compiler separately Jared Van Bortel 2023-12-11 17:12:37 -05:00
  • abacb27868 cmake : silence linker check stdout Jared Van Bortel 2023-12-11 17:13:19 -05:00
  • 88781479f1 make : honor NVCC, LLAMA_CUDA_CCBIN, NVCCFLAGS Jared Van Bortel 2023-12-11 16:42:22 -05:00
  • b2a5e70f0d now linking locally mike dupont 2023-12-11 16:15:16 -05:00
  • 93ca80fa3a make editorconfig checker happy Jared Van Bortel 2023-12-11 15:17:07 -05:00
  • 91df2623d7 make : detect host compiler and cuda compiler separately Jared Van Bortel 2023-12-11 15:09:56 -05:00
  • 9b28f3413b make : simplify nvcc flags Jared Van Bortel 2023-12-11 14:14:48 -05:00
  • f1cbfabd64 convert : fix style slaren 2023-12-11 20:02:55 +01:00
  • 7dc75e3923 convert : use 1e6 rope_freq_base for mixtral slaren 2023-12-11 20:00:28 +01:00
  • 296c945de5 cuda : fix mul_mat_id with multi gpu slaren 2023-12-11 16:53:25 +01:00
  • 33e50f1b53 test-backend-ops : disable MOE test with thread sanitizer slaren 2023-12-11 12:27:48 +01:00
  • ffda94c87f test-backend-ops : simplify and disable slow tests to avoid CI timeout slaren 2023-12-11 12:15:31 +01:00
  • 06581f243f perf endpoint lets you monitor whether the embedded horde worker has issues Concedo 2023-12-11 16:54:42 +08:00
  • fce971d541 do not build the clblast noavx2 binary if not on windows Concedo 2023-12-11 16:17:10 +08:00
  • 8cbaed1d9a llama : fix hard-coded number of experts Georgi Gerganov 2023-12-11 08:55:16 +02:00
  • 2fa63e0421 Use typos to fix comments and logs. Richard Kiss 2023-12-06 21:30:20 -08:00
  • 4b854d46a4 Hide hipBLAS (ROCm) if CuBLAS exists - vice versa YellowRoseCx 2023-12-10 22:49:35 -06:00
  • b0029815e4 test-backend-ops : fix dequantize block offset slaren 2023-12-11 02:43:52 +01:00
  • 7025832f4d now working v1 mike dupont 2023-12-10 17:54:51 -05:00
  • 8a7b2fa528 Update README.md (#4388) Yueh-Po Peng 2023-12-11 06:27:38 +08:00
  • ad7fc5450e remove bad cast mike dupont 2023-12-10 17:04:51 -05:00
  • f1380d7897 test-backend-ops : add cpy from f32 -> all types test slaren 2023-12-10 22:58:31 +01:00
  • 54d254bbed test-backend-ops : cleanup, add moe test for batches slaren 2023-12-10 21:52:11 +01:00
  • 9592bc5676 fixing first bug mike dupont 2023-12-10 15:31:08 -05:00
  • fa49f64e28 cmake not working yet mike dupont 2023-12-10 15:07:58 -05:00
  • d239bc94d2 makefile now building and exec crashing mike dupont 2023-12-10 15:07:39 -05:00
  • d739470198 now linking and crashing mike dupont 2023-12-10 15:07:04 -05:00
  • ae2517f767 make : fix missing console.o deps Jared Van Bortel 2023-12-10 14:29:41 -05:00
  • 0ec5fdb5ce main loop finished, starting to debug Leon Ericsson 2023-12-10 20:20:01 +01:00
  • dd2680cf22 build : target Windows 8 for standard mingw-w64 Jared Van Bortel 2023-12-10 14:04:41 -05:00
  • 2f5529e2dc Merge upstream changes, fix conflicts, adapt per-layer kv 0cc4m 2023-12-10 18:16:57 +01:00
  • 0c708c1dca Upload generated file ggml-vulkan-shaders.hpp, remove redundant shaders 0cc4m 2023-12-10 17:56:05 +01:00
  • ff93769cb1 Finish full offloading support, add last remaining ops, fix bugs, remove redundant code 0cc4m 2023-12-10 14:59:08 +01:00
  • e2cf3b7aca koboldcpp.sh - The Mamba Multitool (#554) henk717 2023-12-10 14:30:17 +01:00
  • 54ba263410 test-backend-ops : make experts more evenly probable (test_moe) Georgi Gerganov 2023-12-10 15:27:41 +02:00
  • b0b83dd9e2 metal : fix ggml_mul_mat_id for F32 Georgi Gerganov 2023-12-10 14:30:38 +02:00
  • 65923a8ede convert : determine n_ctx correctly Georgi Gerganov 2023-12-10 14:17:46 +02:00
  • 8614aa736d cuda : fix get_rows when ncols is odd slaren 2023-12-10 13:12:11 +01:00
  • cefebb3660 test-backend-ops : add moe test slaren 2023-12-10 13:11:39 +01:00
  • e640cbe055 llama : add n_expert and n_expert_used to hparams + change quants Georgi Gerganov 2023-12-10 13:57:54 +02:00
  • d1259b7b35 llama : do not quantize expert gating tensors Georgi Gerganov 2023-12-10 13:00:13 +02:00
  • 6cfb31f9ea metal : add indirect mat-vec kernels for all quantization types Georgi Gerganov 2023-12-10 10:59:13 +02:00
  • 016f9bb55a metal : fix ggml_get_rows to work with non-cont src1 Georgi Gerganov 2023-12-10 09:38:21 +02:00
  • da5bbd73a8 linker error mike dupont 2023-12-09 17:46:25 -05:00
  • 0710b0f726 llama : offload missing ffn_moe_silu slaren 2023-12-09 23:29:47 +01:00
  • 62b95f93d0 cuda : support non-contiguous src1 in get_rows slaren 2023-12-09 22:39:34 +01:00
  • 2e4db48291 ggml : update get_rows f16 and q slaren 2023-12-09 22:38:22 +01:00
  • e18f7345a3 grammar : revert the replacement of llama_token_to_piece with id_to_token (#4396) b1621 Xiang (Kevin) Li 2023-12-09 16:29:27 -05:00
  • a05ec4ae5e [grammar] Revert replacement of llama_token_to_piece with id_to_token Kevin Li 2023-12-09 14:36:44 -05:00
  • ac3f7d8e23 ggml : get_rows : support non-contiguous tensors with gaps, generalize up to 3D slaren 2023-12-09 19:19:03 +01:00
  • 1f3a501e9b working mike dupont 2023-12-09 11:32:20 -05:00
  • 1f522319c5 wip mike dupont 2023-12-09 11:29:16 -05:00
  • 01f81fc506 Merge branch 'common_json' of https://github.com/MaggotHATE/llama.cpp-samplers-order into common_json MaggotHATE 2023-12-09 20:35:58 +05:00
  • faefd46f99 Reworked to be self-contained and more universal MaggotHATE 2023-12-09 20:34:43 +05:00
  • da1d8459be dont forget the code mike dupont 2023-12-09 09:47:55 -05:00
  • 34cf9d6fb6 not executing mike dupont 2023-12-09 09:34:22 -05:00
  • 11937662ef for metacall add the cmake mike dupont 2023-12-09 08:58:10 -05:00
  • 8c5b66eeaa metal : reduce the kernel launches for ggml_mul_mat_id Georgi Gerganov 2023-12-09 15:30:34 +02:00
  • 7e2006b0c0 metal : add/mul/div use general kernel when src1 not cont Georgi Gerganov 2023-12-09 14:24:58 +02:00
  • 06dfde3e94 llama : add basic support for offloading moe with CUDA slaren 2023-12-09 13:21:09 +01:00
  • 2cbcba829f metal : add more general support for ggml_get_rows + tests Georgi Gerganov 2023-12-09 14:18:42 +02:00
  • 9064b1ca05 ggml : fix ggml_get_rows to take into account ne02 / ne11 Georgi Gerganov 2023-12-09 14:04:54 +02:00
  • ee8fb399aa ggml : add n_as argument to ggml_mul_mat_id slaren 2023-12-09 12:42:25 +01:00
  • 7372b62271 ggml : ggml_get_rows support 2D indexing [n_tokens, n_experts] (cpu only) Georgi Gerganov 2023-12-09 13:18:58 +02:00
  • 8b185b7030 llama : fix expert weighting in the FFN Georgi Gerganov 2023-12-09 13:01:42 +02:00
  • 7ea36953ba llama : first working version Georgi Gerganov 2023-12-09 12:45:15 +02:00
  • af1a096bf8 llama : fix cur -> cur_expert Georgi Gerganov 2023-12-09 12:07:39 +02:00
  • aedfad120a llama : update graph to support MoE Georgi Gerganov 2023-12-09 11:47:40 +02:00
  • 861cd67899 ggml : sync latest ggml_mul_mat_id Georgi Gerganov 2023-12-09 11:19:46 +02:00
  • a3eefe95a8 llama : model loading Georgi Gerganov 2023-12-09 11:14:03 +02:00
  • d38e41ee69 convert : fix n_ff typo Georgi Gerganov 2023-12-09 10:59:37 +02:00
  • dff8cbeb39 convert : support Mixtral as LLAMA arch Georgi Gerganov 2023-12-09 10:51:58 +02:00
  • f141504938 llama : fix logits_all parameter being ignored crasm 2023-12-09 01:00:13 -05:00
  • 619f399525 Update README.md Yueh-Po Peng 2023-12-09 12:32:56 +08:00
  • 9ec7eb1c3b nodejs mike dupont 2023-12-08 13:44:39 -05:00
  • 593985dad3 update mike dupont 2023-12-08 13:04:24 -05:00
  • 09a48ec2ae linking, loading, segfaulting mike dupont 2023-12-08 13:01:59 -05:00
  • 3e45608d94 Merge branch 'ggerganov:master' into common_json MaggotHATE 2023-12-08 19:25:02 +05:00
  • d8350a6a1b Example of using it on server MaggotHATE 2023-12-08 19:24:46 +05:00
  • 7a691522a6 lowvram var defaults Concedo 2023-12-08 21:06:32 +08:00
  • bc38194ef4 Fix Metal API validation errors Finn Voorhees 2023-12-08 11:44:08 +00:00
  • 7418bca910 up ver Concedo 2023-12-08 19:20:30 +08:00
  • c47bc28488 slight refactor for noscript ui Concedo 2023-12-08 18:35:45 +08:00
  • 998742c48a Fix ggml_metal_log on Intel macs Finn Voorhees 2023-12-08 10:35:26 +00:00
  • 7469f202ea use lowvram flag for offload qkv Concedo 2023-12-08 18:16:14 +08:00