Commit graph

  • b3f6da3d60 error log messages Jan Boon 2024-04-02 04:10:17 +08:00
  • 3d6fa5bdd7 catch exceptions on save as well Jan Boon 2024-04-02 04:06:23 +08:00
  • f87f7b8986 flake.lock: Update (#6402) b2585 Georgi Gerganov 2024-04-01 19:05:57 +03:00
  • b3e66f1844 no ncols == 64 Johannes Gäßler 2024-04-01 15:54:50 +02:00
  • dce3e27ba2 Update llama.cpp - adjustements Nexesenex 2024-04-01 14:11:20 +02:00
  • cfcbc7adf3 Match openchat template with jinja output Kai Zau 2024-04-01 19:35:20 +08:00
  • 1eebfc9f0f Separate deepseek bos from system message Kai Zau 2024-04-01 19:34:31 +08:00
  • 9165380c52 Regenerate chat template test with add_generation_prompt Kai Zau 2024-04-01 19:21:43 +08:00
  • 33a5244806 compare-llama-bench.py: fix long hexsha args (#6424) Johannes Gäßler 2024-04-01 13:30:43 +02:00
  • 36cad961e1 compare-llama-bench.py: fix long hexsha args Johannes Gäßler 2024-04-01 11:25:29 +02:00
  • 226e819371 ci: server: verify deps are coherent with the commit (#6409) Pierrick Hymbert 2024-04-01 12:36:40 +02:00
  • 8502a015a5 remove eos zhangkaihuo 2024-04-01 17:00:07 +08:00
  • 0b70ac0f66 ingore unix sockets on windows Adrian Liechti 2024-04-01 10:25:47 +02:00
  • d297225e98 Remove alpaca, match deepseek with jinja output Kai Zau 2024-04-01 16:40:07 +09:00
  • 9ecc666b94 compatible with old and new minicpm versions zhangkaihuo 2024-04-01 15:07:42 +08:00
  • e691f67fbe ci: server: change the ref to build as now it's a pull event target Pierrick HYMBERT 2024-04-01 08:02:42 +02:00
  • 4a4e549748 4 warps, 256 stride for all D Johannes Gäßler 2024-03-31 18:39:02 +02:00
  • 77d29ed78b Unix Socket PoC Adrian Liechti 2024-04-01 00:06:26 +02:00
  • 8c2f7b8169 Update convert-hf-to-gguf.py slaren 2024-03-31 19:52:46 +02:00
  • 561c8b82dc ci: server: verify deps are coherent with the commit Pierrick HYMBERT 2024-03-31 12:23:52 +02:00
  • 824adc923b adjust kernel selection logic Johannes Gäßler 2024-03-31 16:01:27 +02:00
  • c54c76cb5f TEST dirty Pierrick HYMBERT 2024-03-31 12:33:16 +02:00
  • c4062df51a ci: server: verify deps are coherent with the commit Pierrick HYMBERT 2024-03-31 12:23:52 +02:00
  • c50a82ce0f readme : update hot topics b2582 Georgi Gerganov 2024-03-31 11:56:30 +03:00
  • 805d705032 license : add AUTHORS Georgi Gerganov 2024-03-31 10:17:36 +03:00
  • 095647bf5d kompute: implement op_getrows_f32 woachk 2024-03-31 08:15:45 +02:00
  • 3b3298af17 update convert.py for mixtral hf models slaren 2024-03-31 01:35:10 +01:00
  • 4a5d50eb61 update convert-hf-to-gguf.py slaren 2024-03-31 01:24:05 +01:00
  • e74d82494d flake.lock: Update github-actions[bot] 2024-03-31 00:18:05 +00:00
  • 6203d72651 update convert.py slaren 2024-03-30 23:49:41 +01:00
  • 8af72118ec move sequence state file functionality from server to llama to match session api and add version tags Jan Boon 2024-03-31 03:26:25 +08:00
  • 129b6ffea6 removing a whole sequence never fails Jan Boon 2024-03-31 00:43:47 +08:00
  • b509b8b3de add special Jan Boon 2024-03-30 23:57:38 +08:00
  • ea717f773e cleanup style Jan Boon 2024-03-30 23:39:53 +08:00
  • d38eef468f add cake Jan Boon 2024-03-30 23:23:21 +08:00
  • 2abb6c7225 Update ggml-metal.m slaren 2024-03-30 11:42:28 +01:00
  • 37e7854c10 ci: bench: fix Resource not accessible by integration on PR event (#6393) b2581 Pierrick Hymbert 2024-03-30 11:36:07 +01:00
  • a4986dd52e Add separate template name for vicuna-orca Kai Zau 2024-03-30 19:29:27 +09:00
  • f1a3b12ced Add chat template for alpaca Kai Zau 2024-03-30 19:04:49 +09:00
  • 3a44bfecb0 no vec for hs, no hs==256 ncols==32 for Volta Johannes Gäßler 2024-03-30 10:34:09 +01:00
  • ce48a6e4de Merge branch 'ggerganov:master' into master kaizau 2024-03-30 16:56:14 +08:00
  • c708544cd6 Add tests for openchat and vicuna chat templates Kai Zau 2024-03-30 17:48:15 +09:00
  • 5305d6822a Combine vicuna chat templates Kai Zau 2024-03-30 17:47:37 +09:00
  • e3794efcec 16 cols for Phi-2 Johannes Gäßler 2024-03-30 09:19:19 +01:00
  • 134a904805 ci: bench: fix Resource not accessible by integration on PR event Pierrick HYMBERT 2024-03-30 07:35:27 +01:00
  • e423aa1adf Add EOS for vicuna templates Kai Zau 2024-03-30 14:54:12 +09:00
  • e0f9d9d732 Add chat template for orca-vicuna Kai Zau 2024-03-30 14:41:43 +09:00
  • e913ac9c38 for new minicpm zhangkaihuo 2024-03-30 10:34:40 +08:00
  • f6104b9b77 Add chat template for vicuna Kai Zau 2024-03-30 11:23:18 +09:00
  • 0d24c6af89 Add chat template test for openchat Kai Zau 2024-03-30 10:52:55 +09:00
  • d19df2c5b9 Add openchat chat template Kai Zau 2024-03-30 09:43:47 +09:00
  • 26c09adce6 fix cuda slaren 2024-03-30 00:10:02 +01:00
  • 325e5efa0d update test-backend-ops slaren 2024-03-29 23:48:10 +01:00
  • 60f685ff7a cleanup Jan Boon 2024-03-30 06:14:33 +08:00
  • 92c468105b add server test case for slot save restore Jan Boon 2024-03-30 06:03:41 +08:00
  • 912a6aa9b1 CUDA: faster FlashAttention, kernel for bs == 1 Johannes Gäßler 2024-03-29 23:02:39 +01:00
  • f2e41b3239 fix return types Jan Boon 2024-03-30 06:01:38 +08:00
  • c342d070c6 Fedora build update (#6388) Mohammadreza Hendiani 2024-03-30 01:29:56 +03:30
  • f7fc5f6c6f split: allow --split-max-size option (#6343) b2579 Xuan Son Nguyen 2024-03-29 22:34:44 +01:00
  • 93db37e274 update metal slaren 2024-03-29 21:19:32 +01:00
  • 5a4972fc28 reverted back to only the MIT license Mohammadreza Hendiani 2024-03-30 00:33:02 +03:30
  • 2479900a1c minor slaren 2024-03-29 20:41:27 +01:00
  • 9c9fe60f53 update cuda slaren 2024-03-29 20:06:00 +01:00
  • 389ab6125b Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions Mohammadreza Hendiani 2024-03-29 22:27:04 +03:30
  • a8c8cecbc7 Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions Mohammadreza Hendiani 2024-03-29 22:26:21 +03:30
  • 9bb04b9f58 Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions Mohammadreza Hendiani 2024-03-29 22:25:45 +03:30
  • 29f18c29b4 keep in the size check Jan Boon 2024-03-30 02:49:31 +08:00
  • 0d2213678c unused param Jan Boon 2024-03-30 02:41:28 +08:00
  • 8ab1a17251 handle seq rm return value Jan Boon 2024-03-30 02:39:33 +08:00
  • bf1d4932f8 fix restoring zero cell count Jan Boon 2024-03-30 02:35:33 +08:00
  • a71ec3db7b adjust endpoints to preferred style Jan Boon 2024-03-30 02:19:50 +08:00
  • 0c7e21d7b2 ggml : update mul_mat_id to use the same tensor for all the experts slaren 2024-03-29 19:10:20 +01:00
  • 8b5ae299ec update doc Jan Boon 2024-03-30 02:03:28 +08:00
  • bbcbf47b6d add previous function names back in with DEPRECATED notice Jan Boon 2024-03-30 01:53:51 +08:00
  • 9121b14bb5 fixed deprecated address Mohammadreza Hendiani 2024-03-29 20:50:06 +03:30
  • 0ca6c4fee7 fixed deprecated address Mohammadreza Hendiani 2024-03-29 20:49:37 +03:30
  • cbbce703e0 fixed deprecated address Mohammadreza Hendiani 2024-03-29 20:47:28 +03:30
  • ba0c7c70ab Vulkan k-quant mmq and ggml-backend offload functionality (#6155) b2578 0cc4m 2024-03-29 17:29:21 +01:00
  • d48ccf3ad4 sync : ggml (#6351) Georgi Gerganov 2024-03-29 17:45:46 +02:00
  • bd3d9f1bad cuda : move GGML_CUDA_DMMV constants to dmmv.cuh slaren 2024-03-29 16:01:44 +01:00
  • b7863ab7d8 Remove Vulkan warning 0cc4m 2024-03-29 15:03:53 +01:00
  • 069574775c [Model] Add support for xverse (#6301) b2576 hxer7963 2024-03-29 21:37:03 +08:00
  • 7dcd160f4b Update llama.cpp slaren 2024-03-29 14:36:27 +01:00
  • cfde806eb9 ci : fix BGE wget (#6383) Georgi Gerganov 2024-03-29 14:34:28 +02:00
  • ed4be6bb0d Update llama.cpp Nexesenex 2024-03-29 13:12:27 +01:00
  • ed3bc3a3bf ci : fix BGE wget Georgi Gerganov 2024-03-29 14:04:04 +02:00
  • b910287954 readme : add project (#6356) zhouwg 2024-03-29 15:33:46 +08:00
  • 8093987090 cmake : add explicit metal version options (#6370) b2573 Matt Clayton 2024-03-29 03:27:42 -04:00
  • 82cc37084f Update CMakeLists.txt Georgi Gerganov 2024-03-29 09:27:36 +02:00
  • 057400a3fd llama : remove redundant reshape in build_kv_store (#6369) Daniel Bevenius 2024-03-29 08:23:22 +01:00
  • d907f70b0b llama : add assert Georgi Gerganov 2024-03-29 09:22:38 +02:00
  • b75c38166c convert : allow conversion of Mistral HF models (#6144) Pedro Cuenca 2024-03-29 08:15:00 +01:00
  • f8707cf38b Update README.md zhouwg 2024-03-29 11:22:14 +08:00
  • 9de0a17900 Merge branch 'ggerganov:master' into add-android-ui-in-toplevel-readme zhouwg 2024-03-29 11:13:00 +08:00
  • f33b6c7b2e convert-hf : add vocab size to metadata Jared Van Bortel 2024-03-28 18:23:42 -04:00
  • 9bf29e6e8b convert-hf : fix duplicated block_count Jared Van Bortel 2024-03-28 18:08:40 -04:00
  • 6dba2de027 convert-hf : small fix for mypy Jared Van Bortel 2024-03-28 18:01:05 -04:00
  • bfe7dafc9c readme : add notice for UI list b2570 Georgi Gerganov 2024-03-28 22:56:03 +02:00
  • 4c190ba676 cuda : reduce registers gg/flash-attn-a Georgi Gerganov 2024-03-28 21:17:08 +02:00
  • 5dd355fe26 cuda : bump nwarps by 1 Georgi Gerganov 2024-03-28 20:21:09 +02:00