Commit graph

  • d44756e8d9 fix concatenation method to avoid invalid UTF8 stringfication ensan-hcl 2023-12-05 00:31:30 +09:00
  • 5c9f90cba1 swift : fix prompt tokenization logic (#4321) b1608 Miwa / Ensan 2023-12-04 22:43:45 +09:00
  • b881f630ca cuda : use mmv kernel for quantum cache ops Georgi Gerganov 2023-12-04 15:41:20 +02:00
  • 4db34a3458 cmake: fix clang build when CUDA is enabled (#4208) Johannes Aalto 2023-12-04 10:44:18 +02:00
  • a5a5839f5c handle accidentally selecting a kcpps file as model instead Concedo 2023-12-04 21:10:42 +08:00
  • a1bf6c09f8 cuda : add F32 -> Q8_0 copy kernel Georgi Gerganov 2023-12-04 15:08:36 +02:00
  • 39d6b36b7f fix prompt tokenization logic ensan-hcl 2023-12-04 18:47:18 +09:00
  • 0eb36c1e62 Update ggml.c Judd 2023-12-04 17:45:15 +08:00
  • bcfebf241d metal : add F32 -> Q8_0 copy kernel Georgi Gerganov 2023-12-04 10:42:10 +02:00
  • 4fa44e84ad grammar-parser : fix typo (#4318) b1607 Ikko Eltociear Ashimine 2023-12-04 16:57:35 +09:00
  • 15d1ce3ba3 fix typo in grammar-parser.cpp Ikko Eltociear Ashimine 2023-12-04 13:38:48 +09:00
  • 923ef855ad Update convert-image-encoder-to-gguf.py John 2023-12-03 20:44:51 +01:00
  • d04ee928a2 llama : support quantum K cache (wip) Georgi Gerganov 2023-12-03 21:31:05 +02:00
  • cfddff3e7d Update convert-image-encoder-to-gguf.py John 2023-12-03 20:33:45 +01:00
  • 2f55ff6d97 Update convert-image-encoder-to-gguf.py John 2023-12-03 20:23:28 +01:00
  • 66aaac9867 llama : update session save/load Georgi Gerganov 2023-12-03 21:10:16 +02:00
  • e262947d43 common : add command-line arg to disable KV cache offloading Georgi Gerganov 2023-12-03 20:31:01 +02:00
  • c07633ff05 first running but the results are bad mike dupont 2023-12-03 13:06:00 -05:00
  • c80b8a2bff llama : remove mirrors, perform Device -> Host when partial offload Georgi Gerganov 2023-12-03 19:46:06 +02:00
  • c44bc1ee00 llama : keep the KV related layers on the device Georgi Gerganov 2023-12-03 19:22:47 +02:00
  • f5f9d9620b Didn't mean to push test gbnf kalomaze 2023-12-03 10:55:37 -06:00
  • 245de1fc67 Adjust comment kalomaze 2023-12-03 10:41:56 -06:00
  • 1fa91a4833 llama : enable offload debug temporarily Georgi Gerganov 2023-12-03 18:36:02 +02:00
  • 3d3e6bd0e4 llama : offload for rest of the model arches Georgi Gerganov 2023-12-03 17:52:23 +02:00
  • f3dbfb9f60 llama : offload K shift tensors Georgi Gerganov 2023-12-03 17:43:04 +02:00
  • 986b3da76a llama : offload KV cache per-layer Georgi Gerganov 2023-12-03 17:18:15 +02:00
  • c294c78eb7 Merge branch 'master' into per-layer-kv Georgi Gerganov 2023-12-03 16:18:21 +02:00
  • 8602f5abbd Merge branch 'master' into concedo_experimental Concedo 2023-12-03 22:00:14 +08:00
  • fbbc42827b ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() (#4308) b1606 Georgi Gerganov 2023-12-03 15:56:35 +02:00
  • ac36aee001 Merge branch 'master' into concedo_experimental Concedo 2023-12-03 21:56:29 +08:00
  • adf3de4f69 ggml : fix soft max out-of-bounds access (#4307) b1605 Georgi Gerganov 2023-12-03 15:56:22 +02:00
  • 48544cd2ef Revert "Revert "ggml : add ggml_soft_max_ext (#4256)"" Concedo 2023-12-03 21:46:50 +08:00
  • 45331b6a6a ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() Georgi Gerganov 2023-12-03 15:46:31 +02:00
  • bfb98ada8a ggml : fix soft max out-of-bounds access Georgi Gerganov 2023-12-03 15:37:46 +02:00
  • fd3baee14e Update and rename api_like_OAI.sh to api-like-OAI.sh Yazan Agha-Schrader 2023-12-03 14:16:41 +01:00
  • 83e211026d Merge branch 'ggerganov:master' into sh-api-like-OAI Yazan Agha-Schrader 2023-12-03 14:11:40 +01:00
  • de454b9ef5 Fix whitespace / formatting kalomaze 2023-12-03 05:43:25 -06:00
  • 281e2bad8c Fix missing logit restoration step (?) kalomaze 2023-12-03 05:25:44 -06:00
  • 2e3b4f6237 Check the full vocab for grammar only if necessary kalomaze 2023-12-03 04:23:14 -06:00
  • 33e171d1e9 server : fix OpenAI API stop field to be optional (#4299) b1604 Ed Lee 2023-12-03 01:10:43 -08:00
  • 6949b50df5 py : add grammar to oai like api (#4294) Rickard Edén 2023-12-03 10:03:25 +01:00
  • d7b800b8bc llama : pad KV cache size (#4280) b1602 Georgi Gerganov 2023-12-03 10:58:16 +02:00
  • ddda6ddb37 Style fix in sampler_queue MaggotHATE 2023-12-03 13:27:22 +05:00
  • 3fa6726351 More readable samplers input string, fixed help MaggotHATE 2023-12-03 12:46:09 +05:00
  • 6570a2005b token count includes ids Concedo 2023-12-03 15:44:53 +08:00
  • 28a64da531 Use ggml_reshape_3d Galunid 2023-12-03 04:36:46 +01:00
  • 0ca814e544 added minP preset Concedo 2023-12-03 11:18:03 +08:00
  • 502bb38770 server : fix OpenAI API stop field to be optional Ed Lee 2023-12-02 11:31:53 -08:00
  • 281dca45d4 v1 mike dupont 2023-12-02 13:17:27 -05:00
  • c142c5634a fixed segfault with clblast by reversing commit in issue https://github.com/ggerganov/llama.cpp/issues/4296 Concedo 2023-12-03 00:56:00 +08:00
  • a8e66ef31c Revert "ggml : add ggml_soft_max_ext (#4256)" Concedo 2023-12-03 00:42:01 +08:00
  • c0d083ca25 Create api_like_OAI.sh Yazan Agha-Schrader 2023-12-02 17:36:22 +01:00
  • e6dc166566 Code style fixes according to review MaggotHATE 2023-12-02 21:06:21 +05:00
  • a829a1ee56 fix for janitorai Concedo 2023-12-02 23:58:41 +08:00
  • ff8adc1196 Fixed code style MaggotHATE 2023-12-02 20:52:56 +05:00
  • e115b5ab3c Merge branch 'ggerganov:master' into master MaggotHATE 2023-12-02 20:14:42 +05:00
  • 8d2b4603d7 Revert and rewrite, too many problems and safeguards would be needed MaggotHATE 2023-12-02 18:16:56 +05:00
  • 7870c8292f add grammar to oai like api rickard 2023-12-02 13:22:17 +01:00
  • bd08c8fab3 Rewrote with unordered_map MaggotHATE 2023-12-02 16:20:27 +05:00
  • 12f66eaa1d adjust fragmentation fix Concedo 2023-12-02 15:59:08 +08:00
  • 1c422f45cb more printouts Concedo 2023-12-02 11:48:48 +08:00
  • 89fd914926 模型整理 (model organization) supermy 2023-12-02 07:56:55 +08:00
  • fe6f5e188d add dark and light color themes Yazan Agha-Schrader 2023-12-01 21:11:25 +01:00
  • 3cb1c348b3 metal : try to improve batched decoding gg/pad-kv-cache Georgi Gerganov 2023-12-01 21:47:42 +02:00
  • 97f4ec4631 Merge branch 'ggerganov:master' into server-ui-improvements Yazan Agha-Schrader 2023-12-01 20:38:46 +01:00
  • 45f0415ba9 Update README.md Yazan Agha-Schrader 2023-12-01 20:38:19 +01:00
  • 3e68df8616 llama : pad KV cache size to 32 Georgi Gerganov 2023-12-01 10:55:27 +02:00
  • 5a7d3125e7 llama : avoid using "optional" keyword (#4283) b1601 Georgi Gerganov 2023-12-01 20:39:12 +02:00
  • d5a1cbde60 llama : support optional tensors (#4283) b1600 Georgi Gerganov 2023-12-01 20:35:03 +02:00
  • b220222a64 swift : fix token_to_piece implementation (#4278) b1599 Miwa / Ensan 2023-12-02 03:19:45 +09:00
  • 511f52c334 build : enable libstdc++ assertions for debug builds (#4275) b1598 Jared Van Bortel 2023-12-01 13:18:35 -05:00
  • 03562f3a86 llama : support attention bias on LLaMA architecture (#4283) b1597 CausalLM 2023-12-02 02:17:06 +08:00
  • 37c746d687 llama : add Qwen support (#4281) b1596 Shijie 2023-12-02 02:16:31 +08:00
  • b1efaed381 Update llama.cpp CausalLM 2023-12-02 01:09:17 +08:00
  • e192572d21 check existence of qkvo bias while loading llama models CausalLM 2023-12-02 00:56:48 +08:00
  • d363df3444 Fixed formatting MaggotHATE 2023-12-01 21:43:21 +05:00
  • 880f57973b llama : fix integer overflow during quantization (#4284) b1595 Georgi Gerganov 2023-12-01 18:42:11 +02:00
  • f9ee9dbfbe llama : fix integer overflow during quantization Georgi Gerganov 2023-12-01 18:19:57 +02:00
  • 495bb3ab1e Merge branch 'master' into concedo_experimental Concedo 2023-12-01 23:48:20 +08:00
  • 4f40c226a0 Merge branch 'master' into concedo_experimental Concedo 2023-12-01 23:46:59 +08:00
  • 7601a49130 Cleaned commented code MaggotHATE 2023-12-01 20:45:43 +05:00
  • d4dc3d26fc Samplers sequence order w parameter MaggotHATE 2023-12-01 20:35:02 +05:00
  • 4283a889a7 Fix errors ensan-hcl 2023-12-01 23:58:19 +09:00
  • c48679a8e8 Support attention_bias on LLaMA architecture CausalLM 2023-12-01 21:54:58 +08:00
  • ad04d174f6 llama : do not GPU split bias tensors Georgi Gerganov 2023-12-01 15:06:23 +02:00
  • 297c26002c pivot to write data mike dupont 2023-12-01 04:50:19 -05:00
  • 8d6d9f033b py : add requirements file for convert-hf-to-gguf.py (#4277) Daniel Bevenius 2023-12-01 10:41:56 +01:00
  • 60d80856c0 enable qwen to llama.cpp simonJJJ 2023-12-01 17:23:04 +08:00
  • ef47ec18da ggml : add ggml_soft_max_ext (#4256) b1593 Georgi Gerganov 2023-12-01 10:51:24 +02:00
  • eb594c0f7d alloc : fix build with debug gg/soft-max-ext Georgi Gerganov 2023-12-01 10:45:54 +02:00
  • d9c8fa3bce metal : simplify soft max kernel Georgi Gerganov 2023-12-01 10:31:21 +02:00
  • ce67fb5651 build: add requirements file for convert-hf-to-gguf.py Daniel Bevenius 2023-12-01 06:16:31 +01:00
  • 67d2bd2466 Fix token_to_piece implementation in Swift ensan 2023-12-01 14:43:00 +09:00
  • 9f8e60f0ee zero logits vector before writing new data Jared Van Bortel 2023-11-30 22:35:41 -05:00
  • 6bd34720a6 dynet mike dupont 2023-11-30 19:36:24 -05:00
  • 5b74310e6e build : enable libstdc++ assertions for debug builds ceb/libstdcpp-assertions Jared Van Bortel 2023-11-30 18:09:23 -05:00
  • 5284e72aa6 n_vocab -> n_tokens Jared Van Bortel 2023-11-30 17:36:21 -05:00
  • 1d144112c0 server : add --log-disable to disable logging to file (#4260) b1592 Ziad Ben Hadj-Alouane 2023-11-30 17:25:49 -05:00
  • f43f09366d server : add single-client multi-prompt support (#4232) b1591 Ziad Ben Hadj-Alouane 2023-11-30 17:25:04 -05:00
  • d2809a3ba2 make : fix Apple clang determination bug (#4272) b1590 WillCorticesAI 2023-11-30 17:23:44 -05:00