Commit graph

  • bedf37c9d1 server: tests: reducing n_ctx and n_predict for // prompts as it is too slow in the CI. Pierrick HYMBERT 2024-02-23 02:38:37 +01:00
  • 5110de08e3 server: tests: fix coloring console Pierrick HYMBERT 2024-02-23 02:31:44 +01:00
  • 6bba3be151 server: tests: ci adding psmisc as it is not present by default in ubuntu base killall Pierrick HYMBERT 2024-02-23 02:31:30 +01:00
  • 6e71126c12 server: tests: ci adding curl as it is not present by default in ubuntu base for the hf.sh script Pierrick HYMBERT 2024-02-23 02:19:47 +01:00
  • d0e0050843 server: tests: ci adding python3-pip as it is not present by default in ubuntu base Pierrick HYMBERT 2024-02-23 02:16:56 +01:00
  • 2bb4732c01 server: tests: ci adding cmake as it is not present by default in ubuntu base Pierrick HYMBERT 2024-02-23 02:13:30 +01:00
  • 6a215e5359 server: tests: ci adding container to specify server port and allow the server to listen to Pierrick HYMBERT 2024-02-23 02:06:36 +01:00
  • 2f756f84df server: tests: allow to override the server port before launching tests Pierrick HYMBERT 2024-02-23 01:59:29 +01:00
  • 70e90558ae server: tests: add log in server start to identify why the server does not listen on the CI Pierrick HYMBERT 2024-02-23 01:46:08 +01:00
  • b38b9e60a1 server: tests: minor fix server --alias param passed twice Pierrick HYMBERT 2024-02-23 01:31:56 +01:00
  • 14b6ede152 server: tests: minor color change Pierrick HYMBERT 2024-02-23 01:29:39 +01:00
  • 1bd07e56c4 server: tests: assert embeddings are actually computed, make the embeddings endpoint configurable. Add logs to investigate why the CI server test job is not starting Pierrick HYMBERT 2024-02-23 01:25:08 +01:00
  • cba6d4ea17 server: tests: minor fix missing param. Pierrick HYMBERT 2024-02-23 00:54:44 +01:00
  • 51f527440a server: tests: ci triggered on any changes on server example path Pierrick HYMBERT 2024-02-23 00:37:42 +01:00
  • 26b66c5496 server: tests: Fix some random behavior where the wait for busy status is missing Pierrick HYMBERT 2024-02-22 23:38:47 +01:00
  • aa591ef12d server: tests: add Multi users with total number of tokens to predict exceeds the KV Cache size Pierrick HYMBERT 2024-02-22 23:37:56 +01:00
  • f820e10fa7 server: tests: ci ensure the server is stopped before scenario, and do not quit while the server is listening Pierrick HYMBERT 2024-02-22 23:18:42 +01:00
  • 15499eb942 mpt : do not duplicate token_embd.weight on disk (#5670) b2249 Jared Van Bortel 2024-02-22 17:05:23 -05:00
  • daf88100eb mpt : remove output tensor name to satisfy quantize check Jared Van Bortel 2024-02-22 16:54:31 -05:00
  • fb72b1e05f Merge branch 'master' of https://github.com/ggerganov/llama.cpp into ceb/mpt-tied-output Jared Van Bortel 2024-02-22 16:54:14 -05:00
  • 96633eeca1 gemma : use more bits for the token_embd.weight tensor (#5650) b2248 Georgi Gerganov 2024-02-22 23:23:46 +02:00
  • 847eedbdb2 py : add Gemma conversion from HF models (#5647) b2247 Georgi Gerganov 2024-02-22 23:22:48 +02:00
  • fc69c408e8 Update convert-hf-to-gguf.py Georgi Gerganov 2024-02-22 23:22:32 +02:00
  • 7e4f339c40 ggml : always define ggml_fp16_t as uint16_t (#5666) b2246 Georgi Gerganov 2024-02-22 23:21:39 +02:00
  • 334f76fa38 sync : ggml b2245 Georgi Gerganov 2024-02-22 23:21:05 +02:00
  • efd56b1c21 ggml : 32-bit arm compat (whisper/1891) Georgi Gerganov 2024-02-22 18:31:40 +02:00
  • 8b96bdaf08 Merge remote-tracking branch 'origin/master' into test/server-add-ci-test Pierrick HYMBERT 2024-02-22 22:11:36 +01:00
  • 549fe807fd mpt : do not duplicate token_embd.weight on disk Jared Van Bortel 2024-02-22 16:02:39 -05:00
  • 597c181abb server: tests: ci do not take a model anymore, fix trigger patch Pierrick HYMBERT 2024-02-22 21:58:28 +01:00
  • e43406e36d server: tests: switch to asyncio for concurrent tests, match result content with regex Pierrick HYMBERT 2024-02-22 21:55:00 +01:00
  • 016b221549 server: fix health/slots endpoint slot state access available race condition Pierrick HYMBERT 2024-02-22 21:47:51 +01:00
  • 201294ae17 nix: init singularity and docker images (#5056) b2243 Someone 2024-02-22 19:44:10 +00:00
  • 5a9e2f60ba py : minor fixes (#5668) Georgi Gerganov 2024-02-22 20:13:25 +02:00
  • 373ee3fbba Add Gemma chat template (#5665) b2241 Xuan Son Nguyen 2024-02-22 19:10:21 +01:00
  • 19377a3fc4 ggml : more FP16 -> FP32 conversion fixes Georgi Gerganov 2024-02-22 20:08:54 +02:00
  • dd04b7c480 ggml : fix q6_K FP16 -> FP32 conversion Georgi Gerganov 2024-02-22 19:59:05 +02:00
  • 56c047156a py : minor fixes gg/py-minor-fixes Georgi Gerganov 2024-02-22 19:22:56 +02:00
  • 80196bd76c cuda : no longer ggml headers last Georgi Gerganov 2024-02-22 19:10:20 +02:00
  • 1932d614c5 ggml : cont Georgi Gerganov 2024-02-22 19:02:52 +02:00
  • 0cff93277f ggml : cont Georgi Gerganov 2024-02-22 18:53:40 +02:00
  • 9c99ef43d7 small alterations pudepiedj 2024-02-22 16:45:05 +00:00
  • 8b5059c279 ggml : cont Georgi Gerganov 2024-02-22 18:39:06 +02:00
  • 4cb4d8b22d workflows: nix: hardcode cachix ids, build unconditionally (#5663) b2240 Someone 2024-02-22 16:32:09 +00:00
  • edf3b53a5c gemma: only apply system_prompt on non-model message ngxson 2024-02-22 16:48:59 +01:00
  • 57b4c2613b add gemma chat template ngxson 2024-02-22 16:45:46 +01:00
  • 3c5cc30023 ggml : cont Georgi Gerganov 2024-02-22 17:12:12 +02:00
  • bf74c0fdf2 ggml : always define ggml_fp16_t as uint16_t Georgi Gerganov 2024-02-22 16:57:22 +02:00
  • 8efaa634ba workflows: nix: hardcode cachix ids, build unconditionally Someone Serge 2024-02-22 14:43:17 +00:00
  • 768c339c72 nix: init singularity and docker images Someone Serge 2024-01-21 03:01:25 +00:00
  • 488bd973a7 llama : quantize token_embd.weight using output type Georgi Gerganov 2024-02-22 14:42:24 +02:00
  • 3a03541ced minor : fix trailing whitespace (#5638) b2239 Georgi Gerganov 2024-02-22 13:54:03 +02:00
  • 41676d9920 ci : actually no reason to exclude GPU code from triggers Georgi Gerganov 2024-02-22 13:33:00 +02:00
  • a697cd1314 minor : fix missing new line Georgi Gerganov 2024-02-22 13:29:20 +02:00
  • 7ad7da6af8 Update convert-hf-to-gguf.py Georgi Gerganov 2024-02-22 11:27:32 +02:00
  • 216386f3f5 Update convert-hf-to-gguf.py Georgi Gerganov 2024-02-22 11:27:27 +02:00
  • 56d03d92be readme : update hot topics Georgi Gerganov 2024-02-22 10:35:54 +02:00
  • a46f50747b server : fallback to chatml, add AlphaMonarch chat template (#5628) b2237 Xuan Son Nguyen 2024-02-22 09:33:24 +01:00
  • c5688c6250 server : clarify some params in the docs (#5640) Alexey Parfenov 2024-02-22 08:27:32 +00:00
  • d0e4fe0bbe Partially revert #4280 which broke parallel WordsHk 2024-02-22 15:46:15 +08:00
  • 4ef245a92a mpt : add optional bias tensors (#5638) b2235 Dat Quoc Nguyen 2024-02-22 18:15:13 +10:00
  • 4694edde14 fix #5657: force greedy sampling with probs when temp is 0 Minsoo Cheong 2024-02-22 14:46:19 +09:00
  • a9335a5c2a sample from residual distribution on draft accept failure Minsoo Cheong 2024-02-22 13:50:30 +09:00
  • 973053d8b0 llama : fix loading models with shared tok_embd and output (#5651) b2234 slaren 2024-02-22 00:42:09 +01:00
  • 7c8bcc11dc Add docs for llama_chat_apply_template (#5645) b2233 Xuan Son Nguyen 2024-02-22 00:31:00 +01:00
  • 5271c75666 llama : fix K-shift with quantized K (wip) sl/fix-quant-kv-shift slaren 2024-02-22 00:28:39 +01:00
  • 77370b3a7c llama : fix loading models with shared tok_embd and output slaren 2024-02-22 00:13:12 +01:00
  • 534998dbb9 server: tests: ci tests.sh exit code Pierrick HYMBERT 2024-02-21 23:06:20 +01:00
  • 7fe4678b02 llama : fix session save/load with quantized KV (#5649) b2232 slaren 2024-02-21 22:52:39 +01:00
  • 01cca6625b server: tests: ci fix model download path Pierrick HYMBERT 2024-02-21 22:43:39 +01:00
  • f181e601a1 gemma : use Q8_0 for the token_embd.weight tensor Georgi Gerganov 2024-02-21 23:23:17 +02:00
  • ba2135ccae gemma : allow offloading the output tensor (#5646) b2231 slaren 2024-02-21 22:18:23 +01:00
  • 483693bd7c llama : fix session save/load with quantized KV slaren 2024-02-21 22:12:45 +01:00
  • 6406208174 server: tests: * start the server at each scenario * split the features as each requires different server config Pierrick HYMBERT 2024-02-21 22:13:37 +01:00
  • 298207185d small changes and threads 64 pudepiedj 2024-02-21 21:10:54 +00:00
  • 83fe714b32 py : add gemma conversion from HF models Georgi Gerganov 2024-02-21 22:50:10 +02:00
  • 22ca4ddb20 gemma : allow offloading the output tensor slaren 2024-02-21 21:37:54 +01:00
  • e80e291410 fix typo ngxson 2024-02-21 21:22:05 +01:00
  • f6b2e1d8a7 remove TODO ngxson 2024-02-21 21:21:10 +01:00
  • 7c76140a6f add docs for llama_chat_apply_template ngxson 2024-02-21 21:17:24 +01:00
  • f81afc0399 Fix MSVC compile errors UEXTM.com 2024-02-21 13:49:19 -05:00
  • 68b8d4eb55 Merge remote-tracking branch 'origin/master' into test/server-add-ci-test Pierrick HYMBERT 2024-02-21 18:41:14 +01:00
  • 600cbeb7eb server: test: ci change the GitHub workflow trigger Pierrick HYMBERT 2024-02-21 18:35:21 +01:00
  • 3800bc6c7f Merge branch 'server_branch' of https://github.com/pudepiedj/llama.cpp into server_branch pudepiedj 2024-02-21 17:33:06 +00:00
  • 1b04d5907b improved python client lower threadpool pudepiedj 2024-02-21 17:33:03 +00:00
  • 663a4048e4 server: clarify some params in the docs ZXED 2024-02-21 20:24:31 +03:00
  • 2ab9cb96ed server: only check model template if there is no custom tmpl ngxson 2024-02-21 17:41:04 +01:00
  • 10d86733f3 server: add AlphaMonarch to test chat template ngxson 2024-02-21 17:35:47 +01:00
  • 89febfed93 examples : do not assume BOS when shifting context (#5622) b2230 Jared Van Bortel 2024-02-21 10:33:54 -05:00
  • c1b6d85031 Update for MPT with optional bias parameters Dat Quoc Nguyen 2024-02-22 01:33:04 +10:00
  • 6b34d50135 server : fix misplaced n_keep varible definition Jared Van Bortel 2024-02-21 10:32:48 -05:00
  • 5022cf242d sync : ggml Georgi Gerganov 2024-02-21 16:52:39 +02:00
  • 1ecea255eb server: health: fix race condition on slots data using tasks queue (#5634) b2228 Pierrick Hymbert 2024-02-21 15:47:48 +01:00
  • a00a35cef9 readme : add LocalAI to the availables UI (#5629) Ettore Di Giacinto 2024-02-21 15:39:10 +01:00
  • eccd7a26dd sync : ggml (#5633) b2226 Georgi Gerganov 2024-02-21 16:17:10 +02:00
  • c14f72db9c readme : update hot topics Georgi Gerganov 2024-02-21 15:39:54 +02:00
  • cc6cac08e3 llava : add --skip-unknown to 1.6 convert.py (#5632) Daniel Bevenius 2024-02-21 14:36:57 +01:00
  • ae11fefee4 sync : ggml Georgi Gerganov 2024-02-21 14:59:31 +02:00
  • 2fe5c166b0 ggml : compute forward no longer pass src tensors (ggml/729) Georgi Gerganov 2024-02-21 14:59:08 +02:00
  • fd0ccdb601 Merge branch 'ggerganov:master' into server_branch pudepiedj 2024-02-21 13:28:07 +00:00
  • d73456ac59 server: health: * include_slots only if slots_endpoint * fix compile warning task.target_id not initialized. Pierrick HYMBERT 2024-02-21 14:19:18 +01:00