Commit graph

  • f865ea149d server: added more docs for response_fields field (#10995) Isaac McFadyen 2024-12-28 10:09:19 -05:00
  • 16cdce7b68 server : fix token duplication when streaming with stop strings (#10997) b4394 Alexey Parfenov 2024-12-28 15:08:54 +00:00
  • f9a1cdb3a7 Merge branch 'ggerganov:master' into master ymcki 2024-12-28 22:10:52 +08:00
  • 970b5ab7ca ggml-cuda : add TQ2_0 support Francis Couture-Harpin 2024-12-27 20:21:28 -05:00
  • b3b30e42fc Update mul_mat_vec_q6_k.comp Eve 2024-12-28 00:57:36 +00:00
  • 70676c3584 vulkan: change im2col to 512 elements per workgroup Jeff Bolz 2024-12-27 16:45:35 -06:00
  • 25d7ae429d go even further Eve 2024-12-27 15:44:59 -05:00
  • f833b75175 go all the way Eve 2024-12-27 15:07:34 -05:00
  • a56504fd7b hacky edition Eve 2024-12-27 14:54:08 -05:00
  • 0078ae4e08 seperate threaded read test, slower somehow Eve 2024-12-27 13:20:58 -05:00
  • 158ab15f4b q6_k extract scale Eve 2024-12-26 22:26:04 -05:00
  • b9b2b6371a move can_batch_with check Xuan Son Nguyen 2024-12-27 20:22:49 +01:00
  • 19c0925e97 server : fix token duplication when streaming with stop strings ZXED 2024-12-27 21:51:56 +03:00
  • 9947b0776f test: force disable cache prompt Xuan Son Nguyen 2024-12-27 18:31:58 +01:00
  • 3930bcb277 server: added more docs for response_fields field Isaac McFadyen 2024-12-27 10:12:26 -05:00
  • 9d84127fa6 lora per request Xuan Son Nguyen 2024-12-27 16:11:02 +01:00
  • fdb4a2af70 common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON Peter 2024-12-27 12:14:33 +11:00
  • 2ba6efc561 slot.can_batch_with Xuan Son Nguyen 2024-12-27 11:28:25 +01:00
  • fab46ca1ae server/bench: - fix when prometheus not started - wait for server to be ready before starting bench Pierrick HYMBERT 2024-12-27 11:11:14 +01:00
  • 523ebf8cba Simplify tool call grammars when there's only 1 tool ochafik 2024-12-27 02:20:52 +00:00
  • a2fe8a4922 Fix tool-call server tests ochafik 2024-12-27 02:15:43 +00:00
  • 0a5d527508 Update fetch_server_test_models.py ochafik 2024-12-27 00:58:59 +00:00
  • 0e87ae24cd rm trailing spaces ochafik 2024-12-27 00:07:58 +00:00
  • a362c74aa2 profiler: initial support for profiling graph ops graph-profiler Max Krasnyansky 2024-09-25 14:25:13 -07:00
  • f645887e0c Update minja.hpp 202aa2f3de ochafik 2024-12-26 21:36:34 +00:00
  • f0bd69380b Update test-tool-call.cpp ochafik 2024-12-26 21:26:25 +00:00
  • e70ce3f613 Merge remote-tracking branch 'origin/master' into tool-call ochafik 2024-12-26 21:26:21 +00:00
  • 7e8220b596 vulkan: Use push constant offset to handle misaligned descriptors Jeff Bolz 2024-12-26 11:03:16 -06:00
  • d79d8f39b4 vulkan: multi-row k quants (#10846) b4393 Eve 2024-12-26 10:54:44 -05:00
  • d283d02bf2 examples, ggml : fix GCC compiler warnings (#10983) b4392 Peter 2024-12-27 00:59:11 +11:00
  • 53534de70c examples, ggml : fix GCC compiler warnings Peter 2024-12-26 23:05:07 +11:00
  • 01b2b28306 better docs Xuan Son Nguyen 2024-12-25 17:03:15 +01:00
  • bd8e8273fa add docs Xuan Son Nguyen 2024-12-25 16:51:57 +01:00
  • 36033990d1 add test Xuan Son Nguyen 2024-12-25 14:41:28 +01:00
  • 90889fddc9 server : add OAI compat for /v1/completions Xuan Son Nguyen 2024-12-25 13:48:02 +01:00
  • 1fccfc9eb6 Removed unnecessary iteration of batch n_tokens on sequence embeddings generation. Emreerdog 2024-12-25 14:13:50 +03:00
  • 9ba399dfa7 server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) b4391 Reza Kakhki 2024-12-24 21:33:04 +01:00
  • 2cd43f4900 ggml : more perfo with llamafile tinyblas on x86_64 (#10714) b4390 Djip007 2024-12-24 18:54:49 +01:00
  • 0a7a20c140 Cosine similarity should be orthogonal when both vectors are zero Andy Martinez 2024-12-24 12:04:47 -05:00
  • 2c6864a922 remove not stable test Djip007 2024-12-24 15:01:44 +01:00
  • 09fe2e7613 server: allow filtering llama server response fields (#10940) b4389 NeverLucky 2024-12-24 19:39:49 +03:00
  • 51dd27f790 improve test Xuan Son Nguyen 2024-12-24 17:00:17 +01:00
  • b8679c0bb5 change to "response_fields" Xuan Son Nguyen 2024-12-24 16:28:44 +01:00
  • 4cf1fef320 clarify docs Xuan Son Nguyen 2024-12-24 16:26:46 +01:00
  • c66b1a7611 fix base64 test Reza Kakhki 2024-12-24 15:56:18 +01:00
  • 0a753fbd1c add support for base64 Reza Kakhki 2024-12-24 15:24:12 +01:00
  • 22924d84f3 - git 2.47 use short id of len 9. - show-progress is not part of GNU Wget2 Djip007 2024-12-23 00:37:10 +01:00
  • 30caac3a68 llama : the WPM vocabs use the CLS token as BOS (#10930) b4388 Georgi Gerganov 2024-12-24 09:44:20 +02:00
  • c519ca290e llama : add comment Georgi Gerganov 2024-12-24 09:44:02 +02:00
  • 60cfa728e2 ggml : use wstring for backend search paths (#10960) b4387 Diego Devesa 2024-12-24 04:05:27 +01:00
  • 3327bb0f8d ggml : fix arm enabled features check (#10961) b4386 Diego Devesa 2024-12-24 04:05:17 +01:00
  • 32d6ee6385 ggml : fix const usage in SSE path (#10962) b4385 Diego Devesa 2024-12-23 20:25:52 +01:00
  • 0371d9ad51 ggml : fix const usage in SSE path slaren 2024-12-23 20:14:32 +01:00
  • 2c22d1f63f ggml : fix arm enabled features check slaren 2024-12-23 18:35:53 +01:00
  • 08cdb66490 ggml : use wstring for backend search paths slaren 2024-12-23 18:12:55 +01:00
  • 14b699ecde server : fix missing model id in /model endpoint (#10957) b4384 Xuan Son Nguyen 2024-12-23 12:52:25 +01:00
  • 485dc01214 server : add system_fingerprint to chat/completion (#10917) b4383 Xuan Son Nguyen 2024-12-23 12:02:44 +01:00
  • 2f40ed013b fix ci Xuan Son Nguyen 2024-12-23 11:59:53 +01:00
  • 7a24dbe5e9 server : fix missing model id in /model endpoint Xuan Son Nguyen 2024-12-23 11:50:17 +01:00
  • 86bf31cfe6 rpc-server : add support for the SYCL backend (#10934) b4382 Radoslav Gerganov 2024-12-23 10:39:30 +02:00
  • d521c0063c Merge 82efaafe9d into b92a14a841 Lucas Nogueira 2024-12-23 11:32:21 +08:00
  • b92a14a841 llama : support InfiniAI Megrez 3b (#10893) b4381 Yun Dou 2024-12-23 08:35:44 +08:00
  • 6f0c9e034b llama : support for Llama-3_1-Nemotron-51B (#10669) b4380 ymcki 2024-12-23 08:22:33 +08:00
  • dab76c92cc llama-run : include temperature option (#10899) b4379 Eric Curtin 2024-12-23 00:21:40 +00:00
  • 7024d59e6a ggml : fix run-time on FreeBSD in get_executable_path() (#10948) b4378 yuri@FreeBSD 2024-12-22 16:20:11 -08:00
  • 5887497cdc Update ggml/src/ggml-backend-reg.cpp Diego Devesa 2024-12-23 00:59:00 +01:00
  • a68c7eeec5 Fix run-time on FreeBSD in get_executable_path() Yuri Victorovich 2024-12-22 15:52:33 -08:00
  • ac2b53c564 sgemm: add M blocs. Djip007 2024-12-19 01:21:09 +01:00
  • 6a4805f8a0 Merge branch 'ggerganov:master' into master ymcki 2024-12-23 07:30:47 +08:00
  • d732874114 tinyblas dynamic dispaching Djip007 2024-12-14 14:10:28 +01:00
  • 3f2bc659e7 more perfo with llamafile tinyblas on x86_64. Djip007 2024-12-07 20:49:49 +01:00
  • 7c0e285858 devops : add docker-multi-stage builds (#10832) Rudi Servo 2024-12-22 21:22:58 -01:00
  • 7ae33a616f llama : add Falcon3 support (#10883) b4376 Billel Mokeddem 2024-12-23 01:09:58 +03:00
  • e7623e5a38 added docker-multi-stage builds Rudi Servo 2024-12-14 13:48:05 -01:00
  • 64d8687e22 Refactoring Billel Mokeddem 2024-12-22 20:39:40 +00:00
  • a1f146dba1 Fix handling pre-normalized tokens Billel Mokeddem 2024-12-22 20:12:46 +00:00
  • cc76ffb3fd feat: server web ui - set usecompression default value to false Minwoo 2024-12-23 01:37:55 +09:00
  • 9194cbd718 Revert "build: production web ui zipped with compression feature" Minwoo 2024-12-23 01:36:25 +09:00
  • 3d3c6bae46 fix nvrxq 2024-12-22 19:18:54 +03:00
  • 0958ee96ac params fixes nvrxq 2024-12-22 19:16:28 +03:00
  • 2c6043670e Merge remote-tracking branch 'upstream/master' into llama_server_response_fields nvrxq 2024-12-22 18:59:40 +03:00
  • bc09b1acdf llama_server_response_fields_fix_issues nvrxq 2024-12-22 18:57:55 +03:00
  • 94e7d24e9d build: production web ui zipped with compression feature Minwoo 2024-12-22 23:57:02 +09:00
  • b3c5268d2e feat: compression for web ui local storage Minwoo 2024-12-22 23:54:55 +09:00
  • 03a44d8c60 build: add pako to web ui Minwoo 2024-12-22 23:50:27 +09:00
  • e68c76d141 Merge branch 'ggerganov:master' into master ymcki 2024-12-22 21:18:16 +08:00
  • ebdee9478c vulkan: build fixes for 32b (#10927) b4375 Jeff Bolz 2024-12-22 03:44:01 -06:00
  • 01a0c36e04 Fix tokenizer_clean_spaces for megrez dixyes 2024-12-22 14:50:59 +08:00
  • e52a0f28e7 vulkan: increase small tile size for NV_coopmat2 Jeff Bolz 2024-12-21 22:36:56 -06:00
  • 26252831ac vulkan: optimize im2col, more elements per thread Jeff Bolz 2024-12-21 16:28:05 -06:00
  • a3aea0801c rm_kq=2 by default Eve 2024-12-22 02:58:33 +00:00
  • 643e5e8aea move comments after bracket to its own line Yee Man Chan 2024-12-22 10:18:44 +08:00
  • 12aded6c37 Merge branch 'ggerganov:master' into master ymcki 2024-12-22 10:16:20 +08:00
  • 207449810e tests: Add im2col perf tests Jeff Bolz 2024-12-21 16:11:22 -06:00
  • a04db23fa7 vulkan: initialize some buffer/offset variables Jeff Bolz 2024-12-21 14:09:41 -06:00
  • d02e63b64d server : set default top-k to 1 in the web ui Georgi Gerganov 2024-12-21 12:20:16 +02:00
  • cb1215354b rpc-server : add support for the SYCL backend Radoslav Gerganov 2024-12-21 11:16:42 +02:00
  • 9d5c711587 llama : the WPM vocabs use the CLS token as BOS Georgi Gerganov 2024-12-21 10:22:04 +02:00
  • 5cd85b5e00 convert : add BertForMaskedLM (#10919) Georgi Gerganov 2024-12-21 10:10:18 +02:00
  • a91a41364b vulkan: optimize coopmat2 dequant functions (#10855) Jeff Bolz 2024-12-21 01:04:45 -06:00