Commit graph

  • fb93f70533 add request aggregation functionality kalab Yibeltal 2024-12-04 11:45:40 -05:00
  • 133d95fae6 Check if the file is readable. Robert Ormandi 2024-12-04 10:37:14 -06:00
  • a8046c888a use calloc instead of malloc jg/gguf-refactor Johannes Gäßler 2024-12-04 17:24:35 +01:00
  • 55dc8e1d5a Update llama-server-cuda.Dockerfile 流年 2024-12-04 22:44:02 +08:00
  • 284aa27838 Update llama-server-cuda.Dockerfile 流年 2024-12-04 22:39:13 +08:00
  • 0d6485f0f8 wip [no ci] Xuan Son Nguyen 2024-12-04 15:03:37 +01:00
  • dadbfe9167 Merge branch 'ggerganov:master' into master Riccardo Orlando 2024-12-04 15:01:16 +01:00
  • 59f4db1088 ggml : add predefined list of CPU backend variants to build (#10626) b4265 Diego Devesa 2024-12-04 14:45:40 +01:00
  • 2803540814 ggml-cpu : fix HWCAP2_I8MM value (#10646) Diego Devesa 2024-12-04 14:40:44 +01:00
  • 096b847a0f fix wrong type in print Johannes Gäßler 2024-12-04 14:16:05 +01:00
  • 1011a51b87 move all response types to struct Xuan Son Nguyen 2024-12-04 14:16:01 +01:00
  • d7da7ff8de Update llama-server-cuda.Dockerfile 流年 2024-12-04 20:52:24 +08:00
  • b88727009d GGUF: backend support, fixed-width I/O, misc fixes Johannes Gäßler 2024-12-03 21:43:57 +01:00
  • 81611bef72 server : add tests gg/server-fix-spec-ctx-shift Georgi Gerganov 2024-12-04 13:11:26 +02:00
  • 253b7fde91 Fix HF repo commit to clone lora test models (#10649) ltoniazzi 2024-12-04 09:45:48 +00:00
  • 8d0cfd554a llama: Support MiniCPM-1B (with & w/o longrope) (#10559) b4262 JFLFY2255 2024-12-04 17:42:50 +08:00
  • 92e54fb8d8 server : fix free of spec context and batch Georgi Gerganov 2024-12-04 11:33:09 +02:00
  • fbe502be60 Fix HF repo commit to clone lora test models ltoniazzi 2024-12-04 09:01:01 +00:00
  • b436edaad9 server : take into account speculative limits Georgi Gerganov 2024-12-04 10:44:48 +02:00
  • 5cb50e779b Add back workflow ltoniazzi 2024-12-04 08:56:22 +00:00
  • 2759916d86 vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (#10642) b4261 Jeff Bolz 2024-12-04 01:28:59 -06:00
  • 94d814c559 fix reader on linux isotr0py 2024-12-04 15:25:19 +08:00
  • e52a22d8d8 modify unnecessary calculations lihan 2024-12-04 14:33:55 +08:00
  • bc93d2a44e Merge branch 'ggerganov:master' into support_glm_edge_model piDack 2024-12-04 11:50:29 +08:00
  • 6e9fdb0b52 fix chat template liyuhang 2024-12-04 11:27:01 +08:00
  • 40c6d79fb5 SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (#10584) b4260 Nicolò Scipione 2024-12-04 02:29:20 +01:00
  • 98036d5670 fix typo of README.md (#10605) Wang Ran (汪然) 2024-12-04 09:22:50 +08:00
  • cd2f37b304 Avoid using __fp16 on ARM with old nvcc (#10616) b4258 Frankie Robertson 2024-12-04 02:41:37 +02:00
  • da6aac91f1 Add docs for creating a static build (#10268) (#10630) Benson Wong 2024-12-03 16:40:36 -08:00
  • 88cf9f9f3e ggml-cpu : fix HWCAP2_I8MM value slaren 2024-12-04 01:39:39 +01:00
  • 095f721510 Update docs/build.md Diego Devesa 2024-12-04 01:27:07 +01:00
  • 01e6d9bb71 clip : add sycl support (#10574) b4256 piDack 2024-12-04 08:26:37 +08:00
  • 52e69b58a0 Merge remote-tracking branch 'origin/master' into sl/dl-backend-6 slaren 2024-12-04 01:12:25 +01:00
  • 478194b1b6 update CPU dockerfiles slaren 2024-12-04 01:10:37 +01:00
  • 938dbd4d6e server: add OpenAI compatible response format for /completions with backward compatibility Oren Collaco 2024-12-03 16:59:29 -07:00
  • b7d38eef0c server : (refactoring) reduce usage of json internally Xuan Son Nguyen 2024-12-03 23:37:03 +01:00
  • fc3eb4c19a vulkan: Implement "fast divide" (mul+shift) for unary ops like copy Jeff Bolz 2024-12-03 11:12:54 -06:00
  • ad5734c34b Extend how Llama.cpp locates metal resources Robert Ormandi 2024-12-03 15:02:50 -06:00
  • a5a915b51e server : fix speculative decoding with context shift Georgi Gerganov 2024-12-03 22:44:19 +02:00
  • 8743d1e45f Merge branch 'ggerganov:master' into testing/add-lora-tests-workflow ltoniazzi 2024-12-03 20:00:05 +00:00
  • d0fbd18e2b Remove workflow ltoniazzi 2024-12-03 19:51:08 +00:00
  • cc98896db8 vulkan: optimize and reenable split_k (#10637) b4255 Jeff Bolz 2024-12-03 13:29:54 -06:00
  • 91c36c269b server : (web ui) Various improvements, now use vite as bundler (#10599) b4254 Xuan Son Nguyen 2024-12-03 19:38:44 +01:00
  • 1cd3df46bd scripts : remove amx sync b4253 Georgi Gerganov 2024-12-03 19:42:30 +02:00
  • c505471857 sync : ggml Georgi Gerganov 2024-12-03 19:40:25 +02:00
  • e9e661bd59 CUDA: remove unnecessary warp reduce in FA (ggml/1032) mahorozte 2024-12-03 21:11:43 +08:00
  • efb6ae9630 feat: add GGML_UNARY_OP_ARGMAX Metal kernel (ggml/1019) PAB 2024-12-02 19:27:24 +01:00
  • 667d70d170 metal : add GGML_OP_CONV_TRANSPOSE_1D kernels (ggml/1026) PAB 2024-11-28 09:25:06 +01:00
  • a4a350c6fd scripts : remove amx sync Georgi Gerganov 2024-12-03 19:42:30 +02:00
  • b797c8cc1b sync : ggml Georgi Gerganov 2024-12-03 19:40:25 +02:00
  • b737407eb5 CUDA: remove unnecessary warp reduce in FA (ggml/1032) mahorozte 2024-12-03 21:11:43 +08:00
  • c35574ab93 feat: add GGML_UNARY_OP_ARGMAX Metal kernel (ggml/1019) PAB 2024-12-02 19:27:24 +01:00
  • cc410a9eb7 metal : add GGML_OP_CONV_TRANSPOSE_1D kernels (ggml/1026) PAB 2024-11-28 09:25:06 +01:00
  • b7ad234517 dot and delta optimization Eve 2024-12-03 11:57:37 -05:00
  • 64a6001a1a update omni audio cmake Te993 2024-12-03 23:41:14 +08:00
  • 7a5a42b99f vulkan: optimize and reenable split_k Jeff Bolz 2024-12-03 08:46:58 -06:00
  • 1f6855faa0 ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemm_q4_0_4x4_q8_0() Adrien Gallouët 2024-12-02 12:06:48 +00:00
  • 3b4f2e33e2 llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636) b4248 Xuan Son Nguyen 2024-12-03 12:54:30 +01:00
  • 3b409c1e92 add edge template liyuhang 2024-12-03 19:23:11 +08:00
  • e5d812ba78 llama : add missing LLAMA_API for llama_chat_builtin_templates Xuan Son Nguyen 2024-12-03 12:20:09 +01:00
  • 82bca2257b readme : add option, update default value, fix formatting (#10271) Nikolaos Pothitos 2024-12-03 12:50:08 +02:00
  • 0115df2f65 metal : small-batch mat-mul kernels (#10581) b4246 Georgi Gerganov 2024-12-03 11:52:33 +02:00
  • 434fc452c3 metal : add comments Georgi Gerganov 2024-12-03 11:39:33 +02:00
  • 515d4e5372 github : minify link [no ci] (revert) Georgi Gerganov 2024-12-03 11:21:43 +02:00
  • 844e2e1fee github : minify link [no ci] Georgi Gerganov 2024-12-03 11:20:35 +02:00
  • 70b98fadbc server : fix default draft model parameters (#10586) b4243 Georgi Gerganov 2024-12-03 11:20:00 +02:00
  • 11b4d582bc server : various params fixes Georgi Gerganov 2024-12-03 11:18:01 +02:00
  • 33d7b70c88 server : do not speculate during prompt processing gg/server-fix-spec Georgi Gerganov 2024-12-03 10:58:43 +02:00
  • b2958b33dd Merge pull request #33 from NexaAI/weili/dev Zack Li 2024-12-02 23:11:24 -08:00
  • b86cdedb7e remove iostream header 李为 2024-12-03 15:03:55 +08:00
  • 07c7ff3e4a Merge branch 'weili/dev' of github.com:NexaAI/llama.cpp into weili/dev 李为 2024-12-03 15:00:34 +08:00
  • ca7e8ef19e fix clip_n_patch() allocation size error for 81-series omni-vlm models 李为 2024-12-03 14:54:52 +08:00
  • 82cbfda7b9 Merge branch 'master' into support_glm_edge_model piDack 2024-12-03 13:27:23 +08:00
  • be54cb02ff bug fix 李为 2024-12-03 11:47:28 +08:00
  • 97267e60bd bug fix in common-nexa.cpp liwiii 2024-12-03 11:36:59 +08:00
  • 71b563ec9a Merge branch 'weili/dev' of github.com:NexaAI/llama.cpp into weili/dev 李为 2024-12-03 11:26:26 +08:00
  • 674dec9b36 fixes slaren 2024-12-03 03:17:32 +01:00
  • 8b25d7cde2 readme : indent commands in lettered list Nikolaos Pothitos 2024-11-17 20:33:11 +02:00
  • defdfb3a63 readme : indent commands under bullets Nikolaos Pothitos 2024-11-13 10:21:54 +02:00
  • b9e82cc07a readme : remove unnecessary indentation Nikolaos Pothitos 2024-11-13 01:06:50 +02:00
  • 1e421ec1ac readme : update default prompt context size Nikolaos Pothitos 2024-11-12 22:44:44 +00:00
  • 905b990be0 readme : document --no-display-prompt Nikolaos Pothitos 2024-11-12 22:05:14 +00:00
  • 828e4f72a7 add space lihan 2024-12-03 09:00:12 +08:00
  • 21393e31ca Add notes for a static build Benson Wong 2024-12-02 15:41:10 -08:00
  • 642330ac7c llama : add enum for built-in chat templates (#10623) b4242 Xuan Son Nguyen 2024-12-02 22:10:19 +01:00
  • 9d1a46369e update server README Xuan Son Nguyen 2024-12-02 21:35:25 +01:00
  • 86ff8e3929 fix test Xuan Son Nguyen 2024-12-02 21:33:56 +01:00
  • d2beb5ae07 server : handle speculative decoding llama_decode failures Josh Bleecher Snyder 2024-12-02 12:23:26 -08:00
  • 0d44293751 Merge 98e6651e2f into 8648c52101 Xuan Son Nguyen 2024-12-02 22:04:40 +02:00
  • f325205574 server : fix draft params Georgi Gerganov 2024-12-02 21:52:39 +02:00
  • c4e9c7c625 ggml : add predefined list of CPU backend variants to build slaren 2024-12-02 17:42:21 +01:00
  • 8648c52101 make : deprecate (#10514) Georgi Gerganov 2024-12-02 21:22:53 +02:00
  • bf5eb04961 Update Makefile Georgi Gerganov 2024-12-02 20:32:19 +02:00
  • 5590160cd6 metal : final adjustments Georgi Gerganov 2024-12-02 20:10:20 +02:00
  • a682809c94 more build.md updates slaren 2024-12-02 18:46:55 +01:00
  • 0ec5b62f27 more build.md updates slaren 2024-12-02 18:34:11 +01:00
  • bd8c5a81ac more build.md updates slaren 2024-12-02 18:29:26 +01:00
  • bbff53ae5b update build.md slaren 2024-12-02 18:09:23 +01:00
  • 886c153c53 basic fix for compare-commits.sh slaren 2024-12-02 17:48:53 +01:00
  • 677ee9ff59 metal : add rest of types Georgi Gerganov 2024-12-02 11:08:16 +02:00