Commit graph

  • 4d266310f5 flake.lock: Update (#10146) Georgi Gerganov 2024-11-03 15:14:15 +02:00
  • 329ed914c9 CANN: adjust backend registry refactor. (#10158) b4024 leo-pony 2024-11-04 19:08:22 +08:00
  • abc2e1343f server : clarify /slots endpoint, add is_processing Xuan Son Nguyen 2024-11-04 11:57:13 +01:00
  • 1dc02150bc optimize offsets calculation isotr0py 2024-11-04 16:44:17 +08:00
  • 027b99cc6a Fix compile error for CANN backend: remove buffer face get_name that used in cann as it was removed in backend registry refactor PR. leo-pony 2024-11-04 16:41:41 +08:00
  • ce027adfb3 sync : ggml b4023 Georgi Gerganov 2024-11-04 10:33:37 +02:00
  • 284e5b0275 cmake : make it possible linking ggml as external lib (ggml/1003) Yuri Khrustalev 2024-11-02 05:09:12 -04:00
  • e2292aaa17 metal : fix minor string leaks (ggml/1004) Plamen Minev 2024-11-01 16:55:10 +02:00
  • dd320df4b4 fix mode isotr0py 2024-11-04 16:19:23 +08:00
  • 205676ceb7 fix mode isotr0py 2024-11-04 16:06:46 +08:00
  • 77afcd16a0 Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-11-04 12:29:10 +05:00
  • 1419681089 disable <cxxabi.h> for MSC_VER Zack Zhiyuan Li 2024-11-04 05:45:52 +00:00
  • 4f003a705b ggml : optimize llamafile's cpu matrix multiplication for ppc64le using MMA Amrita H S 2024-09-12 00:31:21 -04:00
  • 6f1ed6e5cb Adding #include <io.h> & <fcntl.h> Zack Zhiyuan Li 2024-11-04 04:54:51 +00:00
  • a4747b2edb fix error on windows qwen2-audio/whisper.cpp:9935:38: error: '_O_BINARY' was not declared in this scope Zack Zhiyuan Li 2024-11-04 04:40:41 +00:00
  • 995baefeed Disable cxxabi.h dependency on Windows Zack Zhiyuan Li 2024-11-04 03:48:20 +00:00
  • d277c674ae add omni-vlm examples (C++ & python) 李为 2024-11-04 09:56:33 +08:00
  • 45e7e5b70e cuda : clear error after changing peer access slaren 2024-11-04 00:03:13 +01:00
  • 4bdc70aaac update to C++17 for compilation Zack Zhiyuan Li 2024-11-03 22:07:07 +00:00
  • 9e67ef75b4 remove uneccesary build and rename shared lib Zack Zhiyuan Li 2024-11-03 21:29:09 +00:00
  • 1716e6b25a add some other commands Xuan Son Nguyen 2024-11-03 22:16:14 +01:00
  • 9f40989351 ggml : move CPU backend to a separate file (#10144) b4020 Diego Devesa 2024-11-03 19:34:08 +01:00
  • f0d1c4fa1c enable qwen2-audio work E2E Zack Zhiyuan Li 2024-11-03 18:33:32 +00:00
  • c7b912bdca support omni-audio Zack Zhiyuan Li 2024-11-03 17:58:08 +00:00
  • b80781bf1e Merge c702e55930 into 08828a6d7d Andrei 2024-11-04 00:29:20 +08:00
  • 0825ba2fca revert synchronization change to ggml_init slaren 2024-11-03 16:28:22 +01:00
  • 673f95bd04 restore use of GGML_PRINT_DEBUG in ggml-cpu.c slaren 2024-11-03 16:26:41 +01:00
  • 08828a6d7d metal : minor fixup in FA kernel (#10143) b4019 Georgi Gerganov 2024-11-03 15:18:40 +02:00
  • 1839f69130 flake.lock: Update (#10146) Georgi Gerganov 2024-11-03 15:14:15 +02:00
  • 811aa872d6 wkv6: drop armv9 and tranfer to GGML style Zhiyuan Li 2024-11-03 23:52:25 +11:00
  • 909cfd498c metal : remove unused var Georgi Gerganov 2024-11-03 11:24:18 +02:00
  • fd7d5e870d metal : use the unrolled loop variable Georgi Gerganov 2024-11-03 10:02:53 +02:00
  • 042c3e0fd3 Merge branch 'ggerganov:master' into master Zhiyuan Li 2024-11-03 17:30:25 +11:00
  • 1c58096f6f sycl: Enhance OP support judgment Zhiyuan Li 2024-11-03 16:36:42 +11:00
  • bee1cec7d2 sycl: add some ops Zhiyuan Li 2024-11-03 04:55:29 +11:00
  • 2fc42b6a82 wkv on sycl Zhiyuan Li 2024-11-03 01:12:52 +11:00
  • 3f75f12114 rwkv6: rename params Zhiyuan Li 2024-11-02 00:28:58 +11:00
  • e198f7b9df rwkv6: update cuda file name Zhiyuan Li 2024-11-01 20:58:17 +11:00
  • b4254c5550 rwkv6: support avx2 avx512 armv8 armv9 Zhiyuan Li 2024-11-01 16:57:21 +11:00
  • f66c75a495 rwkv6: rename to wkv6 Zhiyuan Li 2024-11-01 16:23:24 +11:00
  • df01a89f82 Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-11-03 09:26:03 +05:00
  • a83ac00565 Merge branch 'ggerganov:master' into avx_opt Eve 2024-11-03 01:53:49 +00:00
  • 6a4c080824 fix potential overflow (performance reduced) Eve 2024-11-02 21:27:35 -04:00
  • 6667edeaec Q8_0 and IQ4_NL, 5-7% faster Eve 2024-11-02 20:44:40 -04:00
  • 1855a062a2 flake.lock: Update github-actions[bot] 2024-11-03 00:37:25 +00:00
  • bf95fffc6f ggml : move CPU backend to a separate file slaren 2024-11-03 01:30:12 +01:00
  • d7a4f3e497 main : add special commands Xuan Son Nguyen 2024-11-03 01:07:27 +01:00
  • 9830b6923b Add apple arm to presets (#10134) Christian Köhnenkamp 2024-11-02 23:35:31 +01:00
  • b8d592fe2c split to functions Eve 2024-11-02 18:08:13 -04:00
  • 7de0bdc2db faster with madd Eve 2024-11-02 17:12:23 -04:00
  • 629befc729 revert f16 Eve 2024-11-02 16:58:40 -04:00
  • 1335c78639 256b version, also slow. i tried :) Eve 2024-11-02 16:36:56 -04:00
  • f8dd133ce4 slower f16c version, kep for reference Eve 2024-11-02 16:30:03 -04:00
  • 40e717263e metal : minor fixup in FA kernel Georgi Gerganov 2024-11-02 20:41:42 +02:00
  • 66af1a4b4c docker : fix docker locale issue (#6267) Felix 2024-11-02 10:30:23 -07:00
  • 42cadc74bd server : fix slot selection by lru (#10126) b4016 sasha0552 2024-11-02 16:34:56 +00:00
  • 45950415ed server : fix endpoint checks (#10135) b4015 Georgi Gerganov 2024-11-02 18:34:00 +02:00
  • fffe7e6204 +7% tg +5% pp compared to master Eve 2024-11-02 11:04:58 -04:00
  • 4fc8673d09 llama-bench : skip repeated values in consecutive lines sl/llama-bench-headers slaren 2024-11-02 15:37:33 +01:00
  • 8411453615 Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-11-02 19:23:59 +05:00
  • e069375f09 +3% q4_0 inference Eve 2024-11-02 10:18:12 -04:00
  • 1926d6e39d llama : adjust default context size + print warnings (#10136) b4014 Georgi Gerganov 2024-11-02 15:18:56 +02:00
  • b634f8a26f simple-chat : only add bos on first prompt (#10129) b4013 Diego Devesa 2024-11-02 13:08:53 +01:00
  • 7554aa4655 convert-lora : make --base optional (#10110) Xuan Son Nguyen 2024-11-02 12:53:17 +01:00
  • b49b9d175a ggml-ci : add missing gpu-layers + adjust context sizes Georgi Gerganov 2024-11-02 13:08:32 +02:00
  • 82a17dcb69 trigger ci Xuan Son Nguyen 2024-11-02 11:59:12 +01:00
  • 531279fe46 Add final new line Christian Köhnenkamp 2024-11-02 11:53:33 +01:00
  • 52d537b56d llama : adjust default context size + print warnings Georgi Gerganov 2024-11-02 12:35:53 +02:00
  • cd457dce20 ggml-backend : skip register metal backend on os simulator Jhen 2024-11-02 18:28:41 +08:00
  • 844f011d87 add small comment [no ci] Xuan Son Nguyen 2024-11-02 11:08:16 +01:00
  • 915e6a0012 server : fix endpoint checks Georgi Gerganov 2024-11-02 11:25:45 +02:00
  • ea501b3707 Add apple arm to presets Christian Köhnenkamp 2024-11-02 09:11:17 +01:00
  • e5ce8b412c Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-11-02 09:27:44 +05:00
  • dca0deb3d8 avx bf16 vec dot Eve 2024-11-01 22:21:21 -04:00
  • 4c7195e839 Add user-provided tokenizer/detokenizer functionality and simple example. Ilan Silva 2024-11-01 22:31:15 -03:00
  • 34b9f0de6b double accumulator Eve 2024-11-01 21:27:55 -04:00
  • ad01d31b60 use 128 bit loads (i've tried 256->128 to death and its slower) Eve 2024-11-01 21:06:00 -04:00
  • 20e12112fd llama : suggest reduce ctx size when kv init fails sl/aligned-alloc-no-abort slaren 2024-11-02 00:55:19 +01:00
  • bf60f27cda ggml : do not abort when ggml_aligned_malloc fails slaren 2024-11-02 00:54:16 +01:00
  • 2a229879c2 simple-chat : only add bos on first prompt slaren 2024-11-02 00:29:17 +01:00
  • a6744e43e8 llama : add simple-chat example (#10124) b4011 Diego Devesa 2024-11-01 23:50:59 +01:00
  • e991e3127f llama : use smart pointers for ggml resources (#10117) b4010 Diego Devesa 2024-11-01 23:48:26 +01:00
  • 051bf88153 Update examples/simple-chat/simple-chat.cpp Diego Devesa 2024-11-01 23:45:16 +01:00
  • 14fa967e46 minor slaren 2024-11-01 23:14:20 +01:00
  • afec6106e1 add to BUILD_TARGETS slaren 2024-11-01 23:07:49 +01:00
  • d113cf2a13 minor slaren 2024-11-01 23:01:33 +01:00
  • 418f5eef26 vulkan : improve ggml_vk_create_buffer error handling (#9898) b4009 Shupei Fan 2024-11-02 02:33:14 +08:00
  • 714b811624 minor debug log fix sasha0552 2024-11-01 18:07:06 +00:00
  • aa4277cf97 server : fix slot selection by lru, migrate lcs to size_t sasha0552 2024-11-01 17:52:07 +00:00
  • ae8b7eb43e Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-11-01 21:51:49 +05:00
  • 587c114b09 llama : add simple-chat example slaren 2024-11-01 16:53:35 +01:00
  • ba6f62eb79 readme : update hot topics Georgi Gerganov 2024-11-01 17:31:51 +02:00
  • 7d16e1bc8c Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2024-11-01 11:12:18 -04:00
  • 1f3de86ec2 clarify unspecified --base Xuan Son Nguyen 2024-11-01 15:32:45 +01:00
  • d865d1478c server : fix smart selection of available slot (#10120) b4007 sasha0552 2024-11-01 13:33:14 +00:00
  • 2e11ea2abd do not include metadata from base model Xuan Son Nguyen 2024-11-01 13:24:17 +01:00
  • f853c3eacf Revert back reset function MaggotHATE 2024-11-01 17:23:34 +05:00
  • 4e89bdebb0 minor slaren 2024-11-01 12:01:53 +01:00
  • 1804adb0cf ggml : remove ggml_scratch (#10121) b4006 Georgi Gerganov 2024-11-01 12:58:45 +02:00
  • 6b0dc8fe67 replace vectors of tokens with shorthands sasha0552 2024-11-01 10:11:00 +00:00