Commit graph

  • e68344cb06 Merge branch 'master' into xsn/fix_lora ngxson 2024-07-10 19:52:39 +02:00
  • 1faf7e5be6 do not disable mmap with lora ngxson 2024-07-10 19:51:34 +02:00
  • 40c99abb6c make sure batches are all embed or all non-embed Douglas Hanley 2024-06-24 11:38:30 -05:00
  • dd07a123b7 Name Migration: Build the deprecation-warning 'main' binary every time (#8404) b3368 Clint Herron 2024-07-10 12:35:18 -04:00
  • 8932135fdb add sqrt and mul ops hongruichen 2024-07-11 00:07:00 +08:00
  • 7ea28a6fac add helper function for binary op hongruichen 2024-07-10 23:39:03 +08:00
  • 9c1f616344 Adjusting 'server' name-deprecation binary to build all the time, similar to the 'main' legacy name binary. HanClinto 2024-07-10 11:10:49 -04:00
  • f4444d992c [SYCL] Use multi_ptr to clean up deprecated warnings (#8256) b3367 AidanBeltonS 2024-07-10 16:10:49 +01:00
  • b6f29273f0 add function to get graph from cache hongruichen 2024-07-10 23:00:31 +08:00
  • 7c9e9a22fb llama : use F32 precision in Qwen2 attention and no FA Georgi Gerganov 2024-07-10 17:32:55 +03:00
  • 6b2a849d1f ggml : move sgemm sources to llamafile subfolder (#8394) b3366 Georgi Gerganov 2024-07-10 15:23:29 +03:00
  • 117f7adbd9 ggml : remove K_QUANTS_PER_ITERATION (#8306) gg/fix-python-names Georgi Gerganov 2024-07-10 15:23:12 +03:00
  • 0f1a39f343 ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) b3365 Dibakar Gope 2024-07-10 07:14:51 -05:00
  • 83321c6958 gguf-py rel pipeline (#8410) M. Yusuf Sarıgöz 2024-07-10 15:12:35 +03:00
  • 80051cfc4d remove unused variables hongruichen 2024-07-10 19:57:47 +08:00
  • b49b501e26 fix sprintf type hongruichen 2024-07-10 19:48:57 +08:00
  • cc61948b1f llama : C++20 compatibility for u8 strings (#8408) b3363 Borislav Stanimirov 2024-07-10 14:45:44 +03:00
  • 7a80710d93 msvc : silence codecvt c++17 deprecation warnings (#8395) b3362 Borislav Stanimirov 2024-07-10 14:40:53 +03:00
  • 3feb574bf0 merge register_rpc_mem into alloc_rpc_mem hongruichen 2024-07-10 19:40:02 +08:00
  • a8be1e6f59 llama : add assert about missing llama_encode() call (#8400) b3361 fairydreaming 2024-07-10 13:38:58 +02:00
  • e97d3a6c48 fix tensor buffer allocation hongruichen 2024-07-10 11:56:01 +08:00
  • e4dd31ff89 py : fix converter for internlm2 (#8321) RunningLeon 2024-07-10 19:26:40 +08:00
  • 8f0fad42b9 py : fix extra space in convert_hf_to_gguf.py (#8407) laik 2024-07-10 19:19:10 +08:00
  • 1e5ecc5cdd Fix rebase Aidan 2024-07-10 12:06:59 +01:00
  • 4a32e6a361 Update submitters Aidan 2024-07-10 11:57:16 +01:00
  • ff137fbbed Bump patch version for release gguf-v0.9.1 M. Yusuf Sarıgöz 2024-07-10 12:39:50 +03:00
  • f6a3321701 Upd gguf-py/readme M. Yusuf Sarıgöz 2024-07-10 12:38:35 +03:00
  • cf94e5dae3 Remove dim default val Aidan 2024-07-03 10:33:42 +01:00
  • e0b8a578ac Update get_pointer ggml-sycl.cpp Aidan 2024-07-03 09:45:58 +01:00
  • 03da9a68cb Use get_multi_ptr Aidan 2024-07-02 14:46:20 +01:00
  • 4fe0861a89 Merge pull request #9 from ggerganov/sl/fix_fix_lora Xuan Son Nguyen 2024-07-10 10:33:42 +02:00
  • c554997738 llama : C++20 compatibility for u8 strings Borislav Stanimirov 2024-07-10 09:50:08 +03:00
  • ddd031d2cc Fix space errors in code laik 2024-07-10 14:29:21 +08:00
  • e95c2e1513 llama : remove the h loop in llama_set_inputs Daniel Bevenius 2024-07-10 07:13:02 +02:00
  • 7c0f58eabb Merge 2dd5d1f4b3 into a59f8fdc85 Justine Tunney 2024-07-10 12:58:39 +08:00
  • f47829aa27 Merge branch 'master' into vulkan-build-integration Mason M 2024-07-10 01:02:13 -03:00
  • a2c41b94e7 Modify the deprecation-warning 'main' binary to build every time, instead of only when a legacy binary is present. This is to help users of tutorials and other instruction sets know what to do when the 'main' binary is missing and they are trying to follow instructions. Clint Herron 2024-07-09 23:33:38 -04:00
  • 029deafc3a ggml : Add GGML_USE_SVE macro to disable SVE by default msy-kato 2024-07-10 11:34:17 +09:00
  • ec9e5c7974 remove cpp header map/string in llama.h zhhan 2024-07-09 16:57:07 -07:00
  • 9841fbda7c llama : lora fixes slaren 2024-07-10 02:21:53 +02:00
  • f15167a4c7 cuda : do not use dmmv if the tensor does not have enough cols slaren 2024-07-10 02:21:38 +02:00
  • 3eb1900e5c Skip literal UNUSED token checks jaime-m-p 2024-07-10 00:46:19 +02:00
  • 713665db2e fix types ngxson 2024-07-10 00:36:52 +02:00
  • a59f8fdc85 Server: Enable setting default sampling parameters via command-line (#8402) b3358 Clint Herron 2024-07-09 18:26:40 -04:00
  • ee2b35c65f conversion: only allow selected models ngxson 2024-07-10 00:23:07 +02:00
  • 775f893aaf Wordsmithing comment HanClinto 2024-07-09 17:37:16 -04:00
  • 6cfa080950 Load server sampling parameters from the server context by default. HanClinto 2024-07-09 16:50:45 -04:00
  • f4c3b96050 llama : add assertion informing about missing llama_encode() call Stanisław Szymczyk 2024-07-09 21:56:35 +02:00
  • c653eb1f1b Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type Dibakar Gope 2024-07-09 19:19:24 +00:00
  • fd560fe680 Update README.md to fix broken link to docs (#8399) Andy Salerno 2024-07-09 11:58:44 -07:00
  • 0e84ef1aa7 Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels Dibakar Gope 2024-07-09 18:24:40 +00:00
  • 55fbe831ef add customized split functionality, define tensor names set and split by name set zhhan 2024-07-09 11:13:17 -07:00
  • 001c3543f8 Update README.md to fix broken link to docs Andy Salerno 2024-07-09 10:33:30 -07:00
  • dc7d83e121 add log hongruichen 2024-07-10 00:33:23 +08:00
  • 9add256efe use helper function instead hongruichen 2024-07-10 00:31:13 +08:00
  • a7be0693ba add log hongruichen 2024-07-09 20:35:58 +08:00
  • af869fd636 fix compiling error in debug build hongruichen 2024-07-09 23:21:55 +08:00
  • e500d6135a Deprecation warning to assist with migration to new binary names (#8283) b3356 Clint Herron 2024-07-09 11:54:43 -04:00
  • a03e8dd99d make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392) b3355 Johannes Gäßler 2024-07-09 17:11:07 +02:00
  • 5b0b8d8cfb sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372) b3354 Alberto Cabrera Pérez 2024-07-09 15:03:15 +01:00
  • 274b3fc358 make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE Johannes Gäßler 2024-07-09 13:33:57 +02:00
  • 7d23c8792c msvc : silence codecvt c++17 deprecation warnings Borislav Stanimirov 2024-07-09 16:02:51 +03:00
  • d4c15504ee ggml : move sgemm sources to llamafile subfolder Georgi Gerganov 2024-07-09 16:01:11 +03:00
  • a7abb78565 Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off Dibakar Gope 2024-07-09 12:56:15 +00:00
  • c2595d0b80 Arm AArch64: remove a redundant comment Dibakar Gope 2024-07-09 12:24:56 +00:00
  • 5f2e3918f6 refactoring ggml_qnn_tensor Hongrui Chen 2024-07-07 23:51:12 +08:00
  • 0a00d6e4b8 Reduced verbosity of comment Alberto Cabrera 2024-07-09 10:47:43 +01:00
  • 9925ca4087 cmake : allow external ggml (#8370) b3353 Borislav Stanimirov 2024-07-09 11:38:00 +03:00
  • dc4ee89561 cmake : allow external ggml Borislav Stanimirov 2024-07-08 15:46:28 +03:00
  • 9beb2dda03 readme : fix typo [no ci] (#8389) daghanerdonmez 2024-07-09 09:16:00 +03:00
  • 24a51bd2bf Small typo on Readme daghanerdonmez 2024-07-09 08:32:52 +03:00
  • 7d0e23d72e gguf-py : do not use internal numpy types (#7472) compilade 2024-07-09 01:04:49 -04:00
  • aaf7bc89e4 Merge branch 'master' into compilade/gguf-py-fix-old-numpy compilade/gguf-py-fix-old-numpy Francis Couture-Harpin 2024-07-09 00:10:06 -04:00
  • 98edea60bc llama : add UNKNOWN tokens in the special tokens cache Francis Couture-Harpin 2024-07-08 21:23:19 -04:00
  • d4df785868 convert_hf : reduce usages of the UNKNOWN token type Francis Couture-Harpin 2024-07-08 21:09:52 -04:00
  • 3a837ba919 ggml : reading the runtime sve config of the cpu domke 2024-07-09 10:06:51 +09:00
  • c184db74b3 Options to manage token text decoding errors: jaime-m-p 2024-07-09 01:28:56 +02:00
  • 0ab112abdb add multi adaptor hosting zhhan 2024-07-08 16:05:27 -07:00
  • dec64ef793 Compare vocabs jaime-m-p 2024-07-09 01:04:22 +02:00
  • a943b42416 Improve mismatch range localization jaime-m-p 2024-07-09 01:02:44 +02:00
  • 9307c3fd46 Test l/r-strip for more than 4 spaces jaime-m-p 2024-07-09 00:59:29 +02:00
  • e8b3955346 Fix pyparse problems: gcc inline functions jaime-m-p 2024-07-09 00:55:54 +02:00
  • 7fdb6f73e3 flake.lock: Update (#8342) Georgi Gerganov 2024-07-09 01:36:38 +03:00
  • d6fe269ced llama : fix command-r detokenization Francis Couture-Harpin 2024-07-08 18:13:16 -04:00
  • a130eccef4 labeler : updated sycl to match docs and code refactor (#8373) Alberto Cabrera Pérez 2024-07-08 21:35:17 +01:00
  • 31a1b0eeaa llama : fix Viking pre-tokenizer regex Francis Couture-Harpin 2024-07-08 16:34:39 -04:00
  • 03d24cae19 Merge pull request #8 from ngxson/xsn/fix_lora_convert Xuan Son Nguyen 2024-07-08 22:10:57 +02:00
  • 95b3eb057b fix outfile ngxson 2024-07-08 22:05:35 +02:00
  • 802565ca43 fix requirements ngxson 2024-07-08 22:01:23 +02:00
  • d52455f2be add requirements ngxson 2024-07-08 22:00:13 +02:00
  • 7a83f200d3 fix ftype ngxson 2024-07-08 21:55:41 +02:00
  • 6c617e20ef add sanity check ngxson 2024-07-08 21:36:35 +02:00
  • e5f4713d81 rebase on the latest master commit 3fd62a6 and adapt to the new directory structure Dibakar Gope 2024-07-08 17:09:24 +00:00
  • a71a8aca14 Build legacy replacement binaries only if they already exist. Check for their existence every time so that they are not ignored. HanClinto 2024-07-08 12:18:34 -04:00
  • c170b0f169 Merge pull request #4 from ggerganov/master Y.X 2024-07-09 00:14:06 +08:00
  • 0e16188985 add metadata check ngxson 2024-07-08 17:44:14 +02:00
  • 0ff7a5fdc5 labeler : updated sycl to match docs and code refactor Alberto Cabrera 2024-07-08 12:39:46 +01:00
  • 41ced241f2 Merge branch 'master' into xsn/fix_lora ngxson 2024-07-08 17:06:46 +02:00
  • 84288ff9f7 add f16 convert ngxson 2024-07-08 17:05:17 +02:00
  • 712fecba61 no more transpose A ngxson 2024-07-08 16:48:55 +02:00