Commit graph

  • e38781bb3d Update clib.json to point to Cyan4973 original xxhash Brian 2024-07-15 21:49:05 +10:00
  • fc690b018e docs: fix links in development docs [no ci] (#8481) NikolaiLyssogor 2024-07-15 04:46:39 -07:00
  • 16bdfa42ac [SYCL] add concat through dim 1/2 (#8483) b3394 Meng, Hengyu 2024-07-15 19:32:15 +08:00
  • b704448afb Merge branch 'master' into xsn/fix_lora ngxson 2024-07-15 13:22:12 +02:00
  • 3dfda05956 llama : de-duplicate deepseek2 norm b3393 Georgi Gerganov 2024-07-15 14:10:39 +03:00
  • fc09437496 fix lint nopperl 2024-07-15 12:26:20 +02:00
  • 568110aab5 add chameleon tokenizer tests nopperl 2024-07-15 12:17:04 +02:00
  • f68d092459 remove redundant params ngxson 2024-07-15 12:12:22 +02:00
  • 385c1a8cd4 convert chameleon hf to gguf nopperl 2024-07-15 12:03:07 +02:00
  • 5b18118248 Revert "auto scale" ngxson 2024-07-15 11:48:51 +02:00
  • f1348e25fb Merge branch 'ggerganov:master' into embed_files katsu560 2024-07-15 18:46:13 +09:00
  • 42415a4874 auto scale ngxson 2024-07-15 11:41:18 +02:00
  • 703573f608 Merge branch 'master' into xsn/fix_lora ngxson 2024-07-15 11:06:47 +02:00
  • bda62d7999 Vulkan MMQ Fix (#8479) b3392 0cc4m 2024-07-15 09:38:52 +02:00
  • 87301bdd59 llama : use llm_build_lora_mm in most model graphs Francis Couture-Harpin 2024-07-15 03:23:19 -04:00
  • 0da3fd288f llama : valign + remove unused ftype Georgi Gerganov 2024-07-15 10:26:42 +03:00
  • 8956543c09 convert_hf : simplify modify_tensors for InternLM2 Francis Couture-Harpin 2024-07-15 02:35:06 -04:00
  • f32327e2b2 remove multiply declearation of log in unit test hongruichen 2024-07-15 11:19:01 +08:00
  • cd5a7331f7 add cpu backend as cross reference hongruichen 2024-07-15 10:50:33 +08:00
  • 9ecc1196e7 fix format Meng, Hengyu 2024-07-15 10:45:34 +08:00
  • 3c151d9deb add concat through dim 1/2 Meng, Hengyu 2024-07-15 10:39:00 +08:00
  • 4410fd6563 format with clang-format hongruichen 2024-07-15 10:30:57 +08:00
  • c46b4deea9 [unit test] init all tensor by one function hongruichen 2024-07-15 10:23:12 +08:00
  • 090fca7a07 pydantic : replace uses of __annotations__ with get_type_hints (#8474) compilade 2024-07-14 19:51:21 -04:00
  • 7cda4dd7e9 convert_hf : faster lazy safetensors Francis Couture-Harpin 2024-07-14 18:27:36 -04:00
  • 6c091df5c0 docs: fix links in development docs [no ci] NikolaiLyssogor 2024-07-14 15:34:54 -07:00
  • aaab2419ea flake.lock: Update (#8475) Georgi Gerganov 2024-07-14 18:54:02 +03:00
  • 30b40006cc remove unused declarations hongruichen 2024-07-14 23:50:11 +08:00
  • 148ceab70c add log op hongruichen 2024-07-14 22:57:09 +08:00
  • 73cf442e7b llama : fix Gemma-2 Query scaling factors (#8473) b3389 Georgi Gerganov 2024-07-14 14:05:09 +03:00
  • cc7a2f61c4 Fix Vulkan op result checker build error 0cc4m 2024-07-14 12:57:49 +02:00
  • 18b2ac169a Fix incoherence by adding missing LOAD_VEC_A parameter 0cc4m 2024-07-14 12:57:13 +02:00
  • f085684038 Add GGML_MUSA in CMake Xiaodong Ye 2024-07-14 15:59:32 +08:00
  • 779c920b88 Add GGML_MUSA in Makefile Xiaodong Ye 2024-07-14 15:59:20 +08:00
  • e236528e76 gguf_hash.py: Add sha256 (#8470) Brian 2024-07-14 16:47:14 +10:00
  • 4204cab390 chore : Apply snake case as described in #8305 teleprint-me 2024-07-14 00:34:07 -04:00
  • 1b18688bed Merge branch 'ggerganov:master' into gguf-model-template Austin 2024-07-13 23:52:04 -04:00
  • 7b7f749bca chore : Add ignore rule for vulkan shader generator teleprint-me 2024-07-13 23:46:05 -04:00
  • fa79495bb4 llama : fix pre-tokenization of non-special added tokens (#8228) b3387 compilade 2024-07-13 23:35:10 -04:00
  • aadf686779 chore: Fix compiler warnings, add help text, improve CLI options teleprint-me 2024-07-13 23:17:57 -04:00
  • a364ec78f3 fix UT of concat arthw 2024-07-14 11:07:56 +08:00
  • f89eaa921e pydantic : fix Python 3.9 and 3.10 support compilade/fix-pydantic-example Francis Couture-Harpin 2024-07-13 21:52:45 -04:00
  • 04e9fdeb3c flake.lock: Update github-actions[bot] 2024-07-14 00:20:25 +00:00
  • 6cd47fd0cf Apply suggestions from code review Brian 2024-07-14 10:01:05 +10:00
  • eed299f0d2 pydantic : replace uses of __annotations__ with get_type_hints Francis Couture-Harpin 2024-07-13 16:46:26 -04:00
  • e700d37f68 mv softmax to separated file Neo Zhang 2024-07-14 01:02:58 +08:00
  • 34798e9cd7 Merge 072d7c96c0 into 17eb6aa8a9 Where data meets intelligence 2024-07-13 19:28:40 +03:00
  • 07d457b83f server : handle content array in chat API (#8449) Georgi Gerganov 2024-07-12 14:48:15 +03:00
  • 21825798c2 main : print error on empty input (#8456) Georgi Gerganov 2024-07-12 14:48:04 +03:00
  • 318d950e79 llama : suppress unary minus operator warning (#8448) Daniel Bevenius 2024-07-12 11:05:21 +02:00
  • 0a7d1bf5de server : ensure batches are either all embed or all completion (#8420) Douglas Hanley 2024-07-12 03:14:12 -05:00
  • 3ebd51fcad docker : fix filename for convert-hf-to-gguf.py in tools.sh (#8441) Armen Kaleshian 2024-07-12 04:08:19 -04:00
  • 757ae96e5d convert : remove fsep token from GPTRefactForCausalLM (#8237) Jiří Podivín 2024-07-12 10:06:33 +02:00
  • e0916db972 examples : sprintf -> snprintf (#8434) Georgi Gerganov 2024-07-12 10:46:14 +03:00
  • f6786401d2 ggml : minor naming changes (#8433) Georgi Gerganov 2024-07-12 10:46:02 +03:00
  • fa700d1a84 [SYCL] fix the mul_mat_id ut issues (#8427) Chen Xi 2024-07-12 00:52:04 +00:00
  • b4caa00c7c ggml : add NVPL BLAS support (#8329) (#8425) Nicholai Tukanov 2024-07-11 11:49:15 -05:00
  • a5e36a3518 cuda : suppress 'noreturn' warn in no_device_code (#8414) Daniel Bevenius 2024-07-11 17:53:42 +02:00
  • 6a9dcf01ad CUDA: optimize and refactor MMQ (#8416) Johannes Gäßler 2024-07-11 16:47:47 +02:00
  • 8c88cd899b gitignore : deprecated binaries Georgi Gerganov 2024-07-11 11:20:40 +03:00
  • 4e4205aa6f tokenize : add --no-parse-special option (#8423) compilade 2024-07-11 03:41:48 -04:00
  • 2ed5fd58b5 llama : use F32 precision in Qwen2 attention and no FA (#8412) Georgi Gerganov 2024-07-11 10:21:30 +03:00
  • 86ced79ae6 Initialize default slot sampling parameters from the global context. (#8418) Clint Herron 2024-07-10 20:08:17 -04:00
  • 2f027bcb15 Name Migration: Build the deprecation-warning 'main' binary every time (#8404) Clint Herron 2024-07-10 12:35:18 -04:00
  • 35b1aff5cf [SYCL] Use multi_ptr to clean up deprecated warnings (#8256) AidanBeltonS 2024-07-10 16:10:49 +01:00
  • e78fa06f3d ggml : move sgemm sources to llamafile subfolder (#8394) Georgi Gerganov 2024-07-10 15:23:29 +03:00
  • 528f58ff8d ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) Dibakar Gope 2024-07-10 07:14:51 -05:00
  • 04ba8fca3e gguf-py rel pipeline (#8410) M. Yusuf Sarıgöz 2024-07-10 15:12:35 +03:00
  • 224090c64e llama : C++20 compatibility for u8 strings (#8408) Borislav Stanimirov 2024-07-10 14:45:44 +03:00
  • 35f85f71e5 msvc : silence codecvt c++17 deprecation warnings (#8395) Borislav Stanimirov 2024-07-10 14:40:53 +03:00
  • f4e68cd731 llama : add assert about missing llama_encode() call (#8400) fairydreaming 2024-07-10 13:38:58 +02:00
  • 0464524ddd py : fix converter for internlm2 (#8321) RunningLeon 2024-07-10 19:26:40 +08:00
  • eb16c41949 py : fix extra space in convert_hf_to_gguf.py (#8407) laik 2024-07-10 19:19:10 +08:00
  • ae3a78ad34 Server: Enable setting default sampling parameters via command-line (#8402) Clint Herron 2024-07-09 18:26:40 -04:00
  • 8af17465a9 Update README.md to fix broken link to docs (#8399) Andy Salerno 2024-07-09 11:58:44 -07:00
  • 0e6506aeb0 Deprecation warning to assist with migration to new binary names (#8283) Clint Herron 2024-07-09 11:54:43 -04:00
  • c7d621d0da make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392) Johannes Gäßler 2024-07-09 17:11:07 +02:00
  • 5c10e23a80 cmake : allow external ggml (#8370) Borislav Stanimirov 2024-07-09 11:38:00 +03:00
  • 1052802685 readme : fix typo [no ci] (#8389) daghanerdonmez 2024-07-09 09:16:00 +03:00
  • c380b899e5 gguf-py : do not use internal numpy types (#7472) compilade 2024-07-09 01:04:49 -04:00
  • 9ad5bcaad3 flake.lock: Update (#8342) Georgi Gerganov 2024-07-09 01:36:38 +03:00
  • 7a8fa37316 labeler : updated sycl to match docs and code refactor (#8373) Alberto Cabrera Pérez 2024-07-08 21:35:17 +01:00
  • 790e9b2a0e readme : fix web link error [no ci] (#8347) b4b4o 2024-07-08 22:19:24 +08:00
  • a7d7781692 sycl : fix powf call in device code (#8368) Alberto Cabrera Pérez 2024-07-08 14:22:41 +01:00
  • 86d41e6e1c scripts : fix sync for sycl Georgi Gerganov 2024-07-08 13:51:31 +03:00
  • a5038fc736 sync : ggml Georgi Gerganov 2024-07-08 10:39:50 +03:00
  • 8ab505a2e9 tests : fix whitespace (#0) Georgi Gerganov 2024-07-08 10:39:36 +03:00
  • fec49428a6 feat: cuda implementation for ggml_conv_transpose_1d (ggml/854) John Balis 2024-07-02 11:09:52 -05:00
  • 9ff6a62845 common : preallocate sampling token data vector (#8363) Kevin Wang 2024-07-08 03:26:53 -04:00
  • da09d77524 infill : assert prefix/suffix tokens + remove old space logic (#8351) Georgi Gerganov 2024-07-08 09:34:35 +03:00
  • 6e022a225a common : avoid unnecessary logits fetch (#8358) Kevin Wang 2024-07-08 02:31:55 -04:00
  • 68d1711f73 readme : add supported glm models (#8360) toyer 2024-07-08 13:57:19 +08:00
  • df044303f3 py : type-check all Python scripts with Pyright (#8341) compilade 2024-07-07 15:04:39 -04:00
  • b775ea0e75 Update llama-cli documentation (#8315) Denis Spasyuk 2024-07-07 09:08:28 -06:00
  • 9ee7bf007d ci : add checks for cmake,make and ctest in ci/run.sh (#8200) Alex Tuddenham 2024-07-07 15:59:14 +01:00
  • c695235193 readme : update bindings list (#8222) Andy Tai 2024-07-07 06:21:37 -07:00
  • 305b9d8892 gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#8048) Brian 2024-07-07 22:58:43 +10:00
  • bfa07c7003 llama : support glm3 and glm4 (#8031) toyer 2024-07-07 20:52:10 +08:00
  • 155ec5bf82 llama : fix n_rot default (#8348) Georgi Gerganov 2024-07-07 14:59:02 +03:00
  • 78706ed9a8 py : use cpu-only torch in requirements.txt (#8335) compilade 2024-07-07 07:23:38 -04:00
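
A one-line-per-commit listing in roughly this shape (abbreviated hash, subject, author, ISO date, with ref decorations such as the `b33xx` build tags) can be reproduced with `git log`; this is a sketch, and the exact column order of the tool that produced the graph above may differ:

```shell
# Graph view with short hash, subject, ref decorations, author, and ISO date.
git log --graph --date=iso --pretty=format:'%h %s%d %an %ad'
```

`%d` prints any tags or branch heads pointing at the commit, which is where decorations like `b3394` would come from on a tagged build.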