Commit graph

  • 25a6141a12 Merge 853dbf17cd into 904837e0cb JohnnyB 2024-09-25 08:09:00 +02:00
  • 904837e0cb cann: fix crash when llama-bench is running on multiple cann devices (#9627) [tag: b3822] Dou Xinpeng 2024-09-25 11:30:38 +08:00
  • bec83989be compress: format Stéphane du Hamel 2024-09-25 01:26:39 +02:00
  • b9a32f464f compress: Fix missing c_str() Stéphane du Hamel 2024-09-25 01:20:53 +02:00
  • 77dd5d05a5 compress: update comment Stéphane du Hamel 2024-09-25 00:03:39 +02:00
  • bd5b24e8b6 compress: cleanup Stéphane du Hamel 2024-09-24 23:52:09 +02:00
  • 1146007610 compress: fix sampling problem introduced by b0f27361f3 Stéphane du Hamel 2024-09-24 23:52:00 +02:00
  • e02c45c63b examples: add compression example Stéphane du Hamel 2024-09-24 22:24:53 +02:00
  • a843f1fac3 fix(granite): Add missing 'output' tensor for Granite Gabe Goodhart 2024-09-24 10:29:28 -06:00
  • 1c8b3e4c44 fix: Allow "output" layer in granite moe architecture (convert and cpp) Gabe Goodhart 2024-09-23 13:59:54 -06:00
  • 317b15bb60 fix(convert): Sanity check on merged FFN tensor sizes Gabe Goodhart 2024-09-23 13:56:39 -06:00
  • f2360996ca fix(convert): Remove unused tensor name mappings Gabe Goodhart 2024-09-23 12:54:50 -06:00
  • 5eb28c4710 fix(conversion): Simplify tensor name mapping in conversion Gabe Goodhart 2024-09-23 11:03:18 -06:00
  • 71bc4c1f93 Typo fix in docstring Gabe Goodhart 2024-09-23 09:32:23 -06:00
  • eca37cd4f2 feat(granitemoe): Implement granitemoe Gabe Goodhart 2024-09-10 16:35:14 -06:00
  • 014e59d31d fix(granitemoe convert): Split the double-sized input layer into gate and up Gabe Goodhart 2024-09-11 10:03:43 -06:00
  • e0b72290d0 feat(convert_hf_to_gguf): Add GraniteMoeModel Gabe Goodhart 2024-09-10 14:48:30 -06:00
  • 8a4ca2313c feat(gguf-py): Add granitemoe architecture Gabe Goodhart 2024-09-10 14:45:51 -06:00
  • 6aca7b7df0 Update convert_hf_to_gguf.py Ferdaws 2024-09-24 10:51:12 -05:00
  • 92da9371e3 Merge c42ec2f8bb into 70392f1f81 Michael Yang 2024-09-24 16:37:54 +03:00
  • 4814bdff53 fix: A crash occurs when llama-bench is running on multiple cann devices(#9250) douxinpeng 2024-09-24 12:44:50 +00:00
  • 8f1cd4ada6 CUDA: Enable FP16_MMA for RDNA3 with rocWMMA Ivan Chikish 2024-09-22 19:58:15 +03:00
  • 9e182a24c8 update gguf-split params parse logic zhenweijin 2024-09-24 16:43:17 +08:00
  • 70392f1f81 ggml : add AVX512DQ requirement for AVX512 builds (#9622) [tag: b3821] Eric Zhang 2024-09-24 16:03:21 +08:00
  • bb5f819975 sync : ggml [tag: b3820] Georgi Gerganov 2024-09-24 11:01:18 +03:00
  • c038931615 examples : adapt to ggml.h changes (ggml/0) Georgi Gerganov 2024-09-20 21:50:16 +03:00
  • 31ac5834fe llama : keep track of all EOG tokens in the vocab (#9609) [tag: b3818] Georgi Gerganov 2024-09-24 10:16:06 +03:00
  • cea1486ecf log : add CONT level for continuing previous log entry (#9610) [tag: b3817] Georgi Gerganov 2024-09-24 10:15:35 +03:00
  • 8dfa9c6c6a ggml : add AVX512DQ requirement for AVX512 builds EZForever 2024-09-24 14:28:05 +08:00
  • 0aa15011e3 server : add newline after chat example (#9616) [tag: b3816] StrangeBytesDev 2024-09-23 23:04:39 -07:00
  • b0f27361f3 sampling : avoid expensive softmax during greedy sampling (#9605) Georgi Gerganov 2024-09-24 09:03:17 +03:00
  • e9e1c20c75 sampling : add clarifying comment [no ci] Georgi Gerganov 2024-09-24 09:02:54 +03:00
  • c087b6f11d threads: fix msvc build without openmp (#9615) [tag: b3814] Max Krasnyansky 2024-09-23 21:18:48 -07:00
  • e0abaa0aee make sure params --split and --merge are not specified at same time zhenweijin 2024-09-24 10:29:55 +08:00
  • 46d20e11e2 Updated clip.cpp Tejaakshaykumar 2024-09-24 07:31:49 +05:30
  • 36d9bbce6b Updated examples/llava/clip.cpp Tejaakshaykumar 2024-09-24 07:27:35 +05:30
  • 21ee3806e4 avoid symbol link error zhenweijin 2024-09-24 09:50:46 +08:00
  • 116efee0ee cuda: add q8_0->f32 cpy operation (#9571) [tag: b3813] Ivan 2024-09-24 03:14:24 +03:00
  • df58f0e649 Add newline after chat example in llama-server StrangeBytesDev 2024-09-23 16:36:08 -07:00
  • d9fa53438f threads: fix msvc build without openmp Max Krasnyansky 2024-09-23 15:25:10 -07:00
  • 0b3bf966f4 server : add --no-context-shift option (#9607) [tag: b3812] Xuan Son Nguyen 2024-09-23 22:23:54 +02:00
  • f0c7b5edf8 threads: improve ggml_barrier scaling with large number of threads (#9598) [tag: b3811] Max Krasnyansky 2024-09-23 11:42:43 -07:00
  • b42f4205be threads: improve ggml_barrier scaling with large number of threads Max Krasnyansky 2024-09-22 18:36:42 -07:00
  • 3de8c69e0d update server documentation Xuan Son Nguyen 2024-09-23 18:13:21 +02:00
  • 1d48e98e4f readme : add programmable prompt engine language CLI (#9599) [tag: b3810] Riceball LEE 2024-09-23 23:58:17 +08:00
  • f3979df762 flake.lock: Update (#9586) Georgi Gerganov 2024-09-23 18:43:40 +03:00
  • ff7b2eb8aa log : add CONT level for continuing previous log entry Georgi Gerganov 2024-09-23 18:01:18 +03:00
  • a5a11bfbc3 Update tests/test-sampling.cpp Georgi Gerganov 2024-09-23 17:18:12 +03:00
  • 1e7b9299c6 ggml : AVX512 gemm for Q4_0_8_8 (#9532) [tag: b3808] Srihari-mcw 2024-09-23 19:36:38 +05:30
  • a2393d6f08 llama : keep track of all EOG tokens in the vocab Georgi Gerganov 2024-09-23 16:57:04 +03:00
  • 770462aace revert usage of GGML_ASSERT Xuan Son Nguyen 2024-09-23 15:20:29 +02:00
  • 448e4a94b8 Update x to start from 0 Srihari-mcw 2024-09-23 05:38:51 -07:00
  • 8941264d7e tests : minor fix Xuan Son Nguyen 2024-09-23 14:28:24 +02:00
  • c643de89cc Update examples/server/tests/features/embeddings.feature Xuan Son Nguyen 2024-09-23 14:27:24 +02:00
  • 3a64932e1f small fix Xuan Son Nguyen 2024-09-23 14:26:00 +02:00
  • 9bea433f73 Update clip.cpp by deleting optional logs Tejaakshaykumar 2024-09-23 17:09:51 +05:30
  • c2e7945bb4 server : add --no-context-shift option Xuan Son Nguyen 2024-09-23 13:35:56 +02:00
  • 407910ffbe style : minor adjustments Georgi Gerganov 2024-09-23 13:41:04 +03:00
  • 3cb33a8e29 speculative : fix default RNG seed + set sparams.n_probs Georgi Gerganov 2024-09-23 12:44:28 +03:00
  • 8241bc71b5 sampling : avoid expensive softmax during greedy sampling Georgi Gerganov 2024-09-23 12:19:32 +03:00
  • 114ab6347e sampling : fix off-by-one in tail-free sampling [branch: gg/tfs-ob1] Georgi Gerganov 2024-09-23 11:44:55 +03:00
  • 14d2abb8eb Edit commments Srihari-mcw 2024-09-23 01:33:48 -07:00
  • 3578d09729 keep the minimum min_keep value to 1 in sampling zhenweijin 2024-09-23 16:30:42 +08:00
  • 37f8c7b4c9 perplexity : remove extra new lines after chunks (#9596) [tag: b3807] Georgi Gerganov 2024-09-23 11:28:02 +03:00
  • bf9c1013ac metal : use F32 prec for K*Q in vec FA (#9595) [tag: b3806] Georgi Gerganov 2024-09-23 11:27:47 +03:00
  • bfb1058d74 llama : introduce anonymous namespace in llama.cpp Daniel Bevenius 2024-09-23 08:44:49 +02:00
  • 7436d52922 Rename functions and rearrange order of macros Srihari-mcw 2024-09-23 00:39:39 -07:00
  • 7aee79bda5 Remove zero vector parameter passing Srihari-mcw 2024-09-22 20:21:10 -07:00
  • 4cc0f3295e readme: add offline-ai/cli programmable prompt engine language CLI for llama.cpp server Riceball LEE 2024-09-23 13:49:55 +08:00
  • e62e9789cd Revert "[SYCL] fallback mmvq (#9088)" (#9579) [tag: b3805] Akarshan Biswas 2024-09-23 08:58:06 +05:30
  • 768c43f852 remove unused fileds to avoid unused filed build error zhenweijin 2024-09-23 10:24:34 +08:00
  • 3b6470f228 perplexity : remove extra new lines after chunks Georgi Gerganov 2024-09-22 22:26:22 +03:00
  • 5d888c48a3 metal : use F32 prec for K*Q in vec FA Georgi Gerganov 2024-09-22 21:56:31 +03:00
  • c35e586ea5 musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (#9526) [tag: b3804] R0CKSTAR 2024-09-22 22:55:49 +08:00
  • c7081061a9 Merge branch 'ggerganov:master' into master Paweł Wodnicki 2024-09-22 09:19:11 -05:00
  • dc785ba37b Update README.md Paweł Wodnicki 2024-09-22 09:18:53 -05:00
  • 912c331d3d Fix merge error in #9454 (#9589) [tag: b3803] Molly Sophia 2024-09-22 21:26:50 +08:00
  • 49639b6fe3 Fix merge error in #9454 Molly Sophia 2024-09-22 20:54:21 +08:00
  • 0fb0b4eab3 mtgpu: map cublasOperation_t to mublasOperation_t (sync code to latest) Xiaodong Ye 2024-09-22 19:50:02 +08:00
  • a3ad2c9971 mtgpu: enable unified memory Xiaodong Ye 2024-09-22 12:49:37 +08:00
  • 43ff5f36c2 mtgpu: disable flash attention on qy1 (MTT S80); disable q3_k and mul_mat_batched_cublas Xiaodong Ye 2024-09-22 12:47:59 +08:00
  • e40b33dcad mtgpu: add mp_21 support Xiaodong Ye 2024-09-14 08:42:29 +08:00
  • c4d6f343d4 cuda: add q8_0->f32 cpy operation Ivan Chikish 2024-09-20 22:00:55 +03:00
  • a5b57b08ce CUDA: enable Gemma FA for HIP/Pascal (#9581) [tag: b3802] Johannes Gäßler 2024-09-22 09:34:52 +02:00
  • af0a9faf7f Add basic function calling example using a llama-cli python wrapper Don Mahurin 2024-09-21 23:23:17 -07:00
  • ecd5d6b65b llama: remove redundant loop when constructing ubatch (#9574) [tag: b3801] Shankar 2024-09-21 19:30:34 -07:00
  • 2a63caaa69 RWKV v6: RWKV_WKV op CUDA implementation (#9454) [tag: b3800] Molly Sophia 2024-09-22 10:29:12 +08:00
  • caeba159da Merge branch 'master' into wkv-cuda Molly Sophia 2024-09-22 09:17:11 +08:00
  • db660f5a40 flake.lock: Update github-actions[bot] 2024-09-22 00:22:46 +00:00
  • 0ad9572f8b CUDA: enable Gemma FA for HIP/Pascal Johannes Gäßler 2024-09-21 17:51:58 +02:00
  • 33b692934f Revert "[SYCL] fallback mmvq (#9088)" Akarshan Biswas 2024-09-21 20:02:59 +05:30
  • d09770cae7 ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (#9573) [tag: b3799] slaren 2024-09-21 14:24:23 +02:00
  • 3ae8374b59 use 2024.1 arthw 2024-09-21 15:03:29 +08:00
  • 4af076b494 updated context shift error to ERROR_TYPE_INVALID_REQUEST VJHack 2024-09-21 00:35:11 -05:00
  • 9e07236444 llama: remove redundant loop when constructing ubatch shankarg87 2024-09-20 18:50:38 -07:00
  • d9ce02ae82 ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG slaren 2024-09-21 03:37:55 +02:00
  • 41f477879f Update CUDA graph on scale change plus clear nodes/params (#9550) [tag: b3798] agray3 2024-09-21 01:41:07 +01:00
  • e948a7da7a CI: Provide prebuilt windows binary for hip (#9467) [tag: b3797] Huang Qi 2024-09-21 08:39:41 +08:00
  • a1e6be1505 CI: Provide prebuilt windows binary for hip Huang Qi 2024-09-05 22:47:25 +08:00
  • 9880e3a069 changed error message wording VJHack 2024-09-20 14:56:03 -05:00