Commit graph

  • 534e96ca56 Replaced tabs with spaces. juk 2024-11-29 13:52:18 +00:00
  • 4b3242bbea ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (#10580) b4221 Shupei Fan 2024-11-29 21:49:02 +08:00
  • a7e15b0355 [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend nscipione 2024-11-29 11:29:00 +00:00
  • 1016cc6e05 Added all tests. Fixed bug with tmpl == "llama2" test. juk 2024-11-29 13:29:15 +00:00
  • 58b071a833 ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 Shupei Fan 2024-11-29 21:17:50 +08:00
  • d0bb89ece9 add missing licenses slaren 2024-11-29 14:01:30 +01:00
  • 9864e0d224 Merge branch 'master' into xsn/server_more_tests Xuan Son Nguyen 2024-11-29 13:44:13 +01:00
  • 0f77aae560 sycl : offload of get_rows set to 0 (#10432) b4220 Alberto Cabrera Pérez 2024-11-29 12:38:45 +00:00
  • b5f23c9e4c Fix missing semicolon Akarshan Biswas 2024-11-29 18:05:00 +05:30
  • 22d4166ed6 Switched to GGML_LOG Akarshan Biswas 2024-11-29 17:58:04 +05:30
  • 1efe1a1d3a sort list alphabetically slaren 2024-11-29 13:19:45 +01:00
  • be88ce7f81 cleanup UI link list slaren 2024-11-29 13:11:47 +01:00
  • 61d34f1911 fixed broken link l3utterfly 2024-11-29 20:38:37 +09:00
  • 29d7b459b0 Added template code and test for mistral-v7 juk 2024-11-29 11:12:12 +00:00
  • fac034530f update to keep up with upstream changes HimariO 2024-11-11 23:26:23 +08:00
  • 07553cfb0f update llama_hparams HimariO 2024-11-10 16:10:29 +08:00
  • 241bb45714 fix rope op mode switching, outdated func args HimariO 2024-11-04 22:11:13 +08:00
  • f1fa60f84c add GGML_ROPE_TYPE_MROPE, GGML_ROPE_TYPE_VISION HimariO 2024-10-31 00:34:29 +08:00
  • 201f7043c3 add fp16 support for qwen2vl and m-rope HimariO 2024-10-30 19:03:26 +08:00
  • 3237bb4614 add fp32 mrope, vision rope kernel HimariO 2024-10-29 01:16:21 +08:00
  • 0882f57612 cuda-gdb cmake preset HimariO 2024-10-27 13:38:22 +08:00
  • 53480d2bdb replace variable size array with vector HimariO 2024-10-21 21:34:24 +08:00
  • 3d19dd44b6 add arg parser to qwen2vl_surgery HimariO 2024-10-21 02:28:19 +08:00
  • 023f0076e0 correcting vision-rope behavior, add the missing last layer back to ViT HimariO 2024-10-20 21:42:53 +08:00
  • bcd49f5984 [WIP] create inference workflow, gguf convert script but fix HimariO 2024-10-18 19:01:02 +08:00
  • 7e9fc7202e make batch and clip utils compatible with qwen2vl HimariO 2024-10-18 18:59:47 +08:00
  • c13edfed59 [WIP] qwen2vl vision model HimariO 2024-10-10 22:30:42 +08:00
  • 3c3691e10f update 5D tensor op workaround HimariO 2024-10-02 20:53:56 +08:00
  • f661483ea7 update qwen2vl cli tool HimariO 2024-10-01 23:25:06 +08:00
  • 9d389a051b Add vl-rope/2d-rope support for qwen2vl ViT HimariO 2024-09-30 22:30:02 +08:00
  • 35411963d2 Verify m-rope output HimariO 2024-09-30 02:23:08 +08:00
  • b24bd89e77 [WIP] add qwen2vl arch HimariO 2024-09-26 00:45:08 +08:00
  • 7c6f793492 Add Qwen2VL cli entrypoint HimariO 2024-09-22 23:25:33 +08:00
  • c17546fffa Barebone Qwen2VL LLM convertor HimariO 2024-09-21 17:01:34 +08:00
  • 266b8519ee sycl : Reroute permuted mul_mats through oneMKL (#10408) b4219 Alberto Cabrera Pérez 2024-11-29 09:49:43 +00:00
  • f3617eecfe clip add sycl support piDack 2024-11-29 08:56:40 +00:00
  • 692880535a remove unused AutoTokenizer piDack 2024-11-29 08:15:15 +00:00
  • 816d93db75 Merge branch 'master' of https://github.com/piDack/llama.cpp into support_glm_edge_model piDack 2024-11-29 08:11:31 +00:00
  • 938f608742 CANN: RoPE operator optimization (#10563) b4218 Chenguang Li 2024-11-29 14:46:55 +08:00
  • f095a649ec vulkan: get the first command buffer submitted sooner (#10499) b4217 Jeff Bolz 2024-11-29 00:18:02 -06:00
  • 6c50e9caca llava.cpp trailing whitespace piDack 2024-11-29 06:11:12 +00:00
  • 7d80a4aa97 fix format piDack 2024-11-29 06:05:05 +00:00
  • 55a6f951ca remove debug info piDack 2024-11-29 05:49:13 +00:00
  • ce32516f99 fix lhpqaq 2024-11-29 10:56:43 +08:00
  • c98e9a7160 update readme lhpqaq 2024-11-29 10:45:49 +08:00
  • f766ae125f remove space lhpqaq 2024-11-29 10:28:14 +08:00
  • fb10521514 add timings lhpqaq 2024-11-29 10:20:58 +08:00
  • fded99769e Removed tab-indented lines juk 2024-11-29 01:29:54 +00:00
  • 94944cc9bb Invalid system_message instead of content fixed juk 2024-11-29 01:13:28 +00:00
  • f55c434eb4 Changed system message logic and added tests for all 4 juk 2024-11-29 00:40:10 +00:00
  • 678d7994f4 llava: return false instead of exit (#10546) b4216 Ting Lou 2024-11-29 08:09:46 +08:00
  • dbbde92374 Templates: mistral-v1, mistral-v2, mistral-v3, mistral-v3-tekken juk 2024-11-29 00:05:31 +00:00
  • f4898e16b5 ggml : move AMX to the CPU backend slaren 2024-11-29 00:33:51 +01:00
  • 2bca812230 force 16 sequential threads per block Eve 2024-11-28 17:38:12 -05:00
  • 97e0c686a3 subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45) Eve 2024-11-28 16:31:36 -05:00
  • 31a1d8afc0 Merge https://github.com/ggerganov/llama.cpp into vulkan Eve 2024-11-28 15:50:29 -05:00
  • 6893f3ac5d llama: Add generic abort to token_decode_internal kingbri 2024-11-28 15:42:22 -05:00
  • dc22344088 ggml : remove redundant copyright notice + update authors b4215 Georgi Gerganov 2024-11-28 20:46:40 +02:00
  • 4c0a95b107 llama : add missing model types b4214 Georgi Gerganov 2024-11-28 20:45:07 +02:00
  • 6c59567689 server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568) Xuan Son Nguyen 2024-11-28 19:17:49 +01:00
  • 890719311b common: fix warning message when no GPU found (#10564) b4212 Johannes Gäßler 2024-11-28 18:15:25 +01:00
  • 8a349b31d3 bump openai to 1.55.3 Xuan Son Nguyen 2024-11-28 18:05:09 +01:00
  • e59bed91af test: bump openai to 1.55.2 Xuan Son Nguyen 2024-11-28 17:17:07 +01:00
  • 879c5ebd25 add invalid cases Xuan Son Nguyen 2024-11-28 17:07:51 +01:00
  • 7439ba7b11 server : (tests) don't use thread for capturing stdout/stderr Xuan Son Nguyen 2024-11-28 16:47:36 +01:00
  • 7281cf13ad docs: fix outdated usage of llama-simple (#10565) b4211 Random Fly 2024-11-28 23:03:11 +08:00
  • e90688edd0 ci : fix tag name in cuda and hip releases (#10566) b4210 Diego Devesa 2024-11-28 15:58:54 +01:00
  • db0809e0d6 ci : fix tag name in cuda and hip releases slaren 2024-11-28 15:30:15 +01:00
  • 8aaf69a3ee add test speculative Xuan Son Nguyen 2024-11-28 15:15:02 +01:00
  • 0f8d690a04 docs: fix outdated usage of llama-simple rand-fly 2024-11-28 22:12:53 +08:00
  • ac404be2dc server : add split model test Xuan Son Nguyen 2024-11-28 14:40:22 +01:00
  • 5acff8f3a3 ggml : fix bug in Q4_1 x Q8_1 I8MM kernel Georgi Gerganov 2024-11-28 13:09:23 +02:00
  • 203fbb0bac common: fix warning message when no GPU found Johannes Gäßler 2024-11-28 14:02:09 +01:00
  • 76b27d29c2 ggml : fix row condition for i8mm kernels (#10561) b4209 Georgi Gerganov 2024-11-28 14:56:37 +02:00
  • eea986f215 cmake : fix ARM feature detection (#10543) b4208 Georgi Gerganov 2024-11-28 14:56:23 +02:00
  • c202cef168 ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541) Shupei Fan 2024-11-28 20:52:03 +08:00
  • 2025fa67e9 kompute : improve backend to pass test_backend_ops (#10542) b4206 Sergio López 2024-11-28 12:51:38 +01:00
  • 758bb13e6d [CANN] Code Formatting noemotiovon 2024-11-28 11:19:27 +00:00
  • 0adfd0ff92 cmake : fix ARM feature detection for MSVC Georgi Gerganov 2024-11-27 13:16:13 +02:00
  • 7ea8d0984c [cann] RoPE operator optimization noemotiovon 2024-11-28 11:08:21 +00:00
  • 2e752c4c22 ggml : fix row condition for i8mm kernels Georgi Gerganov 2024-11-28 12:20:48 +02:00
  • 1e645678e7 clang format lihan 2024-11-28 17:16:08 +08:00
  • c6bc73951e CANN: Update cann.md to display correctly in CLion (#10538) b4205 Ruixin Huang 2024-11-28 15:27:11 +08:00
  • 605fa66c50 CANN: Fix SOC_TYPE compile bug (#10519) b4204 leo-pony 2024-11-28 15:25:24 +08:00
  • e509f79635 fix CANN compile fail bug: the assert in ascend kernel function isn't supported on some CANN versions leo-pony 2024-11-28 09:28:52 +08:00
  • dc96a0b06a Update the cann backend News content: Support F16 and F32 data type model for Ascend 310P NPU. leo-pony 2024-11-26 22:07:13 +08:00
  • dc60ede113 CANN: Fix the bug build fail on Ascend310P under two cases: 1) Manual specify SOC_TYPE 2) Under some unusual compile environment leo-pony 2024-11-26 21:20:47 +08:00
  • b7420131bf CANN: ROPE operator optimization (#10540) b4203 Chenguang Li 2024-11-28 14:24:46 +08:00
  • 209046905a llama: Support MiniCPM-1B (with & w/o longrope) Yuxuan Li 2024-11-28 13:59:48 +08:00
  • 6a6c954ddb delete unused comment lihan 2024-11-28 13:32:21 +08:00
  • 65180fbaaa faster ssm_scan lihan 2024-11-28 13:15:16 +08:00
  • 7c313b5f5e check for subgroup multiple of 16 and greater than 16 Eve 2024-11-27 21:41:16 -05:00
  • 9f912511bc common : fix duplicated file name with hf_repo and hf_file (#10550) b4202 Xuan Son Nguyen 2024-11-27 22:30:52 +01:00
  • 91fd3226b8 common : fix duplicated file name with hf_repo and hf_file Xuan Son Nguyen 2024-11-27 19:18:46 +01:00
  • 3ad5451f3b Add some minimal optimizations for CDNA (#10498) b4201 uvos 2024-11-27 17:10:08 +01:00
  • 0aa6488a67 ggml-cpu: add __ARM_FEATURE_DOTPROD guard Shupei Fan 2024-11-27 21:52:12 +08:00
  • 2d5acca196 ggml_cuda: set launch bounds also for GCN as it helps there too uvos 2024-11-27 14:16:46 +01:00
  • 4e9692aeac llava: return false instead of exit Lou Ting 2024-11-27 20:35:23 +08:00
  • 2c96bd2466 Merge branch 'ggerganov:master' into master haopeng 2024-11-27 19:50:29 +08:00
  • 46c69e0e75 ci : faster CUDA toolkit installation method and use ccache (#10537) b4200 Diego Devesa 2024-11-27 11:03:25 +01:00