Commit graph

  • 43b35e38ba Add support for sqrt on CUDA (#7953) b3163 Calvin Laurenson 2024-06-16 15:23:04 -07:00
  • d63aca3e45 Refactor Vulkan backend to allow multiple contexts 0cc4m 2024-06-16 12:00:38 +02:00
  • f05a0e0a00 Add --pre-tokenizer option to convert Galunid 2024-06-16 20:18:27 +02:00
  • 19b7a836f6 cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231) b3162 Georgi Gerganov 2024-06-11 17:39:01 +03:00
  • b5fcf8ef5c ggml : fix and optimize ppc64le (ggml/849) Hong Bo PENG 2024-06-16 16:53:11 +08:00
  • 398105ff43 ggml : remove duplicate include of ggml-common.h (ggml/853) Daniel Bevenius 2024-06-16 10:51:18 +02:00
  • bc6c457fa3 flake.lock: Update (#7951) b3159 Georgi Gerganov 2024-06-16 19:16:21 +03:00
  • 3591f1cc8c Use F32 sqrtf instead of F64 sqrt Calvin Laurenson 2024-06-16 08:00:05 -07:00
  • a5679ddd8e use ggml_qnn_tensor_reader for output tensor hongruichen 2024-06-16 22:01:14 +08:00
  • 36e41a1055 use tensor wrapper in matmul hongruichen 2024-06-16 21:46:15 +08:00
  • 37bb9263dd use tensor wrapper in add hongruichen 2024-06-15 11:13:30 +08:00
  • 6c68adc1d9 add ggml_qnn_tensor_binder hongruichen 2024-06-14 18:52:54 +08:00
  • 5e18cdc268 init the test array with const values hongruichen 2024-06-15 12:55:06 +08:00
  • 52399254b3 unicode : avoid char32_t (#7957) b3158 Georgi Gerganov 2024-06-16 14:51:40 +03:00
  • 6fe1c62741 readme : update UI list [no ci] (#7958) hopkins385 2024-06-16 13:51:18 +02:00
  • cddaf028ad ggml : fix handling of zero blocks in IQ quants (#7955) b3156 Georgi Gerganov 2024-06-16 14:50:12 +03:00
  • ac686cbe10 add RAGNA Desktop to readme hopkins385 2024-06-16 13:45:16 +02:00
  • 98f948b9d0 unicode : avoid char32_t gg/no-char32_t Georgi Gerganov 2024-06-16 13:18:46 +03:00
  • c8a82194a8 github : update pr template Georgi Gerganov 2024-06-16 10:46:51 +03:00
  • 28f7a4d028 ggml : fix handling of zero blocks in IQ quants gg/ggml-fix-zero-blocks Georgi Gerganov 2024-06-16 10:41:53 +03:00
  • 7c7836d9d4 Vulkan Shader Refactor, Memory Debugging Option (#7947) b3154 0cc4m 2024-06-16 07:17:31 +02:00
  • fea1dc98c0 new line Calvin Laurenson 2024-06-15 19:38:14 -07:00
  • 97b37313ac fix test Calvin Laurenson 2024-06-15 19:30:43 -07:00
  • b57187fb37 iq3_s small fix netrunnereve 2024-06-15 22:23:06 -04:00
  • f9926746d9 add sqrt to ggml_backend_cuda_supports_op Calvin Laurenson 2024-06-15 18:47:20 -07:00
  • 99f666c1b6 iq3_s netrunnereve 2024-06-15 21:34:02 -04:00
  • 8ad3edfc40 add test Calvin Laurenson 2024-06-15 18:29:36 -07:00
  • 145f09fc92 fix comments in pca Calvin Laurenson 2024-06-15 18:05:08 -07:00
  • 3c4df6ccf3 enable cuda in pca Calvin Laurenson 2024-06-15 17:36:09 -07:00
  • 80321f4b49 cuda sqrt support Calvin Laurenson 2024-06-15 17:35:57 -07:00
  • bb0e865597 flake.lock: Update github-actions[bot] 2024-06-16 00:19:21 +00:00
  • 39e816e54e iq3_s before sllv netrunnereve 2024-06-15 18:07:56 -04:00
  • eccc609efa iq2_xs netrunnereve 2024-06-15 17:08:25 -04:00
  • 4e4e376e1e Merge branch 'master' into convert-split Christian Zhou-Zheng 2024-06-15 14:28:39 -04:00
  • 0c7b3595b9 Add cvector-generator example (#7514) b3153 Xuan Son Nguyen 2024-06-15 18:53:40 +02:00
  • 309ef24209 Fix unnecessary high llama-3 VRAM use 0cc4m 2024-06-15 18:23:07 +02:00
  • e9f2abfc8c bitnet : pad tensors to 256 gg/bitnet Georgi Gerganov 2024-06-15 19:01:03 +03:00
  • 569a03ed97 finish i2_s/i8_s vec_dot x86 simd Eddie-Wang 2024-06-15 14:01:26 +00:00
  • 51032b15e1 gguf-dump.py: element count autosizing brian khuu 2024-06-15 22:03:39 +10:00
  • dfbf6e1458 gguf-dump: right align element count brian khuu 2024-06-15 00:15:38 +10:00
  • 9310a02a7b gguf-dump.py: prettify dimension brian khuu 2024-06-15 00:05:04 +10:00
  • 0b181a9df5 Add type hints and spacing Brian 2024-06-13 16:01:35 +10:00
  • 18b3b4e348 gguf-dump.py: markdownTableWithAlignmentSupport() added brian khuu 2024-06-13 14:28:10 +10:00
  • dd8cdae659 gguf-dump.py: fix array preview brian khuu 2024-06-11 02:01:55 +10:00
  • a69febd886 gguf-dump.py: Add tensor overview count brian khuu 2024-06-10 22:30:41 +10:00
  • b000526741 gguf-dump.py: use standard tensor name lookup. Also add tensor ID field brian khuu 2024-06-10 22:19:56 +10:00
  • 3363405f13 gguf-dump.py: Add toc brian khuu 2024-06-10 20:39:16 +10:00
  • f13e94ca1d gguf-dump.py: add --markdown dump output brian khuu 2024-06-10 20:01:04 +10:00
  • 34bdbed481 rpc : fix load/store misaligned addresses gg/rpc-fix-misaligned Georgi Gerganov 2024-06-15 14:39:20 +03:00
  • 27e88cd183 Add YX UI for llama-server Aliebc 2024-06-15 17:50:00 +08:00
  • 9bcf6952f7 Fix flake8 0cc4m 2024-06-15 11:43:01 +02:00
  • 00aaaabec5 Add memory debug output option 0cc4m 2024-06-15 11:24:29 +02:00
  • 8abb23f60f Improve debug log code 0cc4m 2024-06-15 09:36:07 +02:00
  • 3ac05b4190 Refactor shaders, extract GLSL code from ggml_vk_generate_shaders.py into vulkan-shaders directory 0cc4m 2024-06-15 08:55:21 +02:00
  • 7b2f4a7d19 [SYCL] remove global variables (#7710) b3152 Meng, Hengyu 2024-06-15 14:05:10 +08:00
  • c8cfc963f0 Update README-sycl.md Neo Zhang 2024-06-15 13:34:29 +08:00
  • dcfee06594 iq2_s netrunnereve 2024-06-15 00:25:16 -04:00
  • 592618656a iq3_xxs netrunnereve 2024-06-14 23:36:18 -04:00
  • fcb2bb1222 Add YX simple filter for llama-server Aliebc 2024-06-15 10:45:01 +08:00
  • 9719f513c0 Update README-sycl.md Neo Zhang 2024-06-15 10:23:07 +08:00
  • 28eaafc166 use macro for group_size and remove cuda-related Meng, Hengyu 2024-06-15 02:12:42 +00:00
  • 95dced07e4 i2_s to absmax Eddie-Wang1120 2024-06-15 10:10:40 +08:00
  • 903e47f956 Fix merge: renamed and deleted files jaime-m-p 2024-06-15 04:03:23 +02:00
  • e28d0e41ea Merge branch 'master' into tokenizer-bpe-fixes jaime-m-p 2024-06-15 03:32:42 +02:00
  • 4af5478f60 Better unicode data generation jaime-m-p 2024-06-14 23:50:14 +02:00
  • 6f6612570e Revert "Minor arithmetic improvement to mmvq wrapper kernel (#7172)" Joe Todd 2024-06-14 22:22:57 +01:00
  • 489a5cbbc0 Add LARS to the UI list in README [no ci] AbheekG 2024-06-14 14:04:41 -07:00
  • 8093253b41 take out attention_type; add in llama_set_embeddings Douglas Hanley 2024-06-06 15:11:25 -05:00
  • d4e6972f60 get rid of old causal_attn accessor Douglas Hanley 2024-06-04 11:16:37 -05:00
  • 7c37ae9d29 only use embd output for pooling_type NONE Douglas Hanley 2024-06-04 01:20:13 -05:00
  • 1756c4b5b6 find result_norm/result_embd tensors properly; update output allocation logic Douglas Hanley 2024-05-22 22:42:08 -05:00
  • 010571490f create append_pooling operation; allow to specify attention_type; add last token pooling; update examples Douglas Hanley 2024-05-22 12:14:24 -05:00
  • 8cda5af9fe Update brute force random test jaime-m-p 2024-06-14 20:14:29 +02:00
  • 0575023923 Skip missing byte tokens (falcon) jaime-m-p 2024-06-14 20:12:39 +02:00
  • 4ff15d4fda Fix unicode whitespaces (deepseek-llm) jaime-m-p 2024-06-14 20:00:15 +02:00
  • f8ec8877b7 ci : fix macos x86 build (#7940) b3151 olexiyb 2024-06-14 20:28:34 +03:00
  • 520361f318 Merge branch 'ggerganov:master' into avx_iq Eve 2024-06-14 16:51:34 +00:00
  • 76d66ee0be CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921) b3150 Johannes Gäßler 2024-06-14 18:41:49 +02:00
  • 2f08455762 On April 2, GitHub's macos-latest runner became ARM (https://github.com/actions/runner-images/pull/9601/files); to keep using the old macos-latest image we should use macos-12 olexiyb 2024-06-14 18:50:47 +03:00
  • 1d9dd480ff revert q2_K precision related changes Johannes Gäßler 2024-06-14 17:43:33 +02:00
  • 181c0e3b0f review: modify codes as review suggestion zhou.weiguo 2024-06-14 23:18:43 +08:00
  • 11c7b1e25a review: modify codes as review suggestion zhou.weiguo 2024-06-14 23:04:13 +08:00
  • bff3a20944 fix data race Johannes Gäßler 2024-06-14 16:16:44 +02:00
  • 66ef1ceedf metal : utilize max shared memory for mul_mat_id (#7935) b3149 Georgi Gerganov 2024-06-14 17:14:09 +03:00
  • ff076b8873 Merge pull request #7920 from ggerganov/codeplay/revert-host-alloc Joe Todd 2024-06-14 15:10:33 +01:00
  • b2c8c831c9 Merge pull request #7919 from ggerganov/codeplay/unify-rope-sycl Joe Todd 2024-06-14 15:08:23 +01:00
  • e65bbf606c llama-bench : fix RPC indication (#7936) b3148 Radoslav Gerganov 2024-06-14 16:47:41 +03:00
  • ded54b5d9b Replace powf with sycl::pow in ggml-sycl.cpp Joe Todd 2024-06-14 13:14:33 +01:00
  • d9452267a0 fix: QWEN2MOE support for expert_feed_forward_length stefan 2024-06-14 11:38:12 +00:00
  • 225ec48fe5 np.int16 no longer used Sigbjørn Skjæret 2024-06-14 13:32:48 +02:00
  • 6fcd1331ef llama : more checks before assuming FIM tokens (#7644) b3147 Sigbjørn Skjæret 2024-06-14 12:20:04 +02:00
  • 41b9260f18 convert : add Poro-34B-chat tokenizer support (#7713) b3146 Elaine 2024-06-14 13:16:49 +03:00
  • af019105f1 Update llama.cpp Georgi Gerganov 2024-06-14 13:16:18 +03:00
  • 1c03036c15 Update convert-hf-to-gguf-update.py Georgi Gerganov 2024-06-14 13:15:46 +03:00
  • 5d676a2245 Change Poro-34B-chat to poro-chat Elaine 2024-06-14 13:10:22 +03:00
  • cd974f14ad Change Poro-34B-chat to poro-chat Elaine 2024-06-14 13:09:37 +03:00
  • 02a2cc85f1 llama-bench : fix RPC indication Radoslav Gerganov 2024-06-14 13:09:07 +03:00
  • a75f69a63e Update convert-hf-to-gguf-update.py Elaine 2024-06-14 13:06:53 +03:00
  • eaf34ba0cd metal : utilize max shared memory for mul_mat_id gg/metal-mmid-max-rows Georgi Gerganov 2024-06-14 13:02:25 +03:00
  • c776fb8033 remove duplicated extras Meng, Hengyu 2024-06-14 09:25:14 +00:00
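
For orientation, the sqrt-on-CUDA work at the top of this graph (43b35e38ba, together with 80321f4b49 "cuda sqrt support" and 3591f1cc8c "Use F32 sqrtf instead of F64 sqrt") amounts to an elementwise square-root kernel that stays in single precision. The sketch below is an illustrative, self-contained CUDA program, not the actual ggml kernel; the kernel name, launch configuration, and test data are invented for the example.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Elementwise square root in F32, using sqrtf so values are never promoted to double.
    __global__ void sqrt_f32(const float * x, float * dst, const int n) {
        const int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            dst[i] = sqrtf(x[i]);  // single-precision sqrtf, not F64 sqrt
        }
    }

    int main() {
        const int n = 8;
        float hx[n], hy[n];
        for (int i = 0; i < n; ++i) hx[i] = (float) (i * i);

        float * dx = nullptr;
        float * dy = nullptr;
        cudaMalloc(&dx, n * sizeof(float));
        cudaMalloc(&dy, n * sizeof(float));
        cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);

        // One block of 32 threads is plenty for 8 elements; the bounds check handles the rest.
        sqrt_f32<<<1, 32>>>(dx, dy, n);
        cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);

        for (int i = 0; i < n; ++i) printf("sqrt(%.0f) = %.1f\n", hx[i], hy[i]);

        cudaFree(dx);
        cudaFree(dy);
        return 0;
    }

In the real backend, the related commit f9926746d9 additionally registers the operation in the backend's supports-op check so the scheduler will offload it to CUDA; the program above only demonstrates the F32 kernel idea.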