Commit graph

  • 5471e756ec support --spm-infill Sigbjørn Skjæret 2024-06-19 21:59:31 +02:00
  • ce89d8a49c add --spm-infill option Sigbjørn Skjæret 2024-06-19 21:57:36 +02:00
  • 80d447994d Remove IDEWorkspaceChecks.plist from root-level .gitignore Michael de Gans 2024-06-19 12:47:28 -07:00
  • 35f2a46090 Remove .clang-tidy from .gitignore Michael de Gans 2024-06-19 12:26:39 -07:00
  • 7f38c03f9b Reorganize .gitignore Michael de Gans 2024-06-19 12:06:29 -07:00
  • a05faca0d0 Update convert-hf-to-gguf.py 0xspringtime 2024-06-19 14:53:32 -04:00
  • 620633e9d8 Update convert-hf-to-gguf.py 0xspringtime 2024-06-19 14:12:23 -04:00
  • fad942bb84 Fix Vulkan debug build error 0cc4m 2024-06-19 19:26:27 +02:00
  • 230396bc5b update avx2 Eddie-Wang1120 2024-06-20 00:12:58 +08:00
  • ff0359d6f4 move qnn helper function into utility files hongruichen 2024-06-19 18:16:11 +08:00
  • fa9a742b46 fix seq Eddie-Wang1120 2024-06-19 21:49:13 +08:00
  • fcf2da4621 add dequantize Eddie-Wang1120 2024-06-19 21:48:04 +08:00
  • 9c77ec1d74 ggml : synchronize threads using barriers (#7993) b3184 slaren 2024-06-19 15:04:15 +02:00
  • a04a953cab codecov : remove (#8004) b3183 Georgi Gerganov 2024-06-19 13:04:36 +03:00
  • d7f74f0559 codecov : remove Georgi Gerganov 2024-06-19 13:03:27 +03:00
  • 37a1585ead rename hongruichen 2024-06-19 17:36:50 +08:00
  • 73bf3090d3 modify true ci file luoyu-intel 2024-06-19 17:15:05 +08:00
  • b8ffaa646e update ci cmd luoyu-intel 2024-06-19 16:51:30 +08:00
  • 8a3d501cda revert format luoyu-intel 2024-06-19 08:26:06 +00:00
  • e1eabdc2e4 revert format change luoyu-intel 2024-06-19 08:22:02 +00:00
  • 5de2122647 format luoyu-intel 2024-06-19 08:07:43 +00:00
  • 4488134edf fix debug link error. fix windows crash luoyu-intel 2024-06-19 07:43:51 +00:00
  • 3fe07eb907 fix compiling error hongruichen 2024-06-19 14:47:41 +08:00
  • 35c6887f78 optimize convert-hf-to-gguf.py for chatglm model XingXing Qiao 2024-05-16 11:42:53 +08:00
  • 82408332b6 fix lint error XingXing Qiao 2024-05-24 14:13:36 +08:00
  • ad896fad1f remove .rotary_pos_emb.inv_freq and unuse code for chatglm3 model XingXing Qiao 2024-05-15 11:00:04 +08:00
  • 8edec939e4 add chatglm3-6b model support huggingface model: https://hf-mirror.com/THUDM/chatglm3-6b qiaoxx 2024-06-11 12:01:32 +08:00
  • 593ed55473 Merge pull request #1 from ggerganov/master AX 2024-06-19 15:08:14 +08:00
  • 3c491a3263 remove reference of g_qnn_mgr in qnn_instance hongruichen 2024-06-19 14:43:22 +08:00
  • 99320620b0 split logger function, tensors and backend from main qnn source hongruichen 2024-06-19 12:25:32 +08:00
  • 4147a04581 Added OMP Barrier in ggml.c to avoid atomic operations Abhishek Nair 2024-06-19 10:39:23 +05:30
  • dfe159ffff remove TODO hongruichen 2024-06-19 10:58:12 +08:00
  • aeef0c68f4 make the constant condition first hongruichen 2024-06-19 10:29:53 +08:00
  • 079dd3f592 add sycl preset luoyu-intel 2024-06-19 02:05:19 +00:00
  • 623494a478 [SYCL] refactor (#6408) b3182 Meng, Hengyu 2024-06-19 09:11:51 +08:00
  • 4d8a0c2b9f skip barriers with 1 threads slaren 2024-06-19 02:05:44 +02:00
  • 22eedc7677 un-ignore build-info.cpp.in Michael de Gans 2024-06-18 15:04:35 -07:00
  • 840b6ba66c un-ignore build-info.cmake and build-info.sh Michael de Gans 2024-06-18 14:53:54 -07:00
  • e5c0c4e30d server ci : do not use openmp with tsan slaren 2024-06-18 20:54:02 +02:00
  • 7226483b06 spin more slaren 2024-06-18 20:09:51 +02:00
  • e4643ad4d4 add implementation without openmp slaren 2024-06-18 19:55:17 +02:00
  • 4886f5001d ggml : synchronize using openmp barriers slaren 2024-06-18 18:38:49 +02:00
  • 37bef89433 tokenizer : BPE fixes (#7530) b3181 jaime-m-p 2024-06-18 18:40:52 +02:00
  • 89c7e4c1dd remove block scale Eddie-Wang1120 2024-06-18 23:33:58 +08:00
  • 07f4b706e6 Fix gemma model conversion Galunid 2024-06-18 17:24:05 +02:00
  • 0e4699e651 sycl-exp : dequant q4 k improvements (#7972) AidanBeltonS 2024-06-18 16:20:38 +01:00
  • 65a14d9e9a fix todo hongruichen 2024-06-18 23:07:01 +08:00
  • da43a545ef rename normalization layers wheelspawn 2024-06-18 10:04:58 -05:00
  • 4edc958fec fix code Eddie-Wang1120 2024-06-18 22:16:16 +08:00
  • 91c188d6c2 Only use FIM middle token if it exists (#7648) b3180 Sigbjørn Skjæret 2024-06-18 14:19:45 +02:00
  • 84f6de17f6 Fix no gcc pragma on Windows (#7751) b3179 jojorne 2024-06-18 09:18:32 -03:00
  • f99c653aba sycl-exp : Revert "Minor arithmetic improvement to mmvq wrapper kernel (#7172)" (#7980) Joe Todd 2024-06-18 13:09:52 +01:00
  • 61665277af Allow compiling with CUDA without CUDA runtime installed (#7989) b3178 Ulrich Drepper 2024-06-18 14:00:14 +02:00
  • 7cdbb33ee2 Allow compiling with CUDA without CUDA runtime installed Ulrich Drepper 2024-06-18 13:48:01 +02:00
  • f3974cabac all matrix multiplication backend sl/test-mul-mat-backend slaren 2024-06-14 21:44:55 +02:00
  • 3b1ae2cbeb fix merge master Yann Follet 2024-06-18 10:39:44 +00:00
  • 3fc2a81bfa merge master Yann Follet 2024-06-18 10:29:08 +00:00
  • 841b9a5bec Merge branch 'master' into feat-jina-embeddings-v2-zh Joan Fontanals 2024-06-18 11:02:28 +02:00
  • ce6e28cc23 Update ggml-sycl.cpp codeplay/fix-matmul-arith Joe Todd 2024-06-18 09:57:14 +01:00
  • 6a4fd2b118 fix workgroup size hardcode Meng, Hengyu 2024-06-18 06:14:04 +00:00
  • e9aa74207f Revert "llama : offload to RPC in addition to other backends (#7640)" (#7981) Joe Todd 2024-06-18 09:54:47 +01:00
  • 3819a8e7fe Merge 20b22433f0 into b96f9afb0d Julia 2024-06-18 09:43:00 +01:00
  • b96f9afb0d chore: clean useless beam search param (#7985) b3177 Frank Mai 2024-06-18 15:11:40 +08:00
  • 1193778105 readme : update UI list (#7943) Abheek Gulati 2024-06-17 23:57:41 -07:00
  • 5326bcceeb ggml : sync b3175 Georgi Gerganov 2024-06-18 09:50:45 +03:00
  • e6ecc2be47 whisper : use ggml_backend_sched (whisper/2239) Georgi Gerganov 2024-06-18 09:37:20 +03:00
  • a7614fa239 seperate lower precision GEMM from the main files Meng, Hengyu 2024-06-18 05:49:40 +00:00
  • 29e2a96d28 iq3_s sllv can be safely replaced with sse multiply netrunnereve 2024-06-17 21:43:24 -04:00
  • ec667c78ba chore: clean useless beam search param thxCode 2024-06-18 09:12:16 +08:00
  • a94e6ff877 update: support Qwen2-57B-A14B (#7835) b3173 Ștefan-Gabriel Muscalu 2024-06-17 22:08:46 +03:00
  • 0a321fc53e Fix too many shader groups called validation error in llama3 on AMD and Intel GPUs 0cc4m 2024-06-17 20:57:47 +02:00
  • b8929d5f06 Merge branch 'master' into tokenizer-bpe-fixes jaime-m-p 2024-06-17 20:24:29 +02:00
  • 5b6da18750 Make updates to type cast based on compiler instead of OS (#7851) b3172 Srihari-mcw 2024-06-17 23:53:17 +05:30
  • b7ee8270ae Replace char32_t with uint32_t jaime-m-p 2024-06-17 20:19:03 +02:00
  • 7c26775adb llama : disable FA if KV head size do not match (#7982) b3171 Georgi Gerganov 2024-06-17 19:40:01 +03:00
  • ef79941ac9 llama : disable FA if KV head size do not match gg/fa-req-kq-hs Georgi Gerganov 2024-06-17 19:20:24 +03:00
  • a0cbf2c555 Revert "llama : offload to RPC in addition to other backends (#7640)" Joe Todd 2024-06-14 22:23:39 +01:00
  • b473e95084 Add Nix and Flox install instructions (#7899) Bryan Honof 2024-06-17 17:37:55 +02:00
  • 06cdd3031c Add Nix and Flox install instructions Bryan Honof 2024-06-12 13:55:53 +02:00
  • 99052cd227 sched : offload_op also requires supports_op (#7977) b3169 slaren 2024-06-17 16:51:42 +02:00
  • c637fcd34d fix: divide 0 exception in mamba (#7932) b3168 Frank Mai 2024-06-17 22:11:08 +08:00
  • 6a2f0b3474 Implement non-mapped async IO for CUDA on Windows. (#7896) b3167 Markus Tavenrath 2024-06-17 16:10:15 +02:00
  • 0bab366d1a sched : offload_op also requires supports_op slaren 2024-06-17 16:03:01 +02:00
  • a03eff318c i2s->q22 Eddie-Wang1120 2024-06-17 20:33:09 +08:00
  • f00ffcf2e5 server : fix JSON-Scheme typo Aarni Koskela 2024-06-17 15:20:01 +03:00
  • 9456bba121 rename hongruichen 2024-06-17 18:44:19 +08:00
  • a235b7c532 Vectorize q load codeplay/dequant_q4_K_improvements Aidan 2024-06-17 10:30:40 +01:00
  • 604ef6bf15 Store scales in local mem Aidan 2024-06-17 10:26:18 +01:00
  • cb3fb42046 Single load for half2 Aidan 2024-06-17 10:21:16 +01:00
  • 4a481556e6 Remove double lines Aidan 2024-06-17 10:16:10 +01:00
  • 21be9cab94 rpc : fix load/store misaligned addresses (#7948) b3166 Georgi Gerganov 2024-06-17 11:09:20 +03:00
  • 5cc7b453a4 bypass logits when doing non-NONE pooling Douglas Hanley 2024-06-17 00:24:33 -06:00
  • 006167aaf6 gguf-dump.py: add --markdown dump output (#7853) Brian 2024-06-17 15:25:20 +10:00
  • 6200b43661 Apply suggestions from code review Brian 2024-06-17 15:21:57 +10:00
  • 5fe7b87ba1 use ggml_qnn_tensor_writer for all parameters hongruichen 2024-06-16 23:54:00 +08:00
  • df68d4fa5d [SYCL] Update README-sycl.md for Chapter "Recommended release" and "News" (#7946) b3164 Neo Zhang 2024-06-17 11:17:07 +08:00
  • 1fc5bf5bcb support glm-4-9b-chat XingXing Qiao 2024-06-17 10:08:52 +08:00
  • f4d3bdabff MiniCPM: fix for gpa zhangkaihuo 2024-06-17 10:53:37 +08:00
  • 5b7b7e0894 Update README-sycl.md Neo Zhang 2024-06-17 09:54:48 +08:00
  • 59dc0acbac Update README-sycl.md Neo Zhang 2024-06-17 09:44:18 +08:00