Commit graph

  • cbab212a32 Restrict threadpool to CPU backend fmz 2024-05-28 08:47:35 -07:00
  • 6bd12ce409 sycl : fix assert (#7563) b3027 Georgi Gerganov 2024-05-28 22:22:50 +03:00
  • 4e4c41e553 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-05-28 15:15:18 -04:00
  • 21936ddb5d fix: do not complicate things Joan Martinez 2024-05-28 21:06:12 +02:00
  • 3a414b0be2 llama : sequence-length-aware batch splitting Francis Couture-Harpin 2024-05-28 12:21:52 -04:00
  • 181dadf294 llama : fix Jamba quantization sanity checks Francis Couture-Harpin 2024-05-28 12:23:05 -04:00
  • 02eb445d73 sync master caitianchi 2024-05-29 03:06:58 +08:00
  • 28d4a7f9cc Merge pull request #8 from OpenBMB/master tc-mb 2024-05-29 03:03:26 +08:00
  • 8bd47ce5d6 Merge pull request #7 from OpenBMB/prepare-PR tc-mb 2024-05-29 02:50:30 +08:00
  • 8767ce29cf Merge branch 'prepare-PR-of-minicpm-v2.5' into prepare-PR tc-mb 2024-05-29 02:49:59 +08:00
  • 5442939fcc llama : support small Granite models (#7481) b3026 Giuseppe Scrivano 2024-05-28 20:49:49 +02:00
  • cc0ac09712 feat: add changes to handle jina v2 base code Joan Martinez 2024-05-28 20:45:04 +02:00
  • b37ab0b1e5 add link caitianchi 2024-05-29 02:21:41 +08:00
  • 9495504e7b replace and organize code caitianchi 2024-05-29 01:52:26 +08:00
  • 3c306f18c8 clear code caitianchi 2024-05-29 01:50:59 +08:00
  • 56411a950f vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) b3025 k.h.lai 2024-05-29 01:25:08 +08:00
  • 056d178160 rename wrapper caitianchi 2024-05-29 00:18:17 +08:00
  • d99c0e44f6 Merge 02240912ff into 2b737caae1 Alon 2024-05-28 19:10:26 +03:00
  • b974e9fcfb llama: add support for small granite models Giuseppe Scrivano 2024-05-23 00:45:35 +02:00
  • 06748ff338 llama: honor add_space_prefix from the model configuration Giuseppe Scrivano 2024-05-25 22:24:12 +02:00
  • 5c9222d88f Restrict threadpool to CPU backend fmz 2024-05-28 08:47:35 -07:00
  • 83689a50b7 use the correct SYCL context for host USM allocations Ben Ashbaugh 2024-05-28 08:19:34 -07:00
  • 2b737caae1 rpc : resource management rework (#7562) b3024 Radoslav Gerganov 2024-05-28 18:13:36 +03:00
  • ee3dff6b8e Add support for DeepseekV2ForCausalLM (#7519) b3023 fairydreaming 2024-05-28 17:07:05 +02:00
  • a0e3970a18 Merge 472a9b8be5 into edc29433fa Behnam Moh 2024-05-28 11:01:09 -04:00
  • 732c3c977a rm unused Meng, Hengyu 2024-05-28 21:02:41 +08:00
  • d63f6b66d4 Update ggml-sycl.cpp Meng, Hengyu 2024-05-28 21:00:30 +08:00
  • fc08e1a729 Update ggml-sycl.cpp Meng, Hengyu 2024-05-28 21:00:01 +08:00
  • 1723c147c2 Update ggml-sycl.cpp Meng, Hengyu 2024-05-28 20:59:54 +08:00
  • 6a8432bf43 Update ggml-sycl.cpp Meng, Hengyu 2024-05-28 20:59:44 +08:00
  • c1c99a3186 address review comments Radoslav Gerganov 2024-05-28 15:16:21 +03:00
  • edc29433fa tests : fix test-tokenizer-0.sh Georgi Gerganov 2024-05-28 15:04:09 +03:00
  • 3efb6595ae gguf-py, llama : whitespace formatting fixes Stanisław Szymczyk 2024-05-28 13:28:57 +02:00
  • 8b99e2aa66 llama : handle unknown utf8 bytes (#7588) b3021 Georgi Gerganov 2024-05-28 13:55:35 +03:00
  • fd8eda19df llama : handle unknown utf8 bytes Georgi Gerganov 2024-05-28 13:54:17 +03:00
  • 271ff3fc44 github: add refactor to issue template (#7561) Brian 2024-05-28 20:27:27 +10:00
  • dd53eb9380 Update 07-refactor.yml Brian 2024-05-28 20:26:08 +10:00
  • e2b065071c [SYCL]fix ggml_sycl_mul_mat_id() to match the change of api (#7436) b3019 Neo Zhang 2024-05-28 17:53:37 +08:00
  • 841cd47432 llama : replace ggml_new_tensor_3d + ggml_set_inplace + ggml_set_inplace with single ggml_concat in build_deepseek2() Stanisław Szymczyk 2024-05-28 11:15:17 +02:00
  • 55a7834442 rpc : resource management rework Radoslav Gerganov 2024-05-22 13:23:12 +03:00
  • 98ff6e1b45 Merge remote-tracking branch 'upstream/master' into deepseek-v2 Stanisław Szymczyk 2024-05-28 11:01:48 +02:00
  • 6366d62d6b updata cmakelist caitianchi 2024-05-28 16:35:13 +08:00
  • d0e9e0e14d remove fp16 replacing fp32 Meng, Hengyu 2024-05-28 16:20:00 +08:00
  • 8eb0549fd0 revert typo Meng, Hengyu 2024-05-28 16:05:57 +08:00
  • 0548a4187f ggml : generalize GGML_OP_CONCAT (#7563) b3018 Georgi Gerganov 2024-05-28 11:04:19 +03:00
  • c7ed1d8ddc revert FORCE_DMMV both in cuda and sycl Meng, Hengyu 2024-05-28 16:03:41 +08:00
  • df80e03899 ggml : fix ptrs Georgi Gerganov 2024-05-28 10:35:25 +03:00
  • 6cd825ffff cuda : add asserts Georgi Gerganov 2024-05-28 10:32:30 +03:00
  • 94a0c7650d ggml : reimplement CPU and Metal Georgi Gerganov 2024-05-27 18:22:05 +03:00
  • e73a0c7c2f updata cmakelist caitianchi 2024-05-28 15:26:09 +08:00
  • 4bf6133b0e typo Meng, Hengyu 2024-05-28 14:06:45 +08:00
  • bfed2838ac update readme Meng, Hengyu 2024-05-28 06:52:48 +08:00
  • 19dc47c064 remove useless use_xmx Meng, Hengyu 2024-05-28 06:41:11 +08:00
  • abe594a058 fix typo Meng, Hengyu 2024-05-28 06:39:21 +08:00
  • 583c81c91c align GEMM dispatch Meng, Hengyu 2024-05-28 05:11:55 +08:00
  • 1d9d39a18e threadpool: update backend interface in ggml-rpc Max Krasnyansky 2024-05-27 22:13:07 -07:00
  • 9335b969e8 server: do not remove whitespace at the start of a completion chunk (#7524) mgroeber9110 2024-05-28 06:55:51 +02:00
  • a67dbcc538 threadpool: fix compiler errors for android and x64 builds Max Krasnyansky 2024-05-27 21:29:58 -07:00
  • c41767154e Markdownish code block fix (#7571) Nathan Epstein 2024-05-28 00:41:14 -04:00
  • 74b239b3d5 llava : update clip.h (#7580) b3015 Ikko Eltociear Ashimine 2024-05-28 11:48:16 +09:00
  • e0a686bd49 rm unused or duplicated code, rename as review comment arthw 2024-05-24 10:14:14 +08:00
  • bff9eb862d rm comment arthw 2024-05-21 21:48:28 +08:00
  • 4db8e86cf4 fix mul_mat_id to match the change of api arthw 2024-05-21 21:39:48 +08:00
  • cd839a2e2c llava : update clip.h Ikko Eltociear Ashimine 2024-05-28 11:32:30 +09:00
  • f2a5310dd7 Merge branch 'master' into duo Brian 2024-05-28 11:03:06 +10:00
  • 40c8d7fdea vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE Adriankhl 2024-05-27 15:58:54 +08:00
  • efbbc95321 remove ms per token, since not relevant for most webui users and use cases Yazan Agha-Schrader 2024-05-28 02:51:47 +02:00
  • 6bf7ae08dd update const ModelGenerationInfo Yazan Agha-Schrader 2024-05-28 02:46:18 +02:00
  • 8768b4f5ea use template literales for promptFormats.js Yazan Agha-Schrader 2024-05-28 02:25:49 +02:00
  • 852aafb163 update HIP_UMA #7399 (#7414) b3014 Djip007 2024-05-28 01:40:47 +02:00
  • 0136966daf adding in x64 targets to cmake presets (#7574) kunnis 2024-05-27 18:40:12 -05:00
  • b414599a72 Moving all the threading operations to it's own file Kunnis 2024-05-27 18:37:46 -05:00
  • 1ca802a3e0 parallelize fattn compilation test sl/cuda-fattn-par-test slaren 2024-05-28 01:19:36 +02:00
  • e2d917c5f0 fix grammar field width Yazan Agha-Schrader 2024-05-28 00:29:08 +02:00
  • 7055d84020 fix FloatField and BoolField tooltips Yazan Agha-Schrader 2024-05-28 00:23:06 +02:00
  • 569cbd8bbe Merge branch 'ggerganov:master' into server-ui-pr Yazan Agha-Schrader 2024-05-27 23:49:47 +02:00
  • 0f077968c0 add tooltips to the parameters with comprehensible explanations Yazan Agha-Schrader 2024-05-27 23:46:18 +02:00
  • db0aba2d53 adding in x64 targets. Kunnis 2024-05-27 16:26:29 -05:00
  • c7803876ce move API to the top, rearrange param sliders. update css Yazan Agha-Schrader 2024-05-27 22:33:18 +02:00
  • 55e6336b4f Review fixes Galunid 2024-05-27 21:34:48 +02:00
  • f3f6c0a930 Discard all tokens when no matching found jaime-m-p 2024-05-27 20:17:01 +02:00
  • 117b091069 Fix and improve preprocessing jaime-m-p 2024-05-27 20:15:44 +02:00
  • 938cb4941a Build vocab.special_tokens_cache using vocab token types jaime-m-p 2024-05-27 20:12:43 +02:00
  • 2c8f62fd40 Add Viking-7B tokenizer support Aarni Koskela 2024-05-17 14:29:22 +03:00
  • c28c99629c test-tokenizer-0: improve output, show how many tests failed Aarni Koskela 2024-05-27 20:19:23 +03:00
  • f4003cfba1 fix nwarps > batch size Johannes Gäßler 2024-05-26 23:00:15 +02:00
  • f08776041d add q8_0 q4_0 tests Johannes Gäßler 2024-05-26 22:30:46 +02:00
  • 3194a01058 fix commented-out kernel variants Johannes Gäßler 2024-05-26 20:14:55 +02:00
  • 462add6a01 try CI fix Johannes Gäßler 2024-05-25 22:06:25 +02:00
  • 672244a88b CUDA: quantized KV support for FA vec Johannes Gäßler 2024-05-21 19:38:25 +02:00
  • ecbf827a92 updating regexes Nathan Epstein 2024-05-27 13:41:17 -04:00
  • 9df6a44cd7 markdownish codeblock fix Nathan Epstein 2024-05-27 13:35:06 -04:00
  • 10b1e45876 make: add --device-debug to NVCC debug flags (#7542) b3012 Johannes Gäßler 2024-05-27 19:34:40 +02:00
  • 197c00681b Allow multiple copy function pointers for CUDA graph kernel param updates (#7565) b3011 agray3 2024-05-27 18:33:42 +01:00
  • 10729fa03c Merge branch 'Markdownish_Codeblock_fix' of github.com:SixNines/llama.cpp into Markdownish_Codeblock_fix Nathan Epstein 2024-05-27 13:30:58 -04:00
  • 305996cb1a rebase / merge Nathan Epstein 2024-05-27 13:28:18 -04:00
  • d8974b8ea6 support ollama caitianchi 2024-05-28 01:13:57 +08:00
  • af45703f74 Add WPM models for testing jaime-m-p 2024-05-23 20:16:34 +02:00
  • 2a38e5fa88 Update random test: add_bos_token jaime-m-p 2024-05-23 20:11:40 +02:00
  • 228b1bd487 Update README.md Oleksandr Kuvshynov 2024-05-27 09:49:55 -07:00
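
A listing like the one above can be regenerated from a local checkout with `git log`. The format string below is only an approximation of the hash / subject / author / date columns shown here (release tags such as `b3027` are decorations and would need `%d` to appear):

```shell
# Sketch: approximate the columns of the commit graph above.
# %h = abbreviated hash, %s = subject, %an = author name, %ad = author date.
# --graph draws the branch/merge structure in the left margin.
git log --graph --date=iso-strict --pretty=format:'%h %s %an %ad'
```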