Commit graph

  • ad014bba97
    make: add error message for bad CUDA version (#5444) b2138 Johannes Gäßler 2024-02-13 12:38:37 +01:00
  • 35670e7433
    train-text-from-scratch: rename ff tensors Daniel Bevenius 2024-01-10 07:33:59 +01:00
  • fa2c0d558b
    finetune: rename feed-forward tensors (w1/w2/w3) Daniel Bevenius 2024-01-09 14:51:45 +01:00
  • bbc0ebb9d4
    unicode : minor style fixes Georgi Gerganov 2024-02-13 13:18:25 +02:00
  • 4be44b7c33
    iq1_s: use IQ2_XXS for attn_output Iwan Kawrakow 2024-02-12 18:55:37 +02:00
  • 307c5f617a
    iq1_s: better grid Iwan Kawrakow 2024-02-12 13:58:16 +02:00
  • 773014926f
    iq1_s: ARM_NEON dot product. Works, but not very fast Iwan Kawrakow 2024-02-12 11:40:31 +02:00
  • 2ffb05acc8
    iq1_s: AVX2 finally works Iwan Kawrakow 2024-02-12 08:29:54 +02:00
  • 67e7c4238e
    Fix after merge with latest master Iwan Kawrakow 2024-02-12 07:38:29 +02:00
  • dc0b14bebb
    Fix shadow warnings Iwan Kawrakow 2024-02-12 06:10:06 +02:00
  • 5574533a72
    Fix tests Iwan Kawrakow 2024-02-11 18:50:50 +02:00
  • 592b3b26bb
    iq1_s: WIP AVX2 dot product - something is not right Iwan Kawrakow 2024-02-11 17:22:42 +02:00
  • d94139bf27
    iq1_s: scalar CPU dot product Iwan Kawrakow 2024-02-11 14:07:19 +02:00
  • a9d48e9718
    iq1_s: CUDA is working Iwan Kawrakow 2024-02-11 13:08:26 +02:00
  • 80cd5bae99
    iq1_s: WIP basics Iwan Kawrakow 2024-02-11 11:15:31 +02:00
  • 6c533edb94
    unicode : fix data race for unidentified codepoints Georgi Gerganov 2024-02-13 13:08:12 +02:00
  • 49cc1f7d67
    bert : add tests + fix quantization (#5475) b2137 Georgi Gerganov 2024-02-13 13:01:29 +02:00
  • d075e719a1
    common : make load error reporting more granular Aarni Koskela 2024-02-13 12:19:42 +02:00
  • 1ab4f15228
    ci : do not do BERT tests on low-perf nodes Georgi Gerganov 2024-02-13 11:37:26 +02:00
  • 99b8b43d7b
    tests : disable moe test (#5473) b2136 Georgi Gerganov 2024-02-13 11:20:24 +02:00
  • 09b59430da
    ci : add BERT tests Georgi Gerganov 2024-02-13 11:17:27 +02:00
  • ce730ad7e3
    llama : do not quantize pos embd and token type tensors Georgi Gerganov 2024-02-13 11:17:05 +02:00
  • 21851c11d1
    tests : multi-thread the tokenizer tests Georgi Gerganov 2024-02-13 10:39:21 +02:00
  • dc66c6ac2e
    tests : disable moe test Georgi Gerganov 2024-02-13 09:37:13 +02:00
  • e37b8f022d
    Merge branch 'ggerganov:master' into master bmwl 2024-02-12 23:17:56 -08:00
  • 895407f31b
    ggml-quants : fix compiler warnings (shadow variable) (#5472) b2135 Kawrakow 2024-02-13 09:07:57 +02:00
  • 9d42825c3f
    Update READMEs with info about numa flags, change INTERLEAVE strategy name to DISTRIBUTE everywhere, implement the improved distribution strategy from @rankaiyx, fix a spelling mistake and un-merge some bad merges root 2024-02-13 06:47:40 +00:00
  • 4246b71ad7
    Fix compiler warnings (shadow variable) ik/fix_warnings Iwan Kawrakow 2024-02-13 08:44:56 +02:00
  • 5a942099e5
    Merge branch 'master' of https://github.com/bmtwl/llama.cpp root 2024-02-13 04:39:48 +00:00
  • 87f8d9e5e0
    Merge branch 'ggerganov:master' into master bmwl 2024-02-12 20:39:41 -08:00
  • 861f544f33
    fix typo bobqianic 2024-02-13 03:05:37 +00:00
  • 7570bbcd9b
    Add files via upload bobqianic 2024-02-13 03:00:05 +00:00
  • 0e5b25b2f2
    fix bugs bobqianic 2024-02-13 02:58:44 +00:00
  • 07f5cd7bec
    ws John 2024-02-13 00:35:31 +01:00
  • 3a72267869
    moved llava functions to llava.cpp, made clip.h C compatible API, replaced vector style functions with pointers, added a debug define to remove functions from compilation while not needed John 2024-02-13 00:29:17 +01:00
  • 5a668ea000
    metal : trying bs = 512 performance (wip) Georgi Gerganov 2024-02-12 19:21:57 +02:00
  • e8b00e2941
    metal : fix NSG1 > 1 Georgi Gerganov 2024-02-08 16:39:38 +02:00
  • 845876d012
    metal : works with ne00 % 4 == 0 Georgi Gerganov 2024-02-08 13:26:50 +02:00
  • e68e32548f
    metal : opts Georgi Gerganov 2024-02-07 23:12:22 +02:00
  • 92a0c17474
    metal : initial working version Georgi Gerganov 2024-02-07 11:20:04 +02:00
  • 6875997fd6
    Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-02-12 21:16:58 +02:00
  • 099afc6274
    llama : fix quantization when tensors are missing (#5423) b2134 Georgi Gerganov 2024-02-12 20:14:39 +02:00
  • f281d76f41
    bring back non-causal attention Douglas Hanley 2024-02-12 12:13:21 -06:00
  • 1549493e94
    batched embedding: pool outputs by sequence id. updated embedding example Douglas Hanley 2024-02-09 16:11:36 -06:00
  • df334a1125
    swift : package no longer use ggml dependency (#5465) b2133 Georgi Gerganov 2024-02-12 19:54:29 +02:00
  • 10ab4c4129
    spm : add ggml headers Georgi Gerganov 2024-02-12 19:53:23 +02:00
  • dbd8828eb0
    py : fix persimmon n_rot conversion (#5460) Lee 2024-02-13 01:29:57 +08:00
  • 4dedd11fdf
    Update convert-persimmon-to-gguf.py Georgi Gerganov 2024-02-12 19:29:41 +02:00
  • b8a91c5ac5
    Revert "swift : update Package.swift to use ggml as dependency (#4691)" Georgi Gerganov 2024-02-12 19:23:41 +02:00
  • 43fe07c1a4
    ggml-sycl: Replace 3d ops with macro (#5458) b2131 Abhilash Majumder 2024-02-12 20:22:05 +05:30
  • 19a03e0ed2
    Updated/merged the deepseek coder pr Jaggzh 2024-02-12 04:18:06 -08:00
  • b16a391fb7
    merge not working yet Jaggzh 2024-02-12 04:04:34 -08:00
  • e3d8c5ecc9
    lookup: hashmap, most frequent tokens, abort early JohannesGaessler 2024-02-11 12:33:23 +01:00
  • 0c612e5e04
    Remove trailing whitespace in test-tokenizer-0-falcon.cpp bobqianic 2024-02-12 11:05:47 +00:00
  • 0da1e9c6c9
    Update test-tokenizer-0-falcon.cpp bobqianic 2024-02-12 11:03:00 +00:00
  • dbf52ee42b
    convert : fix persimmon offical weight conversion to write correct n_rot. Lee 2024-02-12 18:19:17 +08:00
  • 9ec342e5de
    fix format Abhilash Majumder 2024-02-12 14:36:46 +05:30
  • 4a46d2b792
    llava : remove prog parameter from ArgumentParser (#5457) b2130 Daniel Bevenius 2024-02-12 09:38:44 +01:00
  • 03477279df
    update build.zig hazelnutcloud 2024-02-12 16:06:41 +08:00
  • 28444caf7f
    use macro Abhilash Majumder 2024-02-12 13:08:54 +05:30
  • 3b169441df
    sync : ggml (#5452) b2129 Georgi Gerganov 2024-02-12 09:16:06 +02:00
  • 24bbb4447a
    ci: add W503 to flake8 ignore list Daniel Bevenius 2024-02-12 08:07:35 +01:00
  • 8ac20ae88b
    llava: remove prog parameter from ArgumentParser Daniel Bevenius 2024-02-12 07:44:48 +01:00
  • 60f9508103
    use macro Abhilash Majumder 2024-02-12 11:54:50 +05:30
  • 0dd6c9da2a
    added verbose_prompt support into cli added stopwords for llava-1.6 into cli John 2024-02-12 04:34:51 +01:00
  • 60c5f46ba7
    ws John 2024-02-12 04:04:57 +01:00
  • 51e60c996f
    Tensors are now properly permuted. Before the embeddings were inserted 1:1, now they are split into the 24x24 patches as in reference. John 2024-02-12 04:02:54 +01:00
  • 76d5b7f76c
    Merge remote-tracking branch 'origin/master' into sync slaren 2024-02-11 21:08:09 +01:00
  • 7cba240bcf
    ggml-backend : reduce alignment to 32 to match gguf and fix mmap slaren 2024-02-11 21:04:03 +01:00
  • 7d404b383c
    update finetune.cpp, train-text-from-scratch.cpp slaren 2024-02-11 18:01:36 +01:00
  • 3bdc4cd0f5
    CUDA: mul_mat_vec_q tiling, refactor mul mat logic (#5434) b2128 Johannes Gäßler 2024-02-11 19:08:39 +01:00
  • 763083e567
    Update ggml-cuda.cu Johannes Gäßler 2024-02-11 19:07:03 +01:00
  • a3a46580f7
    any_pascal fixup JohannesGaessler 2024-02-11 19:04:00 +01:00
  • b1f6fab684
    refactor boolean logic JohannesGaessler 2024-02-11 19:01:44 +01:00
  • 005de593ad
    refactor fp16 logic, only consider used devices JohannesGaessler 2024-02-11 18:52:49 +01:00
  • 2891c8aa9a
    Add support for BERT embedding models (#5423) b2127 Douglas Hanley 2024-02-11 10:21:38 -06:00
  • 61bab4781c
    Merge branch 'bert' of github.com:iamlemec/llama.cpp into bert Douglas Hanley 2024-02-11 09:51:25 -06:00
  • 97a336507e
    flake.lock: Update github-actions[bot] 2024-02-11 00:17:31 +00:00
  • e379e8c10b
    avoid use of ggml_graph_get_tensor Douglas Hanley 2024-02-11 09:50:18 -06:00
  • 8e8d76cb39
    Merge branch 'ggerganov:master' into master hsnmkls 2024-02-11 23:14:38 +08:00
  • c88c74f967
    vulkan: only use M-sized matmul on Apple GPUs (#5412) b2125 Sergio López 2024-02-11 15:12:00 +01:00
  • 7f44f296bc
    update llama.cpp, clip.cpp, export-lora.cpp slaren 2024-02-11 14:58:14 +01:00
  • a803333a4e
    common : use enums for sampler types (#5418) b2124 Alexey Parfenov 2024-02-11 13:43:31 +00:00
  • e34ebae2b0
    minor : spaces Georgi Gerganov 2024-02-11 15:43:11 +02:00
  • 684780141a
    server : allow to specify tokens as strings in logit_bias (#5003) b2123 Alexey Parfenov 2024-02-11 13:38:14 +00:00
  • 85910c5b30
    main : ctrl+C print timing in non-interactive mode (#3873) b2122 Georgi Gerganov 2024-02-11 15:35:50 +02:00
  • 139b62a839
    common : fix compile warning b2121 Georgi Gerganov 2024-02-11 15:33:43 +02:00
  • 0f2411f154
    ggml : fix compile warnings (unused vars) (#4966) Georgi Gerganov 2024-02-11 15:33:01 +02:00
  • a07d0fee1f
    ggml : add mmla kernels for quantized GEMM (#4966) b2119 snadampal 2024-02-11 07:22:33 -06:00
  • b13ef36316
    sync : ggml Georgi Gerganov 2024-02-11 14:39:39 +02:00
  • 2f04c6efe1
    ggml-alloc : v3 (ggml/727) slaren 2024-02-11 13:37:58 +01:00
  • e4640d8fdf
    lookup: add print for drafting performance (#5450) b2118 Johannes Gäßler 2024-02-11 12:44:51 +01:00
  • 846aaa505c
    lookup: add print for drafting performance JohannesGaessler 2024-02-11 11:22:37 +01:00
  • 0caf8dc906
    Enable non-contiguous support for simple ops 0cc4m 2024-02-11 12:08:05 +01:00
  • 8fbefed148
    minor : code style normalization Georgi Gerganov 2024-02-11 12:59:59 +02:00
  • 3a5a7e3718
    vulkan: only use M-sized matmul on Apple GPUs Sergio Lopez 2024-02-11 11:30:16 +01:00
  • f79cef94ae
    vulkan: refactor guess_matmul_pipeline for vendor Sergio Lopez 2024-02-08 12:49:15 +01:00
  • f67ba9a674
    Update Makefile Johannes Gäßler 2024-02-11 11:29:01 +01:00
  • 907e08c110
    server : add llama2 chat template (#5425) b2117 Xuan Son Nguyen 2024-02-11 11:16:22 +01:00
  • c19c59812e
    Fix Vulkan check results 0cc4m 2024-02-11 09:59:36 +01:00