Commit graph

  • 52c6276e12 llama: dbrx: fix k scale Pierrick HYMBERT 2024-04-08 10:43:36 +02:00
  • 87fb5b4234 remove row=1 cond (#6532) b2629 Abhilash Majumder 2024-04-08 13:56:01 +05:30
  • 48fbf8ca1a add multi-gpu support for ngl calculations Yui 2024-04-08 10:08:14 +02:00
  • d752327c33 Adding KodiBot to UI list (#6535) Firat 2024-04-08 00:48:29 -07:00
  • 1589e524c7 calculate max ngl by looking at block sizes Yui 2024-04-08 09:47:06 +02:00
  • a6120326b2 Update common/common.h (suggested by JohannesGaessler) Yui 2024-04-08 09:18:00 +02:00
  • 2325ec0550 Update common/common.cpp (suggested by JohannesGaessler) Yui 2024-04-08 09:17:33 +02:00
  • f70a84bacb Adding KodiBot to UI list Firat 2024-04-07 23:32:56 -07:00
  • daa4f259c3 Update convert-hf-to-gguf.py Ren Xuancheng 2024-04-08 13:49:55 +08:00
  • 2201efb719 remove row=1 cond Abhilash Majumder 2024-04-08 08:55:02 +05:30
  • 88f4b2ee45 Comment explaining a decision Kunnis 2024-04-07 19:23:10 -05:00
  • 71f9e479aa llama: dbrx: Try another rope type Pierrick HYMBERT 2024-04-08 01:29:00 +02:00
  • f8f97e74f9 llama: dbrx: hardcode nn.LayerNorm epsilon Pierrick HYMBERT 2024-04-08 01:17:33 +02:00
  • 74e6d876f6 llama: dbrx: fix build kv att out tensor name Pierrick HYMBERT 2024-04-08 00:37:28 +02:00
  • b01b062ab5 llama: dbrx: fix build kv att out Pierrick HYMBERT 2024-04-08 00:25:54 +02:00
  • 993f836029 llama: dbrx: move norm2 after attention, fix build kv Pierrick HYMBERT 2024-04-08 00:11:19 +02:00
  • 2897aa628c llama: dbrx: revert Pierrick HYMBERT 2024-04-07 23:47:26 +02:00
  • 830e46d7ae llama: dbrx: fix last normalization Pierrick HYMBERT 2024-04-07 23:40:12 +02:00
  • 420cf62838 Fixed mismatch type errors Flipbook 2024-04-07 14:13:22 -07:00
  • 95bf5f7d87 llama_sampling_sample with default args is more naively usable Flipbook 2024-04-06 18:29:05 -07:00
  • 23f7d71a2b cleanup slaren 2024-04-07 22:04:33 +02:00
  • f3f7627bd8 refactor moe ffn to llm_build_moe_ffn slaren 2024-04-07 21:14:23 +02:00
  • 0ab1bae854 llama: dbrx: output norm dim Pierrick HYMBERT 2024-04-07 20:56:53 +02:00
  • 855f54402e Change Windows AMD example to release build to make inference much faster. (#6525) Mark Fairbairn 2024-04-07 19:52:19 +01:00
  • c436d3a3a5 Change Windows AMD example to release build to make inference much faster. Mark Fairbairn 2024-04-07 19:37:57 +01:00
  • bc615548d3 fix windows build slaren 2024-04-07 20:31:09 +02:00
  • b909236c0b flake.lock: Update (#6517) Georgi Gerganov 2024-04-07 21:25:30 +03:00
  • 50b4373673 model: dbrx: weird fix expert reshape Pierrick HYMBERT 2024-04-07 20:14:43 +02:00
  • e2c919962b model: dbrx: fix again sic expert reshape Pierrick HYMBERT 2024-04-07 20:10:16 +02:00
  • 1b5d78d3ee minor slaren 2024-04-06 15:19:47 +02:00
  • c9bddbf253 model: dbrx: fix expert reshape Pierrick HYMBERT 2024-04-07 19:38:35 +02:00
  • e0717e751e Add GritLM as supported models. (#6513) DAN™ 2024-04-07 13:33:59 -04:00
  • 7dd84b0924 model: dbrx: fix expert reshape Pierrick HYMBERT 2024-04-07 19:12:24 +02:00
  • dbfd59114f model: dbrx: fix tensor names mapping broken Pierrick HYMBERT 2024-04-07 18:52:28 +02:00
  • f062b834ed model: dbrx: convert experts to f16 Pierrick HYMBERT 2024-04-07 18:47:37 +02:00
  • d151d8fad9 model: dbrx: convert reshape expert tensors to 3D Pierrick HYMBERT 2024-04-07 18:41:33 +02:00
  • e9987c66d0 llama: dbrx: fix tensor qkv number of elements Pierrick HYMBERT 2024-04-07 18:21:57 +02:00
  • 1bd94270e5 llama: quantize: remove wrong look for tensor qkv name as it was badly missing the .weight suffix model: dbrx: convert to gguf force experts tensors to have .weight suffix Pierrick HYMBERT 2024-04-07 17:55:22 +02:00
  • 2449ef48a9 llama: dbrx: no weight suffix in ffn_gate_exps, ffn_up_exps and ffn_down_exps. Output tensor not optional. Pierrick HYMBERT 2024-04-07 16:57:13 +02:00
  • 8154617ff2 model: dbrx: convert-hf-to-gguf.py support python 3.8 Pierrick HYMBERT 2024-04-07 17:25:39 +02:00
  • 3a9dc2eee2 model: dbrx: convert-hf-to-gguf.py fix 'token_embd.weight' has wrong shape, fix special tokens Pierrick HYMBERT 2024-04-07 17:21:35 +02:00
  • d2924073ee Type convention Carolinabanana 2024-04-07 15:50:24 +01:00
  • c37247796b sync : ggml Georgi Gerganov 2024-04-07 17:05:51 +03:00
  • f77261a7c5 ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020) Slava Primenko 2024-04-04 14:49:24 +02:00
  • d7546fda64 llama: quantize: remove wrong look for tensor qkv name as it was badly missing the .weight suffix Pierrick HYMBERT 2024-04-07 15:59:07 +02:00
  • 9e17dad087 model: dbrx: convert-hf-to-gguf.py add chat template Pierrick HYMBERT 2024-04-07 15:57:36 +02:00
  • 200ce21436 model: dbrx: convert-hf-to-gguf.py fix fix ftype missing, fix tensor names does not suffix with .weight Pierrick HYMBERT 2024-04-07 15:54:19 +02:00
  • 1fb6d95c1d model: convert-hf-to-gguf.py fix classname conflict with qwen2 Pierrick HYMBERT 2024-04-07 15:40:21 +02:00
  • 43e8995e75 scripts : sync ggml-cuda folder Georgi Gerganov 2024-04-07 16:08:12 +03:00
  • 9472bce308 Run make to build the project (#6457) limitedAtonement 2024-04-07 07:05:40 -04:00
  • 61be4b91a6 model: convert-hf-to-gguf.py add _set_vocab_tiktoken gpt2 backed on llama.cpp Pierrick HYMBERT 2024-04-07 12:15:16 +02:00
  • 88b35f7b26 Run make to build the project lmat 2024-04-03 09:21:25 -04:00
  • dccb012637 llama: dbrx: quantize fix n_attention_wv tensor name Pierrick HYMBERT 2024-04-07 05:09:17 +02:00
  • b6522a9f5b model: dbrx: convert fix tokenizer Pierrick HYMBERT 2024-04-07 04:23:32 +02:00
  • 305ac3b61b llama: dbrx: quantize fix n_attention_wv tensor name Pierrick HYMBERT 2024-04-07 05:01:33 +02:00
  • d4f220a5cc support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521) b2620 Neo Zhang Jianyu 2024-04-07 10:55:59 +08:00
  • f381347e0d support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M Jianyu Zhang 2024-04-07 10:15:21 +08:00
  • 06a59abf0a model: dbrx: convert add n_ff Pierrick HYMBERT 2024-04-07 03:17:24 +02:00
  • 52c403355f llama: increase maximum experts allowed Pierrick HYMBERT 2024-04-07 03:16:33 +02:00
  • c71f095aad flake.lock: Update github-actions[bot] 2024-04-07 00:18:17 +00:00
  • 7e7cd53ca6 llama: dbrx: remove unnecessary optional tensor on FFN_GATE_EXPS Pierrick HYMBERT 2024-04-06 23:55:37 +02:00
  • 69856297b9 Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx Pierrick HYMBERT 2024-04-06 23:53:11 +02:00
  • 4f12a580d9 llama: dbrx: remove not existing condition on empty output layer Pierrick HYMBERT 2024-04-06 23:35:23 +02:00
  • fe8089871e model: dbrx: fix missing embedding tensor, mix with output layer Pierrick HYMBERT 2024-04-06 23:27:29 +02:00
  • 8b6577bd63 Merge pull request #1 from slaren/cmrp-fixes Carolinabanana 2024-04-06 21:59:53 +01:00
  • 78819c07ff fix overflow issues during quant and other cleanup slaren 2024-04-06 22:37:08 +02:00
  • ce9413d849 export norms as f32 slaren 2024-04-06 22:33:36 +02:00
  • 9c7dedb0f3 llama: dbrx: no attention output layer Pierrick HYMBERT 2024-04-06 22:22:57 +02:00
  • 26e8f23bf3 Reverted blocked multiplication code as it still has issues and could affect other Llama arches S 2024-04-06 20:21:42 +01:00
  • 76f266beef scripts: get-wikitext-2 add unzip Pierrick HYMBERT 2024-04-06 21:10:19 +02:00
  • 03da419fc0 llama: dbrx: remove wrong attn output layer in model arch Pierrick HYMBERT 2024-04-06 20:43:46 +02:00
  • 916b91852b convert: dbrx: fix remove wrong ATTN_OUT_NORM tensor, add output layer mapping Pierrick HYMBERT 2024-04-06 20:30:30 +02:00
  • c8e6f903e0 doc: dbrx: add the model as supported Pierrick HYMBERT 2024-04-06 20:09:01 +02:00
  • 0a35f5881b convert: dbrx: fix mixed up and down expert tensors llama: dbrx: review graph Pierrick HYMBERT 2024-04-06 19:56:37 +02:00
  • e3c1e8127c convert: dbrx: fix mixed up and down expert tensors Pierrick HYMBERT 2024-04-06 19:21:43 +02:00
  • a7f9a3eafc dbrx: minor Pierrick HYMBERT 2024-04-06 18:59:53 +02:00
  • 9a0aa24ac7 Add GritLM as supported models. DAN™ 2024-04-06 11:32:53 -04:00
  • 54ea0698fb sync : ggml b2619 Georgi Gerganov 2024-04-06 17:43:15 +03:00
  • 2549662cde implement ggml_backend_sycl_get_free_device_memory for all GPU support Yui 2024-04-06 17:03:04 +02:00
  • b66aec675c backend : fix typo in scheduler documentation (ggml/781) Daniel Bevenius 2024-04-03 22:57:20 +02:00
  • 57dd02c44b Tests: Added integration tests for GBNF parser (#6472) Clint Herron 2024-04-06 10:31:33 -04:00
  • 746d5fb3c9 Merge branch 'ggerganov:master' into master Yui 2024-04-06 16:30:11 +02:00
  • e4f8ee4f48 llama: support dbrx fix norm type Pierrick HYMBERT 2024-04-06 16:08:25 +02:00
  • 09210334bf model: dbrx fix python linter in convert-hf-to-gguf.py Pierrick HYMBERT 2024-04-06 16:00:32 +02:00
  • c0beb3cf7e llama: add label for model 132B Pierrick HYMBERT 2024-04-06 15:58:17 +02:00
  • 3937100adb model: dbrx, trust remote code Pierrick HYMBERT 2024-04-06 15:57:57 +02:00
  • 3e3d2d127c gguf-py: remove wrong clip -> clamp Pierrick HYMBERT 2024-04-06 15:46:47 +02:00
  • ed582c1dde llama: support dbrx #6344 Pierrick HYMBERT 2024-04-06 15:16:42 +02:00
  • 6745ea7a65 dranger003: Fix block index overflow in CUDA dequantizing. S 2024-04-06 14:20:27 +01:00
  • f13818d77d Merge branch 'master' into name-metadata-fix Brian 2024-04-06 23:52:38 +11:00
  • 1d8de31565 model: dbrx convert to gguf #6344 Pierrick HYMBERT 2024-04-06 13:52:11 +02:00
  • c2658c3ae8 Fix unexpected tokens on MPS. Re-add F16 fix. ((Noeda) S 2024-04-06 11:23:38 +01:00
  • 75cd4c7729 ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495) Pierrick Hymbert 2024-04-06 05:40:47 +02:00
  • 60a01b3ddc Hacky func streaming (#1) Yingbei Tong 2024-04-05 20:59:58 +00:00
  • bf94e9f788 reject whitespace and trailing dot Jan Boon 2024-04-06 03:14:39 +08:00
  • 2fbf0c3495 also reject empty filename Jan Boon 2024-04-06 02:43:13 +08:00
  • a8bd14d557 gguf.py : add licence and version to gguf writer (#6504) b2615 Brian 2024-04-06 05:41:38 +11:00
  • d0f5deebf8 readme : update UI list (#6503) Hoang Nguyen 2024-04-05 11:39:43 -07:00
  • 87e21bbacd bench : make n_batch and n_ubatch configurable in Batched bench (#6500) b2613 Ting Sun 2024-04-06 01:34:53 +07:00
  • 0acc56719a Update common.cpp cpumaxx 2024-04-05 11:20:34 -07:00