Commit graph

  • 368272c54b ggml_debug: add main test label Pierrick HYMBERT 2024-04-10 21:52:04 +02:00
  • 1a031d39ae ci: build revert label Pierrick HYMBERT 2024-04-10 21:48:48 +02:00
  • f3f0d1818f common: fix cb_eval and user data not initialized Pierrick HYMBERT 2024-04-10 21:40:52 +02:00
  • ca6f3ff4e0 ggml_debug: fix trailing spaces Pierrick HYMBERT 2024-04-10 21:20:32 +02:00
  • 08fa088d74 ggml_debug: fix trailing spaces Pierrick HYMBERT 2024-04-10 21:19:09 +02:00
  • fe4b1915f1 ggml_debug: Remove unused param n_batch, no batching here Pierrick HYMBERT 2024-04-10 21:15:21 +02:00
  • 2d34bbe2a0 ggml_debug: EOL in CMakeLists.txt Pierrick HYMBERT 2024-04-10 21:07:45 +02:00
  • cda1d42a64 ggml_debug: ci: add tests Pierrick HYMBERT 2024-04-10 21:06:12 +02:00
  • 01dd5e9776 ggml_debug: use common gpt_params to pass cb eval. Fix get tensor SIGV random. Pierrick HYMBERT 2024-04-10 20:52:38 +02:00
  • 8fe3be8d68 llama: cv eval: move cb eval field in common gpt_params Pierrick HYMBERT 2024-04-10 20:51:54 +02:00
  • 8228b66dbc gguf : add option to not check tensor data (#6582) b2647 Daniel Bevenius 2024-04-10 20:16:48 +02:00
  • 74529e54e5 llama: dbrx: use the MOE naming convention for model type Pierrick HYMBERT 2024-04-10 19:27:53 +02:00
  • 6f813dcc6a Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx Pierrick HYMBERT 2024-04-10 19:24:38 +02:00
  • b3a96f27f0 minor layout improvements (#6572) b2646 Ralph Soika 2024-04-10 19:18:25 +02:00
  • 502d069b14 Remove split metadata when quantize model shards z5269887 2024-04-11 00:39:15 +08:00
  • 4f407a0a35 llama : add model types for mixtral (#6589) b2645 slaren 2024-04-10 17:24:14 +02:00
  • bb18b19a87 llama : add model types for mixtral slaren 2024-04-10 16:21:28 +02:00
  • 06808a3d0d Support converting models with multiple chat templates Sigbjørn Skjæret 2024-04-10 15:31:05 +02:00
  • 65c64dc36f convert.py : add consolidated.safetensors for mixtral 8x22b (#6587) slaren 2024-04-10 15:23:12 +02:00
  • d285fb6558 convert.py : add consolidated.safetensors for mixtral 8x22b slaren 2024-04-10 14:39:43 +02:00
  • f63b722486 gguf-debug: no mutex, verify type, fix stride. Pierrick HYMBERT 2024-04-10 09:50:45 +02:00
  • d220a06b3c gguf: add option to not check tensor data Daniel Bevenius 2024-04-10 09:02:24 +02:00
  • 67fac4b95f docs : how to add a model (#6565) Pierrick Hymbert 2024-04-10 08:58:48 +02:00
  • 7bc1d0bdb7 Update README.md Georgi Gerganov 2024-04-10 09:58:21 +03:00
  • 29122d32ac readme : fix ROCm link (#6579) Artem Zinnatullin 2024-04-10 00:49:12 -06:00
  • b089987cf3 Merge branch 'ggerganov:master' into master Ralph Soika 2024-04-10 08:40:09 +02:00
  • 4dfc562682 added missing file, run deps.sh locally Ralph Soika 2024-04-10 08:39:06 +02:00
  • b231b37b09 readme : update UI list (#6560) sjxx 2024-04-10 14:34:00 +08:00
  • 8c0a5b0e1e Update llava-cli.cpp cpumaxx 2024-04-09 21:07:37 -07:00
  • 124e259dc6 Update common.cpp cpumaxx 2024-04-09 21:03:11 -07:00
  • 7e1b9e0b70 Update common.h cpumaxx 2024-04-09 21:01:50 -07:00
  • c9d0e8e6eb Fix ROCm link in README Artem Zinnatullin 2024-04-10 06:47:04 +03:00
  • 067e294783 gguf-debug: Example how to use ggml callback for debugging Pierrick HYMBERT 2024-04-10 03:35:57 +02:00
  • 29b084c072 Update Makefile Nikolas 2024-04-10 03:24:17 +02:00
  • 21e15f8b88 Update Makefile Nikolas 2024-04-10 03:23:14 +02:00
  • 377a52427c Refactor Error Handling for CUDA Nikolas 2024-04-10 03:17:59 +02:00
  • d66849f628 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-04-09 20:22:19 -04:00
  • ba5e134e07 readme: fix typo in amdgpu target name (#6573) Jiří Sejkora 2024-04-10 00:23:02 +02:00
  • d88ee6d79a readme: fix typo in amdgpu target name Jiří Sejkora 2024-04-10 00:02:45 +02:00
  • 0c8b3b2095 llama : correctly handle more edge cases for the rs cache Francis Couture-Harpin 2024-04-09 17:35:22 -04:00
  • 6abc0fa0b9 minor layout improvements Ralph Soika 2024-04-09 20:58:47 +02:00
  • 1b67731e18 BERT tokenizer fixes (#6498) Jared Van Bortel 2024-04-09 13:44:08 -04:00
  • c4a3a4ff47 sync : ggml b2638 Georgi Gerganov 2024-04-09 20:29:06 +03:00
  • 63bb8e543a docs : some fixes Georgi Gerganov 2024-04-09 18:50:55 +03:00
  • bb4af0f764 docs: model: README.md fix trailing spaces Pierrick HYMBERT 2024-04-09 16:54:42 +02:00
  • aafe2b9016 docs: model: rephrasing README.md Pierrick HYMBERT 2024-04-09 16:37:34 +02:00
  • 32f35ed6b3 docs: model: rephrasing README.md Pierrick HYMBERT 2024-04-09 16:35:45 +02:00
  • 4be9750d49 docs: model: add prevision on RoPE Pierrick HYMBERT 2024-04-09 16:34:55 +02:00
  • 90c67719bf docs: model: typo and docs Pierrick HYMBERT 2024-04-09 16:32:05 +02:00
  • 9c33ee9930 json: fix server/README (json_schema in /completion vs. result_format in /v1/chat/completions) Olivier Chafik 2024-04-09 15:30:59 +01:00
  • 797d22027c docs: how to add a model Pierrick HYMBERT 2024-04-09 16:25:17 +02:00
  • c481e11f41 Fix more int overflow during quant. DAN™ 2024-04-09 09:13:48 -04:00
  • e5631cf25a Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx Pierrick HYMBERT 2024-04-09 15:10:51 +02:00
  • 5139d8dbdf Adding eva to UI list sjxx 2024-04-09 18:48:00 +08:00
  • 400d5d722d server : detect search query to start webchat (#6554) Ed Lee 2024-04-09 01:31:47 -07:00
  • 5dc9dd7152 llama : add Command R Plus support (#6491) b2636 Carolinabanana 2024-04-09 09:16:13 +01:00
  • e11a8999b5 license : update copyright notice + add AUTHORS (#6405) Georgi Gerganov 2024-04-09 09:23:19 +03:00
  • 072e0a4d3b scipts : add LICENSE and gen-authors.sh to sync gg/authors Georgi Gerganov 2024-04-09 09:19:33 +03:00
  • 0e0d4e821f authors : update Georgi Gerganov 2024-04-09 09:14:03 +03:00
  • ac75fbd8c5 gguf-py: dbrx: reverse again the MOE tensors mapping: layer.ffn_up_exps -> Up-projection weights (w1) layer.ffn_gate_exps -> Gating weights (v1) layer.ffn_down_exps -> Down-projection weights (w2) Pierrick HYMBERT 2024-04-09 02:41:39 +02:00
  • ac82aa0e63 gguf-py: revert spaces Pierrick HYMBERT 2024-04-09 01:26:57 +02:00
  • 67a5184fa3 json: fix type error w/ python 3.8 ochafik 2024-04-09 00:07:18 +01:00
  • 3c81e944ce nits ochafik 2024-04-09 00:02:04 +01:00
  • c7b9a2e85e llama: dbrx: fix ggml context of the attention outputs weight Pierrick HYMBERT 2024-04-09 00:58:50 +02:00
  • 6c885dce8b server+json: update server/README w/ result_format ochafik 2024-04-08 23:35:28 +01:00
  • 14ff4e2571 server : detect search query to start webchat Ed Lee 2024-04-08 15:26:23 -07:00
  • de4e60ea67 json: support string minLength/maxLength ochafik 2024-04-08 23:08:53 +01:00
  • 181f984def json: unify all repetition code (w/ or w/o sep) ochafik 2024-04-08 23:06:42 +01:00
  • ea1aeba48b dranger003: Fix more int overflow during quant. S 2024-04-08 22:47:46 +01:00
  • 55943a281f model: dbrx: convert fix mixed ffn_gate_exps and ffn_down_exps Pierrick HYMBERT 2024-04-08 21:47:59 +02:00
  • cc4a95426d llama : fix attention layer count sanity check (#6550) Georgi Gerganov 2024-04-08 22:25:49 +03:00
  • dcf5d3283a json: cap length of numbers to 15 digits before/after decimal point ochafik 2024-04-08 20:14:36 +01:00
  • ea8b58c6cd llama: dbrx: first add the residuals and then do the norm Pierrick HYMBERT 2024-04-08 21:10:49 +02:00
  • 07163fb627 grammars: add troubleshooting section to readme ochafik 2024-04-08 20:10:15 +01:00
  • a59e9431fc json: optimize repetitions for minItems/maxItems and regexps: a{,3} goes from "a"? "a"? "a"? (explosive combos) to (a (a (a)?)?)? ochafik 2024-04-08 20:02:18 +01:00
  • 159b883bd4 json: deps management for primitive rules (+ allow null values) ochafik 2024-04-08 19:53:30 +01:00
  • 7bab4c055c llama : fix parentheses in attention layer count sanity check Francis Couture-Harpin 2024-04-08 14:41:39 -04:00
  • f30a73bb01 llama: dbrx: rename layer_out_norm to attn_out_norm Pierrick HYMBERT 2024-04-08 20:38:31 +02:00
  • f771a8f1b5 server: skip null json_schema / grammar fields ochafik 2024-04-08 19:33:26 +01:00
  • 2148f244ca json: rename python schema converter to make import easier ochafik 2024-04-08 19:32:30 +01:00
  • 6804714190 llama : fix attention layer count sanity check Georgi Gerganov 2024-04-08 21:17:40 +03:00
  • e66f1e3448 llama: dbrx: document changes, permute only FFN_DOWN_EXPS. Add a check for ftype Pierrick HYMBERT 2024-04-08 20:08:54 +02:00
  • 9968952921 llama: dbrx: fix experts 3D tensor layout (again) Pierrick HYMBERT 2024-04-08 19:37:23 +02:00
  • 18a84fedda llama: dbrx: fix experts 3D tensor layout (again) Pierrick HYMBERT 2024-04-08 19:12:53 +02:00
  • 48909ed2a7 model: dbrx convert permute experts directly torch, log shape Pierrick HYMBERT 2024-04-08 19:01:44 +02:00
  • 9a43e80820 update imatrix slaren 2024-04-08 18:01:02 +02:00
  • f20c04f01f llama: factorize moe graph implementation between grok, mixtral and dbrx Pierrick HYMBERT 2024-04-08 17:45:35 +02:00
  • cecd8d3c98 Comment explaining a decision (#6531) b2633 kunnis 2024-04-08 10:44:19 -05:00
  • 21fb24aa45 model: dbrx: convert-hf-to-gguf.py fix experts tensors shapes Pierrick HYMBERT 2024-04-08 16:55:56 +02:00
  • 0028010d01 llama : state checkpoints for recurrent models Francis Couture-Harpin 2024-04-08 09:54:35 -04:00
  • b73e564b16 quantize : fix precedence of cli args (#6541) b2632 Georgi Gerganov 2024-04-08 16:23:01 +03:00
  • 81f308ad64 llama: dbrx: fix experts tensor layout Pierrick HYMBERT 2024-04-08 15:04:18 +02:00
  • e3c337d87c llama : support negative ith in llama_get_ API (#6519) Rick G 2024-04-08 06:02:30 -07:00
  • beea6e1b16 llama : save and restore kv cache for single seq id (#6341) b2630 Jan Boon 2024-04-08 20:43:30 +08:00
  • eb0847e6b1 llama: dbrx: load norm eps in hparams Pierrick HYMBERT 2024-04-08 14:38:21 +02:00
  • 506cc2ea53 llama: dbrx: convert remove previous reverse Pierrick HYMBERT 2024-04-08 14:09:06 +02:00
  • 35dce3e145 llama: dbrx: rename tensor to actual meaning. Fix normalization in graph. Permute expert tensors to the llama.cpp layout Pierrick HYMBERT 2024-04-08 14:02:08 +02:00
  • f6da969be7 quantize : fix precedence of cli args Georgi Gerganov 2024-04-08 14:24:36 +03:00
  • 8e22688401 llama: dbrx: move norm epsilon to convert. Fix missing normalization. Pierrick HYMBERT 2024-04-08 11:22:24 +02:00
  • f8b8d2f44f Merge 3c49d9387a into 87fb5b4234 Moritz Raguschat 2024-04-08 10:51:45 +02:00