model: support arch DbrxForCausalLM (#6515)

* model: dbrx convert to gguf #6344
* llama: support dbrx #6344
* doc: dbrx: add the model as supported
* scripts: get-wikitext-2 add unzip
* llama: increase maximum experts allowed
* llama: factorize moe graph implementation between grok, mixtral and dbrx

---------

Co-authored-by: Megha Agarwal <16129366+megha95@users.noreply.github.com>
2024-04-13 11:33:52 +02:00
commit 4bd0f93e4a
parent ab9a3240a9
7 changed files with 428 additions and 148 deletions
--- a/README.md
+++ b/README.md
@@ -94,6 +94,7 @@ Typically finetunes of the base models below are supported as well.
 - [x] LLaMA 2 🦙🦙
 - [X] [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)
 - [x] [Mixtral MoE](https://huggingface.co/models?search=mistral-ai/Mixtral)
+- [x] [DBRX](https://huggingface.co/databricks/dbrx-instruct)
 - [X] Falcon
 - [X] [Chinese LLaMA / Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) and [Chinese LLaMA-2 / Alpaca-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2)
 - [X] [Vigogne (French)](https://github.com/bofenghuang/vigogne)