llama.cpp

Author	SHA1	Message	Date
Pierrick HYMBERT	830e46d7ae	llama: dbrx: fix last normalization	2024-04-07 23:40:12 +02:00
Pierrick HYMBERT	0ab1bae854	llama: dbrx: output norm dim	2024-04-07 20:56:53 +02:00
Mark Fairbairn	855f54402e	Change Windows AMD example to release build to make inference much faster. (#6525)	2024-04-07 20:52:19 +02:00
Georgi Gerganov	b909236c0b	flake.lock: Update (#6517) Flake lock file updates: • Updated input 'flake-parts': 'github:hercules-ci/flake-parts/f7b3c975cf067e56e7cda6cb098ebe3fb4d74ca2' (2024-03-01) → 'github:hercules-ci/flake-parts/9126214d0a59633752a136528f5f3b9aa8565b7d' (2024-04-01) • Updated input 'flake-parts/nixpkgs-lib': 'github:NixOS/nixpkgs/1536926ef5621b09bba54035ae2bb6d806d72ac8?dir=lib' (2024-02-29) → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089?dir=lib' (2024-03-29) • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29) → 'github:NixOS/nixpkgs/fd281bd6b7d3e32ddfa399853946f782553163b5' (2024-04-03) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-04-07 11:25:30 -07:00
Pierrick HYMBERT	50b4373673	model: dbrx: weird fix expert reshape	2024-04-07 20:14:43 +02:00
Pierrick HYMBERT	e2c919962b	model: dbrx: fix again sic expert reshape	2024-04-07 20:10:16 +02:00
Pierrick HYMBERT	c9bddbf253	model: dbrx: fix expert reshape	2024-04-07 19:38:35 +02:00
DAN™	e0717e751e	Add GritLM as supported models. (#6513)	2024-04-07 19:33:59 +02:00
Pierrick HYMBERT	7dd84b0924	model: dbrx: fix expert reshape	2024-04-07 19:12:24 +02:00
Pierrick HYMBERT	dbfd59114f	model: dbrx: fix tensor names mapping broken	2024-04-07 18:52:28 +02:00
Pierrick HYMBERT	f062b834ed	model: dbrx: convert experts to f16	2024-04-07 18:47:37 +02:00
Pierrick HYMBERT	d151d8fad9	model: dbrx: convert reshape expert tensors to 3D	2024-04-07 18:41:33 +02:00
Pierrick HYMBERT	e9987c66d0	llama: dbrx: fix tensor qkv number of elements	2024-04-07 18:21:57 +02:00
Pierrick HYMBERT	1bd94270e5	llama: quantize: remove wrong look for tensor qkv name as it was badly missing the .weight suffix model: dbrx: convert to gguf force experts tensors to have .weight suffix	2024-04-07 17:55:33 +02:00
Pierrick HYMBERT	2449ef48a9	llama: dbrx: no weight suffix in ffn_gate_exps, ffn_up_exps and ffn_down_exps. Output tensor not optional.	2024-04-07 17:55:33 +02:00
Pierrick HYMBERT	8154617ff2	model: dbrx: convert-hf-to-gguf.py support python 3.8	2024-04-07 17:25:39 +02:00
Pierrick HYMBERT	3a9dc2eee2	model: dbrx: convert-hf-to-gguf.py fix 'token_embd.weight' has wrong shape, fix special tokens	2024-04-07 17:21:35 +02:00
Georgi Gerganov	c37247796b	sync : ggml	2024-04-07 17:05:51 +03:00
Slava Primenko	f77261a7c5	ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020) `cudaHostRegisterReadOnly` parameter was only introduced in CUDA 11.1 See this issue for more details: https://github.com/ggerganov/examples/whisper/whisper.cpp/issues/2007	2024-04-07 17:05:40 +03:00
Pierrick HYMBERT	d7546fda64	llama: quantize: remove wrong look for tensor qkv name as it was badly missing the .weight suffix	2024-04-07 15:59:07 +02:00
Pierrick HYMBERT	9e17dad087	model: dbrx: convert-hf-to-gguf.py add chat template	2024-04-07 15:57:36 +02:00
Pierrick HYMBERT	200ce21436	model: dbrx: convert-hf-to-gguf.py fix fix ftype missing, fix tensor names does not suffix with .weight	2024-04-07 15:54:19 +02:00
Pierrick HYMBERT	1fb6d95c1d	model: convert-hf-to-gguf.py fix classname conflict with qwen2	2024-04-07 15:40:21 +02:00
Georgi Gerganov	43e8995e75	scripts : sync ggml-cuda folder	2024-04-07 16:08:12 +03:00
limitedAtonement	9472bce308	Run make to build the project (#6457)	2024-04-07 13:05:40 +02:00
Pierrick HYMBERT	61be4b91a6	model: convert-hf-to-gguf.py add _set_vocab_tiktoken gpt2 backed on llama.cpp	2024-04-07 12:15:16 +02:00
Pierrick HYMBERT	dccb012637	llama: dbrx: quantize fix n_attention_wv tensor name	2024-04-07 05:09:17 +02:00
Pierrick HYMBERT	b6522a9f5b	model: dbrx: convert fix tokenizer	2024-04-07 05:02:14 +02:00
Pierrick HYMBERT	305ac3b61b	llama: dbrx: quantize fix n_attention_wv tensor name	2024-04-07 05:01:33 +02:00
Neo Zhang Jianyu	d4f220a5cc	support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521)	2024-04-07 10:55:59 +08:00
Pierrick HYMBERT	06a59abf0a	model: dbrx: convert add n_ff	2024-04-07 03:17:24 +02:00
Pierrick HYMBERT	52c403355f	llama: increase maximum experts allowed	2024-04-07 03:16:33 +02:00
Pierrick HYMBERT	7e7cd53ca6	llama: dbrx: remove unnecessary optional tensor on FFN_GATE_EXPS	2024-04-06 23:55:37 +02:00
Pierrick HYMBERT	69856297b9	Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx	2024-04-06 23:53:11 +02:00
Pierrick HYMBERT	4f12a580d9	llama: dbrx: remove not existing condition on empty output layer	2024-04-06 23:35:23 +02:00
Pierrick HYMBERT	fe8089871e	model: dbrx: fix missing embedding tensor, mix with output layer	2024-04-06 23:27:29 +02:00
Pierrick HYMBERT	9c7dedb0f3	llama: dbrx: no attention output layer	2024-04-06 22:25:37 +02:00
Pierrick HYMBERT	76f266beef	scripts: get-wikitext-2 add unzip	2024-04-06 21:10:19 +02:00
Pierrick HYMBERT	03da419fc0	llama: dbrx: remove wrong attn output layer in model arch	2024-04-06 20:43:46 +02:00
Pierrick HYMBERT	916b91852b	convert: dbrx: fix remove wrong ATTN_OUT_NORM tensor, add output layer mapping	2024-04-06 20:30:30 +02:00
Pierrick HYMBERT	c8e6f903e0	doc: dbrx: add the model as supported	2024-04-06 20:09:01 +02:00
Pierrick HYMBERT	0a35f5881b	convert: dbrx: fix mixed up and down expert tensors llama: dbrx: review graph	2024-04-06 19:56:37 +02:00
Pierrick HYMBERT	e3c1e8127c	convert: dbrx: fix mixed up and down expert tensors	2024-04-06 19:21:43 +02:00
Pierrick HYMBERT	a7f9a3eafc	dbrx: minor	2024-04-06 19:09:04 +02:00
Georgi Gerganov	54ea0698fb	sync : ggml	2024-04-06 18:27:46 +03:00
Daniel Bevenius	b66aec675c	backend : fix typo in scheduler documentation (ggml/781) Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-04-06 17:42:26 +03:00
Clint Herron	57dd02c44b	Tests: Added integration tests for GBNF parser (#6472) * Added integration tests for GBNF parser to validate correctness of parsing, as well as correctness of string matching. Intended for use to pin behavior while working on performance improvements. * Fixing whitespace errors and cleaning error message alert to be clearer. * Removing hacky include to llama.cpp from grammar integration test now that needed functions are available via internal API. * Comment cleanup. * Reorganizing tests for readability. * Cleaning up debug message to make a bit more sense.	2024-04-06 10:31:33 -04:00
Pierrick HYMBERT	e4f8ee4f48	llama: support dbrx fix norm type	2024-04-06 16:14:58 +02:00
Pierrick HYMBERT	09210334bf	model: dbrx fix python linter in convert-hf-to-gguf.py	2024-04-06 16:00:32 +02:00
Pierrick HYMBERT	c0beb3cf7e	llama: add label for model 132B	2024-04-06 15:58:17 +02:00


2720 commits