Pierrick HYMBERT
dbfd59114f
model: dbrx: fix broken tensor names mapping
2024-04-07 18:52:28 +02:00
Pierrick HYMBERT
f062b834ed
model: dbrx: convert experts to f16
2024-04-07 18:47:37 +02:00
Pierrick HYMBERT
d151d8fad9
model: dbrx: convert reshape expert tensors to 3D
2024-04-07 18:41:33 +02:00
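The two conversion commits above (experts to f16, expert tensors reshaped to 3D) can be sketched in isolation. This is a hypothetical NumPy illustration, not the project's convert-hf-to-gguf.py code; all names and shapes are made up for the example:

```python
import numpy as np

# MoE checkpoints often store one 2D weight matrix per expert, while the
# GGUF writer expects the experts stacked into a single 3D tensor in f16.
n_expert, n_ff, n_embd = 4, 8, 6

# Per-expert 2D matrices, as they might come out of the HF checkpoint.
experts = [np.random.rand(n_ff, n_embd).astype(np.float32)
           for _ in range(n_expert)]

# Stack into one (n_expert, n_ff, n_embd) tensor and cast to f16.
stacked = np.stack(experts, axis=0).astype(np.float16)
assert stacked.shape == (n_expert, n_ff, n_embd)
assert stacked.dtype == np.float16
```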
Pierrick HYMBERT
e9987c66d0
llama: dbrx: fix tensor qkv number of elements
2024-04-07 18:21:57 +02:00
Pierrick HYMBERT
1bd94270e5
llama: quantize: remove wrong lookup for tensor qkv name, as it was missing the .weight suffix
model: dbrx: convert to gguf: force expert tensors to have .weight suffix
2024-04-07 17:55:33 +02:00
Pierrick HYMBERT
2449ef48a9
llama: dbrx: no weight suffix in ffn_gate_exps, ffn_up_exps and ffn_down_exps. Output tensor not optional.
2024-04-07 17:55:33 +02:00
Pierrick HYMBERT
8154617ff2
model: dbrx: convert-hf-to-gguf.py support python 3.8
2024-04-07 17:25:39 +02:00
Pierrick HYMBERT
3a9dc2eee2
model: dbrx: convert-hf-to-gguf.py fix wrong shape of 'token_embd.weight', fix special tokens
2024-04-07 17:21:35 +02:00
Pierrick HYMBERT
d7546fda64
llama: quantize: remove wrong lookup for tensor qkv name, as it was missing the .weight suffix
2024-04-07 15:59:07 +02:00
Pierrick HYMBERT
9e17dad087
model: dbrx: convert-hf-to-gguf.py add chat template
2024-04-07 15:57:36 +02:00
Pierrick HYMBERT
200ce21436
model: dbrx: convert-hf-to-gguf.py fix missing ftype, fix tensor names missing the .weight suffix
2024-04-07 15:54:19 +02:00
Pierrick HYMBERT
1fb6d95c1d
model: convert-hf-to-gguf.py fix classname conflict with qwen2
2024-04-07 15:40:21 +02:00
Pierrick HYMBERT
61be4b91a6
model: convert-hf-to-gguf.py add _set_vocab_tiktoken backed by gpt2 in llama.cpp
2024-04-07 12:15:16 +02:00
Pierrick HYMBERT
dccb012637
llama: dbrx: quantize fix n_attention_wv tensor name
2024-04-07 05:09:17 +02:00
Pierrick HYMBERT
b6522a9f5b
model: dbrx: convert fix tokenizer
2024-04-07 05:02:14 +02:00
Pierrick HYMBERT
305ac3b61b
llama: dbrx: quantize fix n_attention_wv tensor name
2024-04-07 05:01:33 +02:00
Pierrick HYMBERT
06a59abf0a
model: dbrx: convert add n_ff
2024-04-07 03:17:24 +02:00
Pierrick HYMBERT
52c403355f
llama: increase maximum experts allowed
2024-04-07 03:16:33 +02:00
Pierrick HYMBERT
7e7cd53ca6
llama: dbrx: remove unnecessary optional tensor on FFN_GATE_EXPS
2024-04-06 23:55:37 +02:00
Pierrick HYMBERT
69856297b9
Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx
2024-04-06 23:53:11 +02:00
Pierrick HYMBERT
4f12a580d9
llama: dbrx: remove non-existent condition on empty output layer
2024-04-06 23:35:23 +02:00
Pierrick HYMBERT
fe8089871e
model: dbrx: fix missing embedding tensor mixed up with output layer
2024-04-06 23:27:29 +02:00
Pierrick HYMBERT
9c7dedb0f3
llama: dbrx: no attention output layer
2024-04-06 22:25:37 +02:00
Pierrick HYMBERT
76f266beef
scripts: get-wikitext-2 add unzip
2024-04-06 21:10:19 +02:00
Pierrick HYMBERT
03da419fc0
llama: dbrx: remove wrong attn output layer in model arch
2024-04-06 20:43:46 +02:00
Pierrick HYMBERT
916b91852b
convert: dbrx: remove wrong ATTN_OUT_NORM tensor, add output layer mapping
2024-04-06 20:30:30 +02:00
Pierrick HYMBERT
c8e6f903e0
doc: dbrx: add the model as supported
2024-04-06 20:09:01 +02:00
Pierrick HYMBERT
0a35f5881b
convert: dbrx: fix mixed up and down expert tensors
llama: dbrx: review graph
2024-04-06 19:56:37 +02:00
Pierrick HYMBERT
e3c1e8127c
convert: dbrx: fix mixed up and down expert tensors
2024-04-06 19:21:43 +02:00
Pierrick HYMBERT
a7f9a3eafc
dbrx: minor
2024-04-06 19:09:04 +02:00
Georgi Gerganov
54ea0698fb
sync : ggml
2024-04-06 18:27:46 +03:00
Daniel Bevenius
b66aec675c
backend : fix typo in scheduler documentation (ggml/781)
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-06 17:42:26 +03:00
Clint Herron
57dd02c44b
Tests: Added integration tests for GBNF parser (#6472)
* Added integration tests for GBNF parser to validate correctness of parsing, as well as correctness of string matching. Intended to pin behavior while working on performance improvements.
* Fixing whitespace errors and cleaning error message alert to be clearer.
* Removing hacky include to llama.cpp from grammar integration test now that needed functions are available via internal API.
* Comment cleanup.
* Reorganizing tests for readability.
* Cleaning up debug message to make a bit more sense.
2024-04-06 10:31:33 -04:00
Pierrick HYMBERT
e4f8ee4f48
llama: support dbrx, fix norm type
2024-04-06 16:14:58 +02:00
Pierrick HYMBERT
09210334bf
model: dbrx fix python linter in convert-hf-to-gguf.py
2024-04-06 16:00:32 +02:00
Pierrick HYMBERT
c0beb3cf7e
llama: add label for model 132B
2024-04-06 15:58:17 +02:00
Pierrick HYMBERT
3937100adb
model: dbrx, trust remote code
2024-04-06 15:57:57 +02:00
Pierrick HYMBERT
3e3d2d127c
gguf-py: remove wrong clip -> clamp
2024-04-06 15:46:47 +02:00
Pierrick HYMBERT
ed582c1dde
llama: support dbrx
#6344
2024-04-06 15:22:55 +02:00
Pierrick HYMBERT
1d8de31565
model: dbrx convert to gguf
#6344
2024-04-06 14:17:24 +02:00
Pierrick Hymbert
75cd4c7729
ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495)
* ci: bench: support sse and fix prompt processing time
server: add tokens usage in stream mode
* ci: bench: README.md EOL
* ci: bench: remove total pp and tg as it is not accurate
* ci: bench: fix case when there is no token generated
* ci: bench: change to the 95th percentile for pp and tg as it is closer to what the server exports in metrics
* ci: bench: fix finish reason rate
2024-04-06 05:40:47 +02:00
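The switch to a 95th percentile for pp and tg noted above can be illustrated with a small, hypothetical sample (NumPy; the values are made up and this is not the actual bench script):

```python
import numpy as np

# With a latency sample skewed by a single slow request, the mean is
# dragged up by the outlier, while the 95th percentile stays with the
# bulk of the sample, which is closer to typical behavior.
latencies_ms = np.array([12.0] * 99 + [250.0])

mean_ms = latencies_ms.mean()             # pulled up by the 250 ms outlier
p95_ms = np.percentile(latencies_ms, 95)  # stays at the typical 12 ms

assert p95_ms < mean_ms
```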
Brian
a8bd14d557
gguf.py : add licence and version to gguf writer (#6504)
2024-04-05 21:41:38 +03:00
Hoang Nguyen
d0f5deebf8
readme : update UI list (#6503)
* Add MindMac to UI list
* Update proprietary description
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-04-05 21:39:43 +03:00
Ting Sun
87e21bbacd
bench : make n_batch and n_ubatch configurable in Batched bench (#6500)
* bench: make n_batch and n_ubatch configurable
* bench: update doc for batched bench
2024-04-05 21:34:53 +03:00
Ouadie EL FAROUKI
1b496a745c
[SYCL] Fixed minor bug when enabling FP16 for non-Intel targets (#6464)
* moved INTEL_MKL guard from gemm_impl to gemm (wrapper)
* Update ggml-sycl.cpp
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
---------
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
2024-04-05 19:05:06 +05:30
alexpinel
a307375c02
readme : add Dot to UI list (#6487)
2024-04-04 13:22:50 -04:00
Jun Jie
b660a5729e
readme : fix typo (#6481)
2024-04-04 13:16:37 -04:00
Ed Lepedus
0a1d889e27
server: add cURL support to server Dockerfiles (#6474)
* server: add cURL support to `full.Dockerfile`
* server: add cURL support to `full-cuda.Dockerfile` and `server-cuda.Dockerfile`
* server: add cURL support to `full-rocm.Dockerfile` and `server-rocm.Dockerfile`
* server: add cURL support to `server-intel.Dockerfile`
* server: add cURL support to `server-vulkan.Dockerfile`
* fix typo in `server-vulkan.Dockerfile`
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-04 18:31:22 +02:00
Minsoo Cheong
7dda1b727e
ci: exempt master branch workflows from getting cancelled (#6486)
* ci: exempt master branch workflows from getting cancelled
* apply to bench.yml
2024-04-04 18:30:53 +02:00
Ewout ter Hoeven
c666ba26c3
build CI: Name artifacts (#6482)
Name the artifacts in the build CI, so that they get uploaded with separate names, instead of all being put into the same `artifact` ZIP.
It might be possible to further simplify the packing step (in future PRs).
2024-04-04 17:08:55 +02:00