Pierrick HYMBERT
dbfd59114f
model: dbrx: fix broken tensor names mapping
2024-04-07 18:52:28 +02:00
Pierrick HYMBERT
f062b834ed
model: dbrx: convert experts to f16
2024-04-07 18:47:37 +02:00
Pierrick HYMBERT
d151d8fad9
model: dbrx: convert reshape expert tensors to 3D
2024-04-07 18:41:33 +02:00
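The two conversion commits above (experts to f16, expert tensors reshaped to 3D) can be sketched in isolation. This is a hypothetical NumPy illustration, not the project's convert-hf-to-gguf.py code; all names and shapes are made up for the example:

```python
import numpy as np

# MoE checkpoints often store one 2D weight matrix per expert, while the
# GGUF writer expects the experts stacked into a single 3D tensor in f16.
n_expert, n_ff, n_embd = 4, 8, 6

# Per-expert 2D matrices, as they might come out of the HF checkpoint.
experts = [np.random.rand(n_ff, n_embd).astype(np.float32)
           for _ in range(n_expert)]

# Stack into one (n_expert, n_ff, n_embd) tensor and cast to f16.
stacked = np.stack(experts, axis=0).astype(np.float16)
assert stacked.shape == (n_expert, n_ff, n_embd)
assert stacked.dtype == np.float16
```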
Pierrick HYMBERT
e9987c66d0
llama: dbrx: fix tensor qkv number of elements
2024-04-07 18:21:57 +02:00
Pierrick HYMBERT
1bd94270e5
llama: quantize: remove wrong lookup for tensor qkv name, as it was missing the .weight suffix
model: dbrx: convert to gguf: force expert tensors to have .weight suffix
2024-04-07 17:55:33 +02:00
Pierrick HYMBERT
2449ef48a9
llama: dbrx: no weight suffix in ffn_gate_exps, ffn_up_exps and ffn_down_exps. Output tensor not optional.
2024-04-07 17:55:33 +02:00
Pierrick HYMBERT
8154617ff2
model: dbrx: convert-hf-to-gguf.py support python 3.8
2024-04-07 17:25:39 +02:00
Pierrick HYMBERT
3a9dc2eee2
model: dbrx: convert-hf-to-gguf.py fix wrong shape of 'token_embd.weight', fix special tokens
2024-04-07 17:21:35 +02:00
Pierrick HYMBERT
d7546fda64
llama: quantize: remove wrong lookup for tensor qkv name, as it was missing the .weight suffix
2024-04-07 15:59:07 +02:00
Pierrick HYMBERT
9e17dad087
model: dbrx: convert-hf-to-gguf.py add chat template
2024-04-07 15:57:36 +02:00
Pierrick HYMBERT
200ce21436
model: dbrx: convert-hf-to-gguf.py fix missing ftype, fix tensor names missing the .weight suffix
2024-04-07 15:54:19 +02:00
Pierrick HYMBERT
1fb6d95c1d
model: convert-hf-to-gguf.py fix classname conflict with qwen2
2024-04-07 15:40:21 +02:00
Pierrick HYMBERT
61be4b91a6
model: convert-hf-to-gguf.py add _set_vocab_tiktoken backed by gpt2 in llama.cpp
2024-04-07 12:15:16 +02:00
Pierrick HYMBERT
dccb012637
llama: dbrx: quantize fix n_attention_wv tensor name
2024-04-07 05:09:17 +02:00
Pierrick HYMBERT
b6522a9f5b
model: dbrx: convert fix tokenizer
2024-04-07 05:02:14 +02:00
Pierrick HYMBERT
305ac3b61b
llama: dbrx: quantize fix n_attention_wv tensor name
2024-04-07 05:01:33 +02:00
Pierrick HYMBERT
06a59abf0a
model: dbrx: convert add n_ff
2024-04-07 03:17:24 +02:00
Pierrick HYMBERT
52c403355f
llama: increase maximum experts allowed
2024-04-07 03:16:33 +02:00
Pierrick HYMBERT
7e7cd53ca6
llama: dbrx: remove unnecessary optional tensor on FFN_GATE_EXPS
2024-04-06 23:55:37 +02:00
Pierrick HYMBERT
69856297b9
Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx
2024-04-06 23:53:11 +02:00
Pierrick HYMBERT
4f12a580d9
llama: dbrx: remove non-existent condition on empty output layer
2024-04-06 23:35:23 +02:00
Pierrick HYMBERT
fe8089871e
model: dbrx: fix missing embedding tensor mixed up with output layer
2024-04-06 23:27:29 +02:00
Pierrick HYMBERT
9c7dedb0f3
llama: dbrx: no attention output layer
2024-04-06 22:25:37 +02:00
Pierrick HYMBERT
76f266beef
scripts: get-wikitext-2 add unzip
2024-04-06 21:10:19 +02:00
Pierrick HYMBERT
03da419fc0
llama: dbrx: remove wrong attn output layer in model arch
2024-04-06 20:43:46 +02:00
Pierrick HYMBERT
916b91852b
convert: dbrx: remove wrong ATTN_OUT_NORM tensor, add output layer mapping
2024-04-06 20:30:30 +02:00
Pierrick HYMBERT
c8e6f903e0
doc: dbrx: add the model as supported
2024-04-06 20:09:01 +02:00
Pierrick HYMBERT
0a35f5881b
convert: dbrx: fix mixed up and down expert tensors
llama: dbrx: review graph
2024-04-06 19:56:37 +02:00
Pierrick HYMBERT
e3c1e8127c
convert: dbrx: fix mixed up and down expert tensors
2024-04-06 19:21:43 +02:00
Pierrick HYMBERT
a7f9a3eafc
dbrx: minor
2024-04-06 19:09:04 +02:00
Georgi Gerganov
54ea0698fb
sync : ggml
2024-04-06 18:27:46 +03:00
Daniel Bevenius
b66aec675c
backend : fix typo in scheduler documentation (ggml/781)
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-06 17:42:26 +03:00
Clint Herron
57dd02c44b
Tests: Added integration tests for GBNF parser (#6472)
* Added integration tests for GBNF parser to validate correctness of parsing, as well as correctness of string matching. Intended to pin behavior while working on performance improvements.
* Fixing whitespace errors and cleaning error message alert to be clearer.
* Removing hacky include to llama.cpp from grammar integration test now that needed functions are available via internal API.
* Comment cleanup.
* Reorganizing tests for readability.
* Cleaning up debug message to make a bit more sense.
2024-04-06 10:31:33 -04:00
Pierrick HYMBERT
e4f8ee4f48
llama: support dbrx, fix norm type
2024-04-06 16:14:58 +02:00
Pierrick HYMBERT
09210334bf
model: dbrx fix python linter in convert-hf-to-gguf.py
2024-04-06 16:00:32 +02:00
Pierrick HYMBERT
c0beb3cf7e
llama: add label for model 132B
2024-04-06 15:58:17 +02:00
Pierrick HYMBERT
3937100adb
model: dbrx, trust remote code
2024-04-06 15:57:57 +02:00
Pierrick HYMBERT
3e3d2d127c
gguf-py: remove wrong clip -> clamp
2024-04-06 15:46:47 +02:00
Pierrick HYMBERT
ed582c1dde
llama: support dbrx
#6344
2024-04-06 15:22:55 +02:00
Pierrick HYMBERT
1d8de31565
model: dbrx convert to gguf
#6344
2024-04-06 14:17:24 +02:00
Pierrick Hymbert
75cd4c7729
ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495)
* ci: bench: support sse and fix prompt processing time
server: add tokens usage in stream mode
* ci: bench: README.md EOL
* ci: bench: remove total pp and tg as it is not accurate
* ci: bench: fix case when there is no token generated
* ci: bench: change to the 95th percentile for pp and tg as it is closer to what the server exports in metrics
* ci: bench: fix finish reason rate
2024-04-06 05:40:47 +02:00
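The switch to a 95th percentile for pp and tg noted above can be illustrated with a small, hypothetical sample (NumPy; the values are made up and this is not the actual bench script):

```python
import numpy as np

# With a latency sample skewed by a single slow request, the mean is
# dragged up by the outlier, while the 95th percentile stays with the
# bulk of the sample, which is closer to typical behavior.
latencies_ms = np.array([12.0] * 99 + [250.0])

mean_ms = latencies_ms.mean()             # pulled up by the 250 ms outlier
p95_ms = np.percentile(latencies_ms, 95)  # stays at the typical 12 ms

assert p95_ms < mean_ms
```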
Brian
a8bd14d557
gguf.py : add licence and version to gguf writer (#6504)
2024-04-05 21:41:38 +03:00
Hoang Nguyen
d0f5deebf8
readme : update UI list (#6503)
* Add MindMac to UI list
* Update proprietary description
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-04-05 21:39:43 +03:00
Ting Sun
87e21bbacd
bench : make n_batch and n_ubatch configurable in Batched bench (#6500)
* bench: make n_batch and n_ubatch configurable
* bench: update doc for batched bench
2024-04-05 21:34:53 +03:00
Ouadie EL FAROUKI
1b496a745c
[SYCL] Fixed minor bug when enabling FP16 for non-Intel targets (#6464)
* moved INTEL_MKL guard from gemm_impl to gemm (wrapper)
* Update ggml-sycl.cpp
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
---------
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
2024-04-05 19:05:06 +05:30
alexpinel
a307375c02
readme : add Dot to UI list (#6487)
2024-04-04 13:22:50 -04:00
Jun Jie
b660a5729e
readme : fix typo (#6481)
2024-04-04 13:16:37 -04:00
Ed Lepedus
0a1d889e27
server: add cURL support to server Dockerfiles (#6474)
* server: add cURL support to `full.Dockerfile`
* server: add cURL support to `full-cuda.Dockerfile` and `server-cuda.Dockerfile`
* server: add cURL support to `full-rocm.Dockerfile` and `server-rocm.Dockerfile`
* server: add cURL support to `server-intel.Dockerfile`
* server: add cURL support to `server-vulkan.Dockerfile`
* fix typo in `server-vulkan.Dockerfile`
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-04 18:31:22 +02:00
Minsoo Cheong
7dda1b727e
ci: exempt master branch workflows from getting cancelled (#6486)
* ci: exempt master branch workflows from getting cancelled
* apply to bench.yml
2024-04-04 18:30:53 +02:00
Ewout ter Hoeven
c666ba26c3
build CI: Name artifacts (#6482)
Name the artifacts in the build CI, so that they get uploaded with separate names, instead of all being put into the same `artifact` ZIP.
It might be possible to further simplify the packing step (in future PRs).
2024-04-04 17:08:55 +02:00