Commit graph

2692 commits

Author SHA1 Message Date
Pierrick HYMBERT
305ac3b61b llama: dbrx: quantize fix n_attention_wv tensor name 2024-04-07 05:01:33 +02:00
Neo Zhang Jianyu
d4f220a5cc support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521) 2024-04-07 10:55:59 +08:00
Pierrick HYMBERT
06a59abf0a model: dbrx: convert add n_ff 2024-04-07 03:17:24 +02:00
Pierrick HYMBERT
52c403355f llama: increase maximum experts allowed 2024-04-07 03:16:33 +02:00
Pierrick HYMBERT
7e7cd53ca6 llama: dbrx: remove unnecessary optional tensor on FFN_GATE_EXPS 2024-04-06 23:55:37 +02:00
Pierrick HYMBERT
69856297b9 Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx 2024-04-06 23:53:11 +02:00
Pierrick HYMBERT
4f12a580d9 llama: dbrx: remove not existing condition on empty output layer 2024-04-06 23:35:23 +02:00
Pierrick HYMBERT
fe8089871e model: dbrx: fix missing embedding tensor, mix with output layer 2024-04-06 23:27:29 +02:00
Pierrick HYMBERT
9c7dedb0f3 llama: dbrx: no attention output layer 2024-04-06 22:25:37 +02:00
Pierrick HYMBERT
76f266beef scripts: get-wikitext-2 add unzip 2024-04-06 21:10:19 +02:00
Pierrick HYMBERT
03da419fc0 llama: dbrx: remove wrong attn output layer in model arch 2024-04-06 20:43:46 +02:00
Pierrick HYMBERT
916b91852b convert: dbrx: fix remove wrong ATTN_OUT_NORM tensor, add output layer mapping 2024-04-06 20:30:30 +02:00
Pierrick HYMBERT
c8e6f903e0 doc: dbrx: add the model as supported 2024-04-06 20:09:01 +02:00
Pierrick HYMBERT
0a35f5881b convert: dbrx: fix mixed up and down expert tensors
llama: dbrx: review graph
2024-04-06 19:56:37 +02:00
Pierrick HYMBERT
e3c1e8127c convert: dbrx: fix mixed up and down expert tensors 2024-04-06 19:21:43 +02:00
Pierrick HYMBERT
a7f9a3eafc dbrx: minor 2024-04-06 19:09:04 +02:00
Georgi Gerganov
54ea0698fb sync : ggml 2024-04-06 18:27:46 +03:00
Daniel Bevenius
b66aec675c backend : fix typo in scheduler documentation (ggml/781)
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-06 17:42:26 +03:00
Clint Herron
57dd02c44b Tests: Added integration tests for GBNF parser (#6472)
* Added integration tests for GBNF parser to validate correctness of parsing, as well as correctness of string matching. Intended for use to pin behavior while working on performance improvements.

* Fixing whitespace errors and cleaning error message alert to be clearer.

* Removing hacky include to llama.cpp from grammar integration test now that needed functions are available via internal API.

* Comment cleanup.

* Reorganizing tests for readability.

* Cleaning up debug message to make a bit more sense.
2024-04-06 10:31:33 -04:00
Pierrick HYMBERT
e4f8ee4f48 llama: support dbrx fix norm type 2024-04-06 16:14:58 +02:00
Pierrick HYMBERT
09210334bf model: dbrx fix python linter in convert-hf-to-gguf.py 2024-04-06 16:00:32 +02:00
Pierrick HYMBERT
c0beb3cf7e llama: add label for model 132B 2024-04-06 15:58:17 +02:00
Pierrick HYMBERT
3937100adb model: dbrx, trust remote code 2024-04-06 15:57:57 +02:00
Pierrick HYMBERT
3e3d2d127c gguf-py: remove wrong clip -> clamp 2024-04-06 15:46:47 +02:00
Pierrick HYMBERT
ed582c1dde llama: support dbrx
#6344
2024-04-06 15:22:55 +02:00
Pierrick HYMBERT
1d8de31565 model: dbrx convert to gguf
#6344
2024-04-06 14:17:24 +02:00
Pierrick Hymbert
75cd4c7729 ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495)
* ci: bench: support sse and fix prompt processing time
server: add tokens usage in stream mode

* ci: bench: README.md EOL

* ci: bench: remove total pp and tg as it is not accurate

* ci: bench: fix case when there is no token generated

* ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics

* ci: bench: fix finish reason rate
2024-04-06 05:40:47 +02:00
Brian
a8bd14d557 gguf.py : add licence and version to gguf writer (#6504) 2024-04-05 21:41:38 +03:00
Hoang Nguyen
d0f5deebf8 readme : update UI list (#6503)
* Add MindMac to UI list

* Update proprietary description

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-04-05 21:39:43 +03:00
Ting Sun
87e21bbacd bench : make n_batch and n_ubatch configurable in Batched bench (#6500)
* bench: make n_batch and n_ubatch configurable

* bench: update doc for batched bench
2024-04-05 21:34:53 +03:00
Ouadie EL FAROUKI
1b496a745c [SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464)
* moved INTEL_MKL guard from gemm_impl to gemm (wrapper)

* Update ggml-sycl.cpp

Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>

---------

Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
2024-04-05 19:05:06 +05:30
alexpinel
a307375c02 readme : add Dot to UI list (#6487) 2024-04-04 13:22:50 -04:00
Jun Jie
b660a5729e readme : fix typo (#6481) 2024-04-04 13:16:37 -04:00
Ed Lepedus
0a1d889e27 server: add cURL support to server Dockerfiles (#6474)
* server: add cURL support to `full.Dockerfile`

* server: add cURL support to `full-cuda.Dockerfile` and `server-cuda.Dockerfile`

* server: add cURL support to `full-rocm.Dockerfile` and `server-rocm.Dockerfile`

* server: add cURL support to `server-intel.Dockerfile`

* server: add cURL support to `server-vulkan.Dockerfile`

* fix typo in `server-vulkan.Dockerfile`

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-04 18:31:22 +02:00
Minsoo Cheong
7dda1b727e ci: exempt master branch workflows from getting cancelled (#6486)
* ci: exempt master branch workflows from getting cancelled

* apply to bench.yml
2024-04-04 18:30:53 +02:00
Ewout ter Hoeven
c666ba26c3 build CI: Name artifacts (#6482)
Name the artifacts in the build CI, so that they get uploaded with separate names, instead of all put into the same `artifact` ZIP.

It might be possible to further simplify the packing step (in future PRs).
2024-04-04 17:08:55 +02:00
Shakhar Dasgupta
2e66913e5f server: allow penalizing repetition of newlines on server webpage (#6431) 2024-04-04 17:03:00 +02:00
Pierrick Hymbert
8120efee1d ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478) 2024-04-04 16:59:04 +02:00
limitedAtonement
a74401f0e5 Correct README link (#6458)
README is called README.md.
2024-04-04 16:30:02 +02:00
Pierrick Hymbert
7a2c92637a ci: bench: add more ftype, fix triggers and bot comment (#6466)
* ci: bench: change trigger path to not spawn on each PR

* ci: bench: add more file type for phi-2: q8_0 and f16.
- do not show the comment by default

* ci: bench: add seed parameter in k6 script

* ci: bench: artefact name perf job

* Add iteration in the commit status, reduce again the autocomment

* ci: bench: add per slot metric in the commit status

* Fix trailing spaces
2024-04-04 12:57:58 +03:00
Daniel Bevenius
4bcd6b959c common: remove duplicate check for curl (#6471)
This commit removes one of the two identical checks for curl being NULL
in llama_load_model_from_url.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-04 09:49:21 +02:00
Clint Herron
9b84ae1806 examples : add GBNF validator program (#5948)
* Revising GBNF validator program to be much simpler.

* Changing from streams to using cstdio

* Adding final newline character.
2024-04-04 10:44:28 +03:00
Georgi Gerganov
4399f13fb9 server : remove obsolete --memory-f32 option 2024-04-04 09:34:58 +03:00
Xiao-Yong Jin
1a43c7254e server : add option to disable KV offload (#6468) 2024-04-04 09:33:48 +03:00
Clint Herron
72d73af651 convert : fix for lint error complaining of bare except (#6470) 2024-04-04 09:32:53 +03:00
Fattire
5fb1574c81 A few small fixes to server's README docs (#6428)
* Typo fix to server's README.md

Fix minor typo ("tonen") in server README.

* server readme grammar/style fixes.

Quickly went through this file to look for inconsistencies in
presentation of defaults, flag options, and looked for typos
and grammar issues.

Not perfect, but hopefully improved.

* Update README.md

Remove an extra space before newline.
2024-04-03 22:22:57 +02:00
JH23X
60cdf40cc3 server : handle exception on wrong type in request (#6452)
Co-authored-by: Jonas Holzner <jonas.holzner.external@hensoldt.net>
2024-04-03 21:09:52 +03:00
bryanSwk
bb43cf7e9d llama : add SEA-LION support (#6448)
* initial commit for sealion support

* add sealion support

* minor fix

* q/k ln and pos_embd only if required

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* minor : clear whitespaces

---------

Co-authored-by: bryan <bryansiow@aisingapore.org>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-03 21:05:10 +03:00
Ewout ter Hoeven
9f62c0173d ci : update checkout, setup-python and upload-artifact to latest (#6456)
* CI: Update actions/checkout to v4

* CI: Update actions/setup-python to v5

* CI: Update actions/upload-artifact to v4
2024-04-03 21:01:13 +03:00
Ed Lepedus
5d4f12e462 server: add cURL support to server.Dockerfile (#6461) 2024-04-03 19:56:37 +02:00