Commit graph

3157 commits

Author SHA1 Message Date
Christian Zhou-Zheng
03cc9bcbe8 use simplification from #7827 2024-06-08 23:14:26 -04:00
Christian Zhou-Zheng
666bb097a2 Merge branch 'master' into convert-split 2024-06-08 23:06:18 -04:00
Christian Zhou-Zheng
282e71fb39 edit cmd line args 2024-06-08 23:00:42 -04:00
compilade
5795b94182
convert-hf : match model part name prefix and suffix (#7687)
In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply used the same logic as the part count to get the part names.

But this doesn't always work correctly, for example when unusual additional model files such as consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.

Matching both the prefix and the suffix of the model part names should fix this problem without breaking any previously supported upstream models. According to a report by @teleprint-me, some problem still persists, but this should do in the meantime.
2024-06-09 12:47:25 +10:00
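The prefix-and-suffix matching described in this message can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name `select_model_parts` and the directory listing are hypothetical, chosen to mirror the consolidated.safetensors case mentioned above.

```python
from typing import Iterable, List


def select_model_parts(filenames: Iterable[str], prefix: str, suffix: str) -> List[str]:
    """Keep only names that start with `prefix` AND end with `suffix`, sorted."""
    return sorted(
        name for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    )


# Hypothetical directory listing (an assumption for illustration only).
listing = [
    "config.json",
    "consolidated.safetensors",          # right suffix, wrong prefix: skipped
    "model-00001-of-00003.safetensors",
    "model-00002-of-00003.safetensors",
    "model-00003-of-00003.safetensors",
    "model.safetensors.index.json",      # right prefix, wrong suffix: skipped
]

parts = select_model_parts(listing, prefix="model", suffix=".safetensors")
print(parts)
```

Checking only the suffix would also pick up consolidated.safetensors, which is why both ends of the name are tested.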
compilade
ed9f252118
gguf-py : decouple adding metadata from writing in GGUFWriter (#7827)
The main change in this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value.

In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False.

GGUFWriter also no longer requires the output file name until it actually writes to the file.

And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata.
2024-06-09 12:34:29 +10:00
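The idea of decoupling "adding metadata" from "writing" can be illustrated with a toy class. This is only a sketch under stated assumptions: `DeferredWriter` is a hypothetical stand-in, it is not the gguf-py GGUFWriter API, and it serializes to JSON rather than to the GGUF binary format.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


class DeferredWriter:
    """Toy writer (hypothetical): metadata is collected in memory and a
    path is required only when write() is called."""

    def __init__(self, path: Optional[Path] = None) -> None:
        self.path = path
        self.kv: Dict[str, Any] = {}

    def add_key_value(self, key: str, value: Any) -> None:
        # A single call records key and value together and rejects duplicates.
        if key in self.kv:
            raise ValueError(f"duplicate key: {key!r}")
        self.kv[key] = value

    def serialize(self) -> str:
        # The output layout is decided here, not while keys are being added.
        return json.dumps(self.kv, sort_keys=True)

    def write(self, path: Optional[Path] = None) -> Path:
        target = path if path is not None else self.path
        if target is None:
            raise ValueError("no output path given")
        target.write_text(self.serialize(), encoding="utf-8")
        return target


writer = DeferredWriter()  # no file name needed yet
writer.add_key_value("general.architecture", "llama")
writer.add_key_value("general.name", "example")
print(writer.serialize())
```

Because nothing is laid out until `serialize()` runs, the caller is free to choose the output path (or several paths) after all metadata is known.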
slaren
fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
Christian Zhou-Zheng
079dfe3a8c
Update convert-hf-to-gguf.py
Co-authored-by: compilade <git@compilade.net>
2024-06-08 15:42:17 -04:00
Olivier Chafik
d4d915d351
url: save -mu downloads to new cache location (#7826)
* url: save -mu download to new cache location

* url: fs_get_cache_file_path util

* url: tweak sig of fs_get_cache_file
2024-06-08 21:21:08 +02:00
Christian Zhou-Zheng
f658e91f4a comma consistency 2024-06-08 08:10:12 -04:00
sasha0552
7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring

* add usage

* remove trailing whitespaces

* Use Longest Common Prefix (LCP) instead of LCS

* Rename argument
2024-06-08 10:50:31 +03:00
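Slot selection by longest common prefix can be sketched like this. It is an illustrative sketch, not the server's implementation: the names `common_prefix_len` and `pick_slot`, the token lists, and the "return None when nothing matches" behavior are all assumptions.

```python
from typing import Dict, List, Optional, Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens that `a` and `b` share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(slots: Dict[int, List[int]], prompt: Sequence[int]) -> Optional[int]:
    """Return the id of the slot whose cached tokens share the longest
    prefix with `prompt`, or None if no slot shares any prefix."""
    best_id: Optional[int] = None
    best_len = 0
    for slot_id, cached in slots.items():
        n = common_prefix_len(cached, prompt)
        if n > best_len:
            best_id, best_len = slot_id, n
    return best_id


# Hypothetical cached token sequences per slot.
slots = {0: [1, 2, 3, 9], 1: [1, 2, 3, 4, 5], 2: [7, 8]}
chosen = pick_slot(slots, [1, 2, 3, 4, 6])
print(chosen)
```

A prefix match is what matters for reusing a prompt cache, since cached state is only valid up to the first token that differs; a common substring in the middle of the prompt would not help.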
Christian Zhou-Zheng
02be0dd654 attempt 3 to appease the linter 2024-06-07 21:26:40 -04:00
Christian Zhou-Zheng
891b19cb81 attempt 2 to appease the linter 2024-06-07 21:20:46 -04:00
Christian Zhou-Zheng
2e70fa1055 attempt to appease the linter 2024-06-07 21:18:30 -04:00
Christian Zhou-Zheng
c6ae1d6799 reinstate original gguf package import and fix type annotation 2024-06-07 21:09:03 -04:00
Christian Zhou-Zheng
9576965ce7 examples/convert-legacy-llama.py: restore executable file permission 2024-06-07 20:51:22 -04:00
Francis Couture-Harpin
e093dfba9f convert-hf : restore executable file permission 2024-06-07 17:31:35 -04:00
Christian Zhou-Zheng
dc5cf5fd82
Update gguf-py/gguf/gguf_writer_split.py
Co-authored-by: compilade <git@compilade.net>
2024-06-07 17:26:30 -04:00
Christian Zhou-Zheng
0283fc1771 fix line endings 2024-06-07 17:24:27 -04:00
Christian Zhou-Zheng
5f29d4a617 fix convert-hf-to-gguf.py permissions 2024-06-07 17:19:01 -04:00
Christian Zhou-Zheng
1312e287ec
Update gguf-py/gguf/constants.py
Co-authored-by: compilade <git@compilade.net>
2024-06-07 17:10:51 -04:00
slaren
da799b4189
vulkan : reuse parent extra for views (#7806)
* vulkan : reuse parent extra for views

* Fix validation error when multiple compute contexts are used in a graph

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng
6d3a256d1d rename GGUFManager to GGUFWriterSplit 2024-06-07 09:12:44 -04:00
Christian Zhou-Zheng
c00fad71e5
gguf-split : change binary multi-byte units to decimal (#7803) 2024-06-07 15:56:01 +03:00
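The difference between decimal and binary size units can be shown with a small formatter. This is a sketch only: `format_size_decimal` is a hypothetical helper, not the function changed by the commit.

```python
def format_size_decimal(num_bytes: int) -> str:
    """Format a byte count using decimal (base-1000) units."""
    units = ["B", "kB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000  # step by 1000, not 1024
    return f"{value:.1f} {units[-1]}"


print(format_size_decimal(1_500_000))
print(format_size_decimal(1024 * 1024))
```

With base-1000 units, 1 MB is exactly 1,000,000 bytes, so a value of 1024 * 1024 bytes is slightly more than 1 MB rather than exactly 1 MiB.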
intelmatt
27615f5ab2
cmake : fix BUILD_SHARED_LIBS=ON build (#7784)
common depends on pthreads in Linux
2024-06-07 15:15:07 +03:00
Johannes Gäßler
7027b27d76
server: update cache_prompt documentation [no ci] (#7745) 2024-06-07 11:15:49 +02:00
woodx
a5cabd7649
server : do not get prompt in infill mode (#7286)
* avoid getting prompt in infill mode and embedding mode

* remove embedding mode

* refactor format

---------

Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
pengxin99
d5c938cd77
[SYCL] fix softmax r2r result wrong issue (#7811) 2024-06-07 14:28:26 +08:00
slaren
c9ee7118d5
check for nans in imatrix and quantize (#7807)
* imatrix : detect nan/inf values

* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
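A check for nan/inf values of the kind described here can be sketched as below. The data layout (a mapping from tensor name to a list of floats) and the function name `non_finite_entries` are assumptions for illustration, not the format or code used by imatrix or quantize.

```python
import math
from typing import Dict, List, Sequence


def non_finite_entries(matrix: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of entries that contain at least one nan or inf."""
    return [
        name for name, values in matrix.items()
        if not all(math.isfinite(v) for v in values)
    ]


# Hypothetical data: two of the three entries are unusable.
data = {
    "blk.0.attn_q": [0.5, 1.25],
    "blk.0.attn_k": [float("nan"), 0.1],
    "blk.0.ffn_up": [float("inf"), 0.0],
}
bad = non_finite_entries(data)
print(bad)
```

`math.isfinite` returns False for both nan and positive or negative infinity, so one test covers both cases.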
Georgi Gerganov
ee459f40f6
server : fix --threads-http arg (#7801) 2024-06-06 19:19:59 +03:00
Christian Zhou-Zheng
13ffe22ca7 base-1024 bytes to base-1000 2024-06-06 10:24:11 -04:00
Georgi Gerganov
f83351f9a6
imatrix : migrate to gpt_params (#7771)
* imatrix : migrate to gpt_params

ggml-ci

* imatrix : add --save-frequency cli arg

* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Clint Herron
ad675e1c67
Added support for . (any character) token in grammar engine. (#6467)
* Added support for . (any character) token in grammar engine.

* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00
Christian Zhou-Zheng
83e4a3f5cc make pathlib explicit 2024-06-06 09:00:59 -04:00
Christian Zhou-Zheng
2037eabb64 move kv keys to constants.py 2024-06-06 08:49:46 -04:00
Christian Zhou-Zheng
1cbab22225 type consistency in format_n_bytes_to_str 2024-06-06 08:43:26 -04:00
Christian Zhou-Zheng
3328b0a991 Shard dataclass and un-negative dont_add_architecture 2024-06-06 08:37:35 -04:00
Christian Zhou-Zheng
6a05183b97
GGUFWriter compatibility fix
Co-authored-by: compilade <git@compilade.net>
2024-06-06 08:28:10 -04:00
Christian Zhou-Zheng
706bd69023
re-add type hint
Co-authored-by: compilade <git@compilade.net>
2024-06-06 08:27:25 -04:00
Mattheus Chediak
a143c04375
README minor fixes (#7798) [no ci]
derievatives --> derivatives
2024-06-06 22:17:54 +10:00
Olivier Chafik
55b2d0849d
grammars: x{min,max} repetition operator (#6640)
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates

* grammars: handle `x{n}` and fix `x{n,n}`

* grammars: document new repetition operators

* grammars: uniform use of int for min & max

* grammars: refactor parser test

* grammar: parsing tests w/ natural pretty print of updated expectations

* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)

* grammars: improve test pretty print again

* grammars: pretty print rules and chars

* grammars: fix copy rule skipping

* grammars: disallow `a{,}` (not allowed in regexps)

* Update common/grammar-parser.cpp

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: fix copy rule skipping (again) & display of expectations

* grammars: more test cases

* grammars: update reps parsing to bring ? / * / + closer to before

* json: use new GBNF repetitions{m,n} syntax

* grammars: update performance gotchas w/ repetition advice

* Update examples/json_schema_to_grammar.py

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: comment on rule repetitions

* grammars: ensure unambiguous number alternatives

* grammar: nit typo switched error msgs

* grammar: nit numbering in comment

* json: update numeric rule to be unambiguous

* Apply suggestions from code review

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* json: fix integral-part

* grammar: add repetition tests

---------

Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
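One way to read an `x{min,max}` repetition is as `min` required copies followed by nested optional copies. The sketch below expands the operator into that form using only grouping, `?` and `*`. It is an illustration under stated assumptions: `expand_repetition` is a hypothetical helper, and the actual parser in this PR may rewrite repetitions differently.

```python
from typing import Optional


def expand_repetition(item: str, min_n: int, max_n: Optional[int]) -> str:
    """Expand item{min_n,max_n} into required copies plus nested optionals.
    max_n=None means no upper bound, as in item{min_n,}."""
    if min_n < 0 or (max_n is not None and max_n < min_n):
        raise ValueError("invalid repetition bounds")
    parts = [item] * min_n            # the required copies
    if max_n is None:
        parts.append(f"{item}*")      # unbounded tail
    else:
        optional = ""
        for _ in range(max_n - min_n):
            # Nest each extra copy inside the previous optional group.
            optional = f"({item} {optional})?" if optional else f"({item})?"
        if optional:
            parts.append(optional)
    return " ".join(parts)


print(expand_repetition("x", 1, 3))
print(expand_repetition("x", 2, None))
```

Nesting the optional copies, rather than listing them side by side as `x? x?`, leaves only one way to match any given number of repetitions.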
Joan Fontanals
f5d7b268ec
llama : add jina v2 base code (#7596)
* feat: add changes to handle jina v2 base code

* fix: do not complicate things

* fix: fix the usage of the code model

* fix: fix comments

* fix: fix linting issues

* fix: remove ollama patches

* style : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-06 10:22:41 +03:00
slaren
2d08b7fbb4
docker : build only main and server in their images (#7782)
* add openmp lib to dockerfiles

* build only main and server in their docker images
2024-06-06 08:19:49 +03:00
slaren
d67caea0d6
docker : add openmp lib (#7780) 2024-06-06 08:17:21 +03:00
Christian Zhou-Zheng
ce7e6985d2 form shards while adding tensors, SHA256 sums agree with master 2024-06-05 18:29:39 -04:00
Christian Zhou-Zheng
5ad397d610 reduce diffs with master 2024-06-05 13:49:20 -04:00
Galunid
7672adeec7
Fix encoding in python scripts (#7733) 2024-06-06 03:07:24 +10:00
Christian Zhou-Zheng
bb5ee02096 simplify even further and standardize with GGUFWriter 2024-06-05 12:49:08 -04:00
Christian Zhou-Zheng
f6fd3ea4e9 further simplify GGUFManager 2024-06-05 12:28:40 -04:00
Johannes Gäßler
7d1a378b8f
CUDA: refactor mmq, dmmv, mmvq (#7716)
* CUDA: refactor mmq, dmmv, mmvq

* fix out-of-bounds write

* struct for qk, qr, qi

* fix cmake build

* mmq_type_traits
2024-06-05 16:53:00 +02:00
Christian Zhou-Zheng
3e9430df33 reduce duplicated code from gguf_writer 2024-06-05 09:29:33 -04:00