Commit graph

3225 commits

Author SHA1 Message Date
Christian Zhou-Zheng
70a6bc91cc
Update gguf-py/gguf/gguf_writer.py
Co-authored-by: compilade <git@compilade.net>
2024-06-09 17:08:11 -04:00
Christian Zhou-Zheng
0417104397 fix linting 2024-06-09 16:05:08 -04:00
Christian Zhou-Zheng
9d7f694438 fix typing and clean up 2024-06-09 16:02:23 -04:00
Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries (#7833) 2024-06-09 20:19:35 +03:00
Christian Zhou-Zheng
f7ecd99691 appease linter 2024-06-09 13:09:05 -04:00
Christian Zhou-Zheng
5a96b8f27f remove SplitStrategy, SplitArguments 2024-06-09 13:08:06 -04:00
Christian Zhou-Zheng
0471f67f4f cleanup round 1 2024-06-09 12:40:02 -04:00
Christian Zhou-Zheng
49b9fbe942 actually make the linter happy 2024-06-09 11:37:56 -04:00
Nicolás Pérez
57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700)
This commit adds pull_request_template.md and CONTRIBUTING.md. It explains to contributors the need to rate the PR complexity level, when to add [no ci], and how to format PR titles and descriptions.

Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
Christian Zhou-Zheng
a234bf821b fix linting 2024-06-09 11:23:55 -04:00
Christian Zhou-Zheng
0779f2f74f tidy up 2024-06-09 11:20:14 -04:00
Christian Zhou-Zheng
69d6e7a8e9 Merge branch 'master' into convert-split 2024-06-09 11:14:02 -04:00
Christian Zhou-Zheng
ba1be979eb fix ti data messiness 2024-06-09 11:10:33 -04:00
Christian Zhou-Zheng
ff2dd7d30d try to refactor kv data (still fails) 2024-06-09 10:29:47 -04:00
mgroeber9110
3e2ee44315
server: do not remove whitespace at the start of a completion chunk (#7830) 2024-06-09 20:50:35 +10:00
Johannes Gäßler
42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q (#7824) 2024-06-09 09:42:25 +02:00
sasha0552
2decf57bc6
convert-hf : set the model name based on cli arg, if present (#7693)
`--model-name` argument was added a while ago but did not do anything.
This commit fixes this issue and enables this feature.
2024-06-09 16:39:25 +10:00
Christian Zhou-Zheng
97dd416903 kv/ti data are still wrong 2024-06-09 00:34:36 -04:00
Christian Zhou-Zheng
03cc9bcbe8 use simplification from #7827 2024-06-08 23:14:26 -04:00
Christian Zhou-Zheng
666bb097a2 Merge branch 'master' into convert-split 2024-06-08 23:06:18 -04:00
Christian Zhou-Zheng
282e71fb39 edit cmd line args 2024-06-08 23:00:42 -04:00
compilade
5795b94182
convert-hf : match model part name prefix and suffix (#7687)
In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply used the same logic as the part count to get the part names.

But this doesn't always work correctly, for example when unusual additional model files such as consolidated.safetensors (in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) are present.

Matching both the prefix and the suffix of the model part names should fix this problem without breaking any previously supported upstream models. According to a report by @teleprint-me, some problem still persists, but this shall do in the meantime.
2024-06-09 12:47:25 +10:00
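
The prefix-and-suffix matching described in the commit above can be illustrated with a minimal sketch. The function name `get_part_names` and the example file list are hypothetical and are not taken from the conversion script. The sketch only shows why requiring both a `model` prefix and a `.safetensors` suffix skips a file like consolidated.safetensors.

```python
def get_part_names(filenames, prefix, suffix):
    """Return the sorted names that start with `prefix` and end with `suffix`.

    Hypothetical helper for illustration only.
    """
    return sorted(
        name for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    )


# Hypothetical directory listing with an extra, unrelated safetensors file.
files = [
    "config.json",
    "consolidated.safetensors",
    "model-00002-of-00002.safetensors",
    "model-00001-of-00002.safetensors",
    "tokenizer.json",
]

# Matching on the suffix alone would also pick up consolidated.safetensors.
parts = get_part_names(files, "model", ".safetensors")
print(parts)
```

Matching on both ends keeps single-part names such as model.safetensors and model-00001-of-00001.safetensors valid, since both start with `model` and end with `.safetensors`.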
compilade
ed9f252118
gguf-py : decouple adding metadata from writing in GGUFWriter (#7827)
The main change of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value.

In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False.

GGUFWriter also no longer requires the output file name until it actually writes to it.

And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata.
2024-06-09 12:34:29 +10:00
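
The idea of decoupling "adding metadata" from "writing" can be sketched with a toy class. This is not the gguf-py API: the class `DeferredWriter`, its `write_to_file` method, and the JSON output format are all assumptions made for illustration (real GGUF files are binary). Only the method name `add_key_value` comes from the commit message above.

```python
import json
import os
import tempfile


class DeferredWriter:
    """Toy stand-in (not GGUFWriter): collect metadata first, write later."""

    def __init__(self):
        # No output path is needed at construction time.
        self.kv_data = {}

    def add_key_value(self, key, value):
        # Only records the pair in memory; nothing is serialized yet.
        self.kv_data[key] = value

    def write_to_file(self, path):
        # The output path is supplied only when writing actually happens.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.kv_data, f, sort_keys=True)


writer = DeferredWriter()
writer.add_key_value("general.name", "example")
writer.add_key_value("general.file_type", 1)

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "meta.json")
    writer.write_to_file(out_path)
    with open(out_path, encoding="utf-8") as f:
        written = json.load(f)

print(written)
```

Because nothing is laid out until `write_to_file` runs, a caller in this sketch can decide late where the data goes, for instance after computing how it should be split.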
slaren
fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
Christian Zhou-Zheng
079dfe3a8c
Update convert-hf-to-gguf.py
Co-authored-by: compilade <git@compilade.net>
2024-06-08 15:42:17 -04:00
Olivier Chafik
d4d915d351
url: save -mu downloads to new cache location (#7826)
* url: save -mu download to new cache location

* url: fs_get_cache_file_path util

* url: tweak sig of fs_get_cache_file
2024-06-08 21:21:08 +02:00
Christian Zhou-Zheng
f658e91f4a comma consistency 2024-06-08 08:10:12 -04:00
sasha0552
7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring

* add usage

* remove trailing whitespaces

* Use Longest Common Prefix (LCP) instead of LCS

* Rename argument
2024-06-08 10:50:31 +03:00
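
A minimal sketch of slot selection by longest common prefix, as named in the commit above. The function names, the dictionary of cached prompts, and the `min_similarity` threshold are assumptions for illustration. The server works on its own data structures, and this sketch compares plain strings instead.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(slot_prompts, new_prompt, min_similarity=0.5):
    """Return the id of the slot whose cached prompt shares the longest
    prefix with `new_prompt`, or None if no slot is similar enough.

    `min_similarity` is a hypothetical threshold: the shared prefix must
    cover more than this fraction of the slot's cached prompt.
    """
    best_id, best_len = None, 0
    for slot_id, cached in slot_prompts.items():
        if not cached:
            continue  # an empty slot has nothing to reuse
        lcp = common_prefix_len(cached, new_prompt)
        similarity = lcp / len(cached)
        if lcp > best_len and similarity > min_similarity:
            best_id, best_len = slot_id, lcp
    return best_id


slots = {
    0: "You are a helpful assistant. Hi",
    1: "Translate to French: cat",
}
chosen = pick_slot(slots, "You are a helpful assistant. What is GGUF?")
print(chosen)
```

A prefix match, rather than a common substring anywhere in the prompt, fits prompt caching because a cached prefix can only be reused when the new prompt starts with the same content.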
Christian Zhou-Zheng
02be0dd654 attempt 3 to appease the linter 2024-06-07 21:26:40 -04:00
Christian Zhou-Zheng
891b19cb81 attempt 2 to appease the linter 2024-06-07 21:20:46 -04:00
Christian Zhou-Zheng
2e70fa1055 attempt to appease the linter 2024-06-07 21:18:30 -04:00
Christian Zhou-Zheng
c6ae1d6799 reinstate original gguf package import and fix type annotation 2024-06-07 21:09:03 -04:00
Christian Zhou-Zheng
9576965ce7 examples/convert-legacy-llama.py: restore executable file permission 2024-06-07 20:51:22 -04:00
Francis Couture-Harpin
e093dfba9f convert-hf : restore executable file permission 2024-06-07 17:31:35 -04:00
Christian Zhou-Zheng
dc5cf5fd82
Update gguf-py/gguf/gguf_writer_split.py
Co-authored-by: compilade <git@compilade.net>
2024-06-07 17:26:30 -04:00
Christian Zhou-Zheng
0283fc1771 fix line endings 2024-06-07 17:24:27 -04:00
Christian Zhou-Zheng
5f29d4a617 fix convert-hf-to-gguf.py permissions 2024-06-07 17:19:01 -04:00
Christian Zhou-Zheng
1312e287ec
Update gguf-py/gguf/constants.py
Co-authored-by: compilade <git@compilade.net>
2024-06-07 17:10:51 -04:00
slaren
da799b4189
vulkan : reuse parent extra for views (#7806)
* vulkan : reuse parent extra for views

* Fix validation error when multiple compute contexts are used in a graph

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng
6d3a256d1d rename GGUFManager to GGUFWriterSplit 2024-06-07 09:12:44 -04:00
Christian Zhou-Zheng
c00fad71e5
gguf-split : change binary multi-byte units to decimal (#7803) 2024-06-07 15:56:01 +03:00
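
The difference between binary and decimal multi-byte units can be shown with a short sketch. The `parse_size` helper and the accepted `K`/`M`/`G` suffixes are assumptions for illustration, not the tool's actual argument parser.

```python
# Decimal (base-1000) multipliers, as opposed to binary (base-1024) ones.
DECIMAL_UNITS = {"K": 1000, "M": 1000**2, "G": 1000**3}


def parse_size(text):
    """Parse a size such as '500M' or '2G' into bytes using decimal units.

    Hypothetical helper; a bare number is taken as a byte count.
    """
    unit = text[-1].upper()
    if unit in DECIMAL_UNITS:
        return int(text[:-1]) * DECIMAL_UNITS[unit]
    return int(text)


decimal_2g = parse_size("2G")   # 2 * 1000**3
binary_2g = 2 * 1024**3         # what a base-1024 reading would give
print(decimal_2g, binary_2g)
```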
intelmatt
27615f5ab2
cmake : fix BUILD_SHARED_LIBS=ON build (#7784)
common depends on pthreads in Linux
2024-06-07 15:15:07 +03:00
Johannes Gäßler
7027b27d76
server: update cache_prompt documentation [no ci] (#7745) 2024-06-07 11:15:49 +02:00
woodx
a5cabd7649
server : do not get prompt in infill mode (#7286)
* avoid getting the prompt in infill mode and embedding mode

* remove embedding mode

* refactor format

---------

Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
pengxin99
d5c938cd77
[SYCL] fix softmax r2r result wrong issue (#7811) 2024-06-07 14:28:26 +08:00
slaren
c9ee7118d5
check for nans in imatrix and quantize (#7807)
* imatrix : detect nan/inf values

* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
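
A check for nan/inf values of the kind named in the commit above can be sketched as follows. The helper name and the flat list of floats are assumptions for illustration. The real tools operate on their own tensor data.

```python
import math


def first_non_finite(values):
    """Return the index of the first nan/inf value, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


clean = [0.5, 1.0, 2.25]
broken = [1.0, float("nan"), float("inf")]
print(first_non_finite(clean), first_non_finite(broken))
```

Rejecting the data at the first non-finite value avoids carrying a nan into later arithmetic, where it would spread to every result it touches.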
Georgi Gerganov
ee459f40f6
server : fix --threads-http arg (#7801) 2024-06-06 19:19:59 +03:00
Christian Zhou-Zheng
13ffe22ca7 base-1024 bytes to base-1000 2024-06-06 10:24:11 -04:00
Georgi Gerganov
f83351f9a6
imatrix : migrate to gpt_params (#7771)
* imatrix : migrate to gpt_params

ggml-ci

* imatrix : add --save-frequency cli arg

* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Clint Herron
ad675e1c67
Added support for . (any character) token in grammar engine. (#6467)
* Added support for . (any character) token in grammar engine.

* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00