Christian Zhou-Zheng
70a6bc91cc
Update gguf-py/gguf/gguf_writer.py
...
Co-authored-by: compilade <git@compilade.net>
2024-06-09 17:08:11 -04:00
Christian Zhou-Zheng
0417104397
fix linting
2024-06-09 16:05:08 -04:00
Christian Zhou-Zheng
9d7f694438
fix typing and clean up
2024-06-09 16:02:23 -04:00
Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries ( #7833 )
2024-06-09 20:19:35 +03:00
Christian Zhou-Zheng
f7ecd99691
appease linter
2024-06-09 13:09:05 -04:00
Christian Zhou-Zheng
5a96b8f27f
remove SplitStrategy, SplitArguments
2024-06-09 13:08:06 -04:00
Christian Zhou-Zheng
0471f67f4f
cleanup round 1
2024-06-09 12:40:02 -04:00
Christian Zhou-Zheng
49b9fbe942
actually make the linter happy
2024-06-09 11:37:56 -04:00
Nicolás Pérez
57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] ( #7700 )
...
This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci], and how to format PR titles and descriptions.
Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
Christian Zhou-Zheng
a234bf821b
fix linting
2024-06-09 11:23:55 -04:00
Christian Zhou-Zheng
0779f2f74f
tidy up
2024-06-09 11:20:14 -04:00
Christian Zhou-Zheng
69d6e7a8e9
Merge branch 'master' into convert-split
2024-06-09 11:14:02 -04:00
Christian Zhou-Zheng
ba1be979eb
fix ti data messiness
2024-06-09 11:10:33 -04:00
Christian Zhou-Zheng
ff2dd7d30d
try to refactor kv data (still fails)
2024-06-09 10:29:47 -04:00
mgroeber9110
3e2ee44315
server: do not remove whitespace at the start of a completion chunk ( #7830 )
2024-06-09 20:50:35 +10:00
Johannes Gäßler
42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q ( #7824 )
2024-06-09 09:42:25 +02:00
sasha0552
2decf57bc6
convert-hf : set the model name based on cli arg, if present ( #7693 )
...
`--model-name` argument was added a while ago but did not do anything.
This commit fixes this issue and enables this feature.
2024-06-09 16:39:25 +10:00
Christian Zhou-Zheng
97dd416903
kv/ti data are still wrong
2024-06-09 00:34:36 -04:00
Christian Zhou-Zheng
03cc9bcbe8
use simplification from #7827
2024-06-08 23:14:26 -04:00
Christian Zhou-Zheng
666bb097a2
Merge branch 'master' into convert-split
2024-06-08 23:06:18 -04:00
Christian Zhou-Zheng
282e71fb39
edit cmd line args
2024-06-08 23:00:42 -04:00
compilade
5795b94182
convert-hf : match model part name prefix and suffix ( #7687 )
...
In #7075, to fix the conversion of (some) models that use model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply used the same logic as the part count to get the part names.
But this doesn't always work correctly, like when unusual additional model files such as consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.
Matching both the prefix and the suffix of the model part names should fix this problem without breaking any previously supported upstream models. According to a report by @teleprint-me, some problem still persists, but this should do in the meantime.
2024-06-09 12:47:25 +10:00
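The prefix-and-suffix matching described in #7687 above can be illustrated with a minimal sketch. The function name `match_model_parts` and the sample file list are hypothetical and are not taken from convert-hf-to-gguf.py; the sketch only shows why requiring both a prefix and a suffix excludes a file like consolidated.safetensors.

```python
def match_model_parts(filenames, prefix, suffix):
    """Return the sorted names that start with `prefix` and end with `suffix`.

    Hypothetical helper for illustration; not the converter's actual code.
    """
    return sorted(
        name for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    )


# Assumed directory listing: two sharded parts plus unrelated files.
files = [
    "consolidated.safetensors",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "tokenizer.json",
]

# Matching on the suffix alone would also pick up consolidated.safetensors.
parts = match_model_parts(files, "model", ".safetensors")
print(parts)
```

The same call also accepts a single-part model.safetensors, since that name satisfies both the prefix and the suffix.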
compilade
ed9f252118
gguf-py : decouple adding metadata from writing in GGUFWriter ( #7827 )
...
The main change of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value.
In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False.
GGUFWriter also no longer requires the output file name until it actually writes to the file.
And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata.
2024-06-09 12:34:29 +10:00
slaren
fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend ( #7682 )" ( #7808 )
...
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
Christian Zhou-Zheng
079dfe3a8c
Update convert-hf-to-gguf.py
...
Co-authored-by: compilade <git@compilade.net>
2024-06-08 15:42:17 -04:00
Olivier Chafik
d4d915d351
url: save -mu downloads to new cache location ( #7826 )
...
* url: save -mu download to new cache location
* url: fs_get_cache_file_path util
* url: tweak sig of fs_get_cache_file
2024-06-08 21:21:08 +02:00
Christian Zhou-Zheng
f658e91f4a
comma consistency
2024-06-08 08:10:12 -04:00
sasha0552
7a16ce7db2
server : smart slot selection using Longest Common Prefix ( #7728 )
...
* server : Smart selection of available slot using Longest Common Substring
* add usage
* remove trailing whitespaces
* Use Longest Common Prefix (LCP) instead of LCS
* Rename argument
2024-06-08 10:50:31 +03:00
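The slot selection by Longest Common Prefix named in #7728 above can be sketched roughly as follows. This is an illustration in Python, not the server's code: the names `common_prefix_len` and `pick_slot` are hypothetical, prompts are modeled as plain lists of token ids, and the real server may apply additional criteria that the commit message does not describe.

```python
def common_prefix_len(a, b):
    """Count how many leading elements two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(slot_prompts, new_prompt):
    """Return the index of the slot whose cached prompt shares the longest
    prefix with `new_prompt`, or None when there are no slots.

    Ties go to the earliest slot. Hypothetical helper for illustration.
    """
    best_index, best_len = None, -1
    for i, cached in enumerate(slot_prompts):
        n = common_prefix_len(cached, new_prompt)
        if n > best_len:
            best_index, best_len = i, n
    return best_index


# Assumed cached prompts (token ids) for three slots.
slots = [[1, 2, 3], [1, 2, 9, 9], [7]]
chosen = pick_slot(slots, [1, 2, 9, 4])
print(chosen)
```

A prefix is the natural measure here because only the leading tokens that match a cached prompt can be reused directly; a common substring in the middle of the prompt would not help in the same way.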
Christian Zhou-Zheng
02be0dd654
attempt 3 to appease the linter
2024-06-07 21:26:40 -04:00
Christian Zhou-Zheng
891b19cb81
attempt 2 to appease the linter
2024-06-07 21:20:46 -04:00
Christian Zhou-Zheng
2e70fa1055
attempt to appease the linter
2024-06-07 21:18:30 -04:00
Christian Zhou-Zheng
c6ae1d6799
reinstate original gguf package import and fix type annotation
2024-06-07 21:09:03 -04:00
Christian Zhou-Zheng
9576965ce7
examples/convert-legacy-llama.py: restore executable file permission
2024-06-07 20:51:22 -04:00
Francis Couture-Harpin
e093dfba9f
convert-hf : restore executable file permission
2024-06-07 17:31:35 -04:00
Christian Zhou-Zheng
dc5cf5fd82
Update gguf-py/gguf/gguf_writer_split.py
...
Co-authored-by: compilade <git@compilade.net>
2024-06-07 17:26:30 -04:00
Christian Zhou-Zheng
0283fc1771
fix line endings
2024-06-07 17:24:27 -04:00
Christian Zhou-Zheng
5f29d4a617
fix convert-hf-to-gguf.py permissions
2024-06-07 17:19:01 -04:00
Christian Zhou-Zheng
1312e287ec
Update gguf-py/gguf/constants.py
...
Co-authored-by: compilade <git@compilade.net>
2024-06-07 17:10:51 -04:00
slaren
da799b4189
vulkan : reuse parent extra for views ( #7806 )
...
* vulkan : reuse parent extra for views
* Fix validation error when multiple compute contexts are used in a graph
---------
Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng
6d3a256d1d
rename GGUFManager to GGUFWriterSplit
2024-06-07 09:12:44 -04:00
Christian Zhou-Zheng
c00fad71e5
gguf-split : change binary multi-byte units to decimal ( #7803 )
2024-06-07 15:56:01 +03:00
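The difference between binary and decimal multi-byte units in #7803 above comes down to the multiplier. The sketch below is a hypothetical size parser, not gguf-split's code; the suffix letters K, M, and G and the function name `parse_size` are assumptions made for illustration.

```python
# Decimal (base-1000) multipliers, as opposed to binary (base-1024) ones.
DECIMAL_UNITS = {"K": 1000, "M": 1000**2, "G": 1000**3}


def parse_size(text):
    """Parse a size such as '500M' or '2G' into bytes using decimal units.

    A bare number is taken as a byte count. Hypothetical helper.
    """
    text = text.strip().upper()
    if text and text[-1] in DECIMAL_UNITS:
        return int(text[:-1]) * DECIMAL_UNITS[text[-1]]
    return int(text)


decimal_2g = parse_size("2G")   # 2 * 1000**3
binary_2g = 2 * 1024**3         # what a base-1024 reading would give
print(decimal_2g, binary_2g)
```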
intelmatt
27615f5ab2
cmake : fix BUILD_SHARED_LIBS=ON build ( #7784 )
...
common depends on pthreads in Linux
2024-06-07 15:15:07 +03:00
Johannes Gäßler
7027b27d76
server: update cache_prompt documentation [no ci] ( #7745 )
2024-06-07 11:15:49 +02:00
woodx
a5cabd7649
server : do not get prompt in infill mode ( #7286 )
...
* avoid getting the prompt in infill mode and embedding mode
* remove embedding mode
* refactor format
---------
Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
pengxin99
d5c938cd77
[SYCL] fix softmax r2r result wrong issue ( #7811 )
2024-06-07 14:28:26 +08:00
slaren
c9ee7118d5
check for nans in imatrix and quantize ( #7807 )
...
* imatrix : detect nan/inf values
* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
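The nan/inf detection mentioned in #7807 above amounts to scanning values for anything that is not finite. The following is a minimal Python sketch of that idea only; `first_non_finite` is a hypothetical name, and the actual imatrix and quantize checks are not reproduced here.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or infinite value, or None if all
    values are finite. Hypothetical helper for illustration."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


# Assumed sample data: the third entry is NaN.
bad_index = first_non_finite([0.5, 1.25, float("nan"), 3.0])
print(bad_index)
```

Reporting the index rather than a bare boolean makes it easier to say which entry was bad when rejecting the data.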
Georgi Gerganov
ee459f40f6
server : fix --threads-http arg ( #7801 )
2024-06-06 19:19:59 +03:00
Christian Zhou-Zheng
13ffe22ca7
base-1024 bytes to base-1000
2024-06-06 10:24:11 -04:00
Georgi Gerganov
f83351f9a6
imatrix : migrate to gpt_params ( #7771 )
...
* imatrix : migrate to gpt_params
ggml-ci
* imatrix : add --save-frequency cli arg
* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Clint Herron
ad675e1c67
Added support for . (any character) token in grammar engine. ( #6467 )
...
* Added support for . (any character) token in grammar engine.
* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00