llama.cpp

Author	SHA1	Message	Date
Christian Zhou-Zheng	70a6bc91cc	Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net>	2024-06-09 17:08:11 -04:00
Christian Zhou-Zheng	0417104397	fix linting	2024-06-09 16:05:08 -04:00
Christian Zhou-Zheng	9d7f694438	fix typing and clean up	2024-06-09 16:02:23 -04:00
Georgi Gerganov	e95beeb1fc	imatrix : handle partial entries (#7833)	2024-06-09 20:19:35 +03:00
Christian Zhou-Zheng	f7ecd99691	appease linter	2024-06-09 13:09:05 -04:00
Christian Zhou-Zheng	5a96b8f27f	remove SplitStrategy, SplitArguments	2024-06-09 13:08:06 -04:00
Christian Zhou-Zheng	0471f67f4f	cleanup round 1	2024-06-09 12:40:02 -04:00
Christian Zhou-Zheng	49b9fbe942	actually make the linter happy	2024-06-09 11:37:56 -04:00
Nicolás Pérez	57bf62ce7c	docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700) This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci] and how to format PR title and descriptions. Co-authored-by: Brian <mofosyne@gmail.com> Co-authored-by: compilade <git@compilade.net>	2024-06-10 01:24:29 +10:00
Christian Zhou-Zheng	a234bf821b	fix linting	2024-06-09 11:23:55 -04:00
Christian Zhou-Zheng	0779f2f74f	tidy up	2024-06-09 11:20:14 -04:00
Christian Zhou-Zheng	69d6e7a8e9	Merge branch 'master' into convert-split	2024-06-09 11:14:02 -04:00
Christian Zhou-Zheng	ba1be979eb	fix ti data messiness	2024-06-09 11:10:33 -04:00
Christian Zhou-Zheng	ff2dd7d30d	try to refactor kv data (still fails)	2024-06-09 10:29:47 -04:00
mgroeber9110	3e2ee44315	server: do not remove whitespace at the start of a completion chunk (#7830)	2024-06-09 20:50:35 +10:00
Johannes Gäßler	42b53d192f	CUDA: revise q8_1 data layout for mul_mat_q (#7824)	2024-06-09 09:42:25 +02:00
sasha0552	2decf57bc6	convert-hf : set the model name based on cli arg, if present (#7693) `--model-name` argument was added a while ago but did not do anything. This commit fixes this issue and enables this feature.	2024-06-09 16:39:25 +10:00
Christian Zhou-Zheng	97dd416903	kv/ti data are still wrong	2024-06-09 00:34:36 -04:00
Christian Zhou-Zheng	03cc9bcbe8	use simplification from #7827	2024-06-08 23:14:26 -04:00
Christian Zhou-Zheng	666bb097a2	Merge branch 'master' into convert-split	2024-06-08 23:06:18 -04:00
Christian Zhou-Zheng	282e71fb39	edit cmd line args	2024-06-08 23:00:42 -04:00
compilade	5795b94182	convert-hf : match model part name prefix and suffix (#7687) In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part we simply used the same logic as the part count to get the part names. But this doesn't always work correctly, like when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present. This commit matching both the prefix and the suffix of the model part names should fix this problem without breaking any previously-supported upstream models. But according to report by @teleprint-me there is still some persistent problem, but shall do in the meantime.	2024-06-09 12:47:25 +10:00
compilade	ed9f252118	gguf-py : decouple adding metadata from writing in GGUFWriter (#7827) Main changes of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value. In addition use_temp_file is now opt-in instead of opt-out defaulting to False. Also GGUFWriter now does not require output file name until when actually writing to it. And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata	2024-06-09 12:34:29 +10:00
slaren	fe1e3917cf	Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) This reverts commit `9422c5e34b`.	2024-06-09 01:43:39 +02:00
Christian Zhou-Zheng	079dfe3a8c	Update convert-hf-to-gguf.py Co-authored-by: compilade <git@compilade.net>	2024-06-08 15:42:17 -04:00
Olivier Chafik	d4d915d351	url: save -mu downloads to new cache location (#7826) * url: save -mu download to new cache location * url: fs_get_cache_file_path util * url: tweak sig of fs_get_cache_file	2024-06-08 21:21:08 +02:00
Christian Zhou-Zheng	f658e91f4a	comma consistency	2024-06-08 08:10:12 -04:00
sasha0552	7a16ce7db2	server : smart slot selection using Longest Common Prefix (#7728) * server : Smart selection of available slot using Longest Common Substring * add usage * remove trailing whitespaces * Use Longest Common Prefix (LCP) instead of LCS * Rename argument	2024-06-08 10:50:31 +03:00
Christian Zhou-Zheng	02be0dd654	attempt 3 to appease the linter	2024-06-07 21:26:40 -04:00
Christian Zhou-Zheng	891b19cb81	attempt 2 to appease the linter	2024-06-07 21:20:46 -04:00
Christian Zhou-Zheng	2e70fa1055	attempt to appease the linter	2024-06-07 21:18:30 -04:00
Christian Zhou-Zheng	c6ae1d6799	reinstate original gguf package import and fix type annotation	2024-06-07 21:09:03 -04:00
Christian Zhou-Zheng	9576965ce7	examples/convert-legacy-llama.py: restore executable file permission	2024-06-07 20:51:22 -04:00
Francis Couture-Harpin	e093dfba9f	convert-hf : restore executable file permission	2024-06-07 17:31:35 -04:00
Christian Zhou-Zheng	dc5cf5fd82	Update gguf-py/gguf/gguf_writer_split.py Co-authored-by: compilade <git@compilade.net>	2024-06-07 17:26:30 -04:00
Christian Zhou-Zheng	0283fc1771	fix line endings	2024-06-07 17:24:27 -04:00
Christian Zhou-Zheng	5f29d4a617	fix convert-hf-to-gguf.py permissions	2024-06-07 17:19:01 -04:00
Christian Zhou-Zheng	1312e287ec	Update gguf-py/gguf/constants.py Co-authored-by: compilade <git@compilade.net>	2024-06-07 17:10:51 -04:00
slaren	da799b4189	vulkan : reuse parent extra for views (#7806) * vulkan : reuse parent extra for views * Fix validation error when multiple compute contexts are used in a graph --------- Co-authored-by: 0cc4m <picard12@live.de>	2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng	6d3a256d1d	rename GGUFManager to GGUFWriterSplit	2024-06-07 09:12:44 -04:00
Christian Zhou-Zheng	c00fad71e5	gguf-split : change binary multi-byte units to decimal (#7803)	2024-06-07 15:56:01 +03:00
intelmatt	27615f5ab2	cmake : fix BUILD_SHARED_LIBS=ON build (#7784) common depends on pthreads in Linux	2024-06-07 15:15:07 +03:00
Johannes Gäßler	7027b27d76	server: update cache_prompt documentation [no ci] (#7745)	2024-06-07 11:15:49 +02:00
woodx	a5cabd7649	server : do not get prompt in infill mode (#7286) * avoid to get prompt in infill mode and embedding mode * remove embedding mode * refactor format --------- Co-authored-by: wudexiang <wudexiang@bytedance.com>	2024-06-07 10:09:45 +03:00
pengxin99	d5c938cd77	[SYCL] fix softmax r2r result wrong issue (#7811)	2024-06-07 14:28:26 +08:00
slaren	c9ee7118d5	check for nans in imatrix and quantize (#7807) * imatrix : detect nan/inf values * quantize : check imatrix for nan/inf values	2024-06-07 09:01:29 +03:00
Georgi Gerganov	ee459f40f6	server : fix --threads-http arg (#7801)	2024-06-06 19:19:59 +03:00
Christian Zhou-Zheng	13ffe22ca7	base-1024 bytes to base-1000	2024-06-06 10:24:11 -04:00
Georgi Gerganov	f83351f9a6	imatrix : migrate to gpt_params (#7771) * imatrix : migrate to gpt_params ggml-ci * imatrix : add --save-frequency cli arg * common : fix --no-ppl	2024-06-06 16:30:58 +03:00
Clint Herron	ad675e1c67	Added support for . (any character) token in grammar engine. (#6467) * Added support for . (any character) token in grammar engine. * Add integration tests for any-character symbol.	2024-06-06 06:08:52 -07:00

3225 commits