llama.cpp

Author	SHA1	Message	Date
Christian Zhou-Zheng	03cc9bcbe8	use simplification from #7827	2024-06-08 23:14:26 -04:00
Christian Zhou-Zheng	666bb097a2	Merge branch 'master' into convert-split	2024-06-08 23:06:18 -04:00
Christian Zhou-Zheng	282e71fb39	edit cmd line args	2024-06-08 23:00:42 -04:00
compilade	5795b94182	convert-hf : match model part name prefix and suffix (#7687 ) In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply used the same logic as the part count to get the part names. But this doesn't always work correctly, for example when unusual additional model files such as consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present. Matching both the prefix and the suffix of the model part names should fix this problem without breaking any previously supported upstream models. According to a report by @teleprint-me, some problem still persists, but this should do in the meantime.	2024-06-09 12:47:25 +10:00
compilade	ed9f252118	gguf-py : decouple adding metadata from writing in GGUFWriter (#7827 ) The main change of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value. In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False. GGUFWriter also no longer requires the output file name until it actually writes to it, and it no longer needs to eagerly prepare the data layout of the metadata.	2024-06-09 12:34:29 +10:00
slaren	fe1e3917cf	Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682 )" (#7808 ) This reverts commit `9422c5e34b`.	2024-06-09 01:43:39 +02:00
Christian Zhou-Zheng	079dfe3a8c	Update convert-hf-to-gguf.py Co-authored-by: compilade <git@compilade.net>	2024-06-08 15:42:17 -04:00
Olivier Chafik	d4d915d351	url: save -mu downloads to new cache location (#7826 ) * url: save -mu download to new cache location * url: fs_get_cache_file_path util * url: tweak sig of fs_get_cache_file	2024-06-08 21:21:08 +02:00
Christian Zhou-Zheng	f658e91f4a	comma consistency	2024-06-08 08:10:12 -04:00
sasha0552	7a16ce7db2	server : smart slot selection using Longest Common Prefix (#7728 ) * server : Smart selection of available slot using Longest Common Substring * add usage * remove trailing whitespaces * Use Longest Common Prefix (LCP) instead of LCS * Rename argument	2024-06-08 10:50:31 +03:00
Christian Zhou-Zheng	02be0dd654	attempt 3 to appease the linter	2024-06-07 21:26:40 -04:00
Christian Zhou-Zheng	891b19cb81	attempt 2 to appease the linter	2024-06-07 21:20:46 -04:00
Christian Zhou-Zheng	2e70fa1055	attempt to appease the linter	2024-06-07 21:18:30 -04:00
Christian Zhou-Zheng	c6ae1d6799	reinstate original gguf package import and fix type annotation	2024-06-07 21:09:03 -04:00
Christian Zhou-Zheng	9576965ce7	examples/convert-legacy-llama.py: restore executable file permission	2024-06-07 20:51:22 -04:00
Francis Couture-Harpin	e093dfba9f	convert-hf : restore executable file permission	2024-06-07 17:31:35 -04:00
Christian Zhou-Zheng	dc5cf5fd82	Update gguf-py/gguf/gguf_writer_split.py Co-authored-by: compilade <git@compilade.net>	2024-06-07 17:26:30 -04:00
Christian Zhou-Zheng	0283fc1771	fix line endings	2024-06-07 17:24:27 -04:00
Christian Zhou-Zheng	5f29d4a617	fix convert-hf-to-gguf.py permissions	2024-06-07 17:19:01 -04:00
Christian Zhou-Zheng	1312e287ec	Update gguf-py/gguf/constants.py Co-authored-by: compilade <git@compilade.net>	2024-06-07 17:10:51 -04:00
slaren	da799b4189	vulkan : reuse parent extra for views (#7806 ) * vulkan : reuse parent extra for views * Fix validation error when multiple compute contexts are used in a graph --------- Co-authored-by: 0cc4m <picard12@live.de>	2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng	6d3a256d1d	rename GGUFManager to GGUFWriterSplit	2024-06-07 09:12:44 -04:00
Christian Zhou-Zheng	c00fad71e5	gguf-split : change binary multi-byte units to decimal (#7803 )	2024-06-07 15:56:01 +03:00
intelmatt	27615f5ab2	cmake : fix BUILD_SHARED_LIBS=ON build (#7784 ) common depends on pthreads in Linux	2024-06-07 15:15:07 +03:00
Johannes Gäßler	7027b27d76	server: update cache_prompt documentation [no ci] (#7745 )	2024-06-07 11:15:49 +02:00
woodx	a5cabd7649	server : do not get prompt in infill mode (#7286 ) * avoid to get prompt in infill mode and embedding mode * remove embedding mode * refactor format --------- Co-authored-by: wudexiang <wudexiang@bytedance.com>	2024-06-07 10:09:45 +03:00
pengxin99	d5c938cd77	[SYCL] fix softmax r2r result wrong issue (#7811 )	2024-06-07 14:28:26 +08:00
slaren	c9ee7118d5	check for nans in imatrix and quantize (#7807 ) * imatrix : detect nan/inf values * quantize : check imatrix for nan/inf values	2024-06-07 09:01:29 +03:00
Georgi Gerganov	ee459f40f6	server : fix --threads-http arg (#7801 )	2024-06-06 19:19:59 +03:00
Christian Zhou-Zheng	13ffe22ca7	base-1024 bytes to base-1000	2024-06-06 10:24:11 -04:00
Georgi Gerganov	f83351f9a6	imatrix : migrate to gpt_params (#7771 ) * imatrix : migrate to gpt_params ggml-ci * imatrix : add --save-frequency cli arg * common : fix --no-ppl	2024-06-06 16:30:58 +03:00
Clint Herron	ad675e1c67	Added support for . (any character) token in grammar engine. (#6467 ) * Added support for . (any character) token in grammar engine. * Add integration tests for any-character symbol.	2024-06-06 06:08:52 -07:00
Christian Zhou-Zheng	83e4a3f5cc	make pathlib explicit	2024-06-06 09:00:59 -04:00
Christian Zhou-Zheng	2037eabb64	move kv keys to constants.py	2024-06-06 08:49:46 -04:00
Christian Zhou-Zheng	1cbab22225	type consistency in format_n_bytes_to_str	2024-06-06 08:43:26 -04:00
Christian Zhou-Zheng	3328b0a991	Shard dataclass and un-negative dont_add_architecture	2024-06-06 08:37:35 -04:00
Christian Zhou-Zheng	6a05183b97	GGUFWriter compatibility fix Co-authored-by: compilade <git@compilade.net>	2024-06-06 08:28:10 -04:00
Christian Zhou-Zheng	706bd69023	re-add type hint Co-authored-by: compilade <git@compilade.net>	2024-06-06 08:27:25 -04:00
Mattheus Chediak	a143c04375	README minor fixes (#7798 ) [no ci] derievatives --> derivatives	2024-06-06 22:17:54 +10:00
Olivier Chafik	55b2d0849d	grammars: x{min,max} repetition operator (#6640 ) * grammars: x{min,max} repetition operator + tweak +//? to avoid duplication of original over alternates grammars: handle `x{n}` and fix `x{n,n}` * grammars: document new repetition operators * grammars: uniform use of int for min & max * grammars: refactor parser test * grammar: parsing tests w/ natural pretty print of updated expectations * grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all) * grammars: improve test pretty print again * grammars: pretty print rules and chars * grammars: fix copy rule skipping * grammars: disallow `a{,}` (not allowed in regexps) * Update common/grammar-parser.cpp Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: fix copy rule skipping (again) & display of expectations * grammars: more test cases * grammars: update reps parsing to bring ? / * / + closer to before * json: use new GBNF repetitions{m,n} syntax * grammars: update performance gotchas w/ repetition advice * Update examples/json_schema_to_grammar.py Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: comment on rule repetitions * grammars: ensure unambiguous number alternatives * grammar: nit typo switched error msgs * grammar: nit numbering in comment * json: update numeric rule to be unambiguous * Apply suggestions from code review Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * json: fix integral-part * grammar: add repetition tests --------- Co-authored-by: Clint Herron <hanclinto@gmail.com>	2024-06-06 10:07:06 +01:00
Joan Fontanals	f5d7b268ec	llama : add jina v2 base code (#7596 ) * feat: add changes to handle jina v2 base code * fix: do not complicate things * fix: fix the usage of the code model * fix: fix comments * fix: fix linting issues * fix: remove ollama patches * style : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-06 10:22:41 +03:00
slaren	2d08b7fbb4	docker : build only main and server in their images (#7782 ) * add openmp lib to dockerfiles * build only main and server in their docker images	2024-06-06 08:19:49 +03:00
slaren	d67caea0d6	docker : add openmp lib (#7780 )	2024-06-06 08:17:21 +03:00
Christian Zhou-Zheng	ce7e6985d2	form shards while adding tensors, SHA256 sums agree with master	2024-06-05 18:29:39 -04:00
Christian Zhou-Zheng	5ad397d610	reduce diffs with master	2024-06-05 13:49:20 -04:00
Galunid	7672adeec7	Fix encoding in python scripts (#7733 )	2024-06-06 03:07:24 +10:00
Christian Zhou-Zheng	bb5ee02096	simplify even further and standardize with GGUFWriter	2024-06-05 12:49:08 -04:00
Christian Zhou-Zheng	f6fd3ea4e9	further simplify GGUFManager	2024-06-05 12:28:40 -04:00
Johannes Gäßler	7d1a378b8f	CUDA: refactor mmq, dmmv, mmvq (#7716 ) * CUDA: refactor mmq, dmmv, mmvq * fix out-of-bounds write * struct for qk, qr, qi * fix cmake build * mmq_type_traits	2024-06-05 16:53:00 +02:00
Christian Zhou-Zheng	3e9430df33	reduce duplicated code from gguf_writer	2024-06-05 09:29:33 -04:00
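
The message of commit 5795b94182 describes selecting model part files by matching both a name prefix and a name suffix, so that extra files such as consolidated.safetensors are ignored. The following is only a minimal sketch of that idea, not the code from the commit: the function name `find_model_parts` and the example file list are hypothetical and chosen for illustration.

```python
from __future__ import annotations

from typing import Iterable


def find_model_parts(filenames: Iterable[str], prefix: str, suffix: str) -> list[str]:
    """Return, in sorted order, the names that start with `prefix` and end with `suffix`.

    Hypothetical helper: it only illustrates prefix-and-suffix matching.
    """
    return sorted(
        name
        for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    )


# Example directory listing (made up for illustration).
files = [
    "config.json",
    "consolidated.safetensors",          # extra file; fails the prefix check
    "model-00002-of-00002.safetensors",
    "model-00001-of-00002.safetensors",
]

parts = find_model_parts(files, prefix="model", suffix=".safetensors")
print(parts)
```

With this rule a single-part model stored as model.safetensors and one stored as model-00001-of-00001.safetensors are both picked up, since each name starts with "model" and ends with ".safetensors".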

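Commit 7a16ce7db2 mentions choosing a server slot by Longest Common Prefix (LCP). As a rough illustration of the general technique only, and assuming each slot keeps the token sequence of its cached prompt, a selection could look like the sketch below. The names `common_prefix_len` and `pick_slot` and the data layout are hypothetical; the actual server logic is not shown in this log and may differ (for example in thresholds or tie-breaking).

```python
from __future__ import annotations

from typing import Mapping, Optional, Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading elements of `a` and `b` are equal."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(cached: Mapping[int, Sequence[int]], prompt: Sequence[int]) -> Optional[int]:
    """Return the id of the slot whose cached tokens share the longest prefix with `prompt`.

    Returns None when no slot shares any prefix. Ties go to the slot seen first.
    """
    best_id: Optional[int] = None
    best_len = 0
    for slot_id, tokens in cached.items():
        length = common_prefix_len(tokens, prompt)
        if length > best_len:
            best_id, best_len = slot_id, length
    return best_id


# Hypothetical cached prompts per slot id, expressed as token ids.
slots = {0: [1, 2, 3], 1: [1, 2, 9, 9], 2: [7]}
chosen = pick_slot(slots, [1, 2, 3, 4])
print(chosen)
```

A prefix, rather than an arbitrary common substring, is the relevant measure here on the assumption that only the leading part of a cached prompt can be reused directly.
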

3157 commits