llama.cpp

Author	SHA1	Message	Date
ochafik	d6483a9c07	add min/max constrained int field to pydantic json schema example	2024-06-10 02:00:04 +01:00
ochafik	cad377d3a1	add C++11-compatible replacement for std::string_view	2024-06-09 19:35:36 +01:00
ochafik	d1f679125f	Update test-grammar-integration.cpp	2024-06-09 13:22:41 +01:00
Olivier Chafik	dcc27d1a93	fix min in [1, 9]	2024-06-09 09:42:19 +01:00
Olivier Chafik	a0f19047af	nit: move + rename _build_min_max_int	2024-06-08 21:46:18 +01:00
Olivier Chafik	e93368076b	json: port min/max integer support to Python & JS	2024-06-08 21:33:24 +01:00
Olivier Chafik	3549702da7	json: nit: move string rules together	2024-06-08 21:21:35 +01:00
Olivier Chafik	ac2a8f8930	Update test-grammar-integration.cpp	2024-06-08 20:39:42 +01:00
Olivier Chafik	4c1c29361e	json: fix negative min (w/ more than 1 digit)	2024-06-08 20:33:40 +01:00
Olivier Chafik	931b543607	json: fix negative max	2024-06-08 20:33:12 +01:00
Olivier Chafik	a786c0381b	Merge remote-tracking branch 'origin/master' into json-bounds2	2024-06-08 20:05:17 +01:00
sasha0552	7a16ce7db2	server : smart slot selection using Longest Common Prefix (#7728) * server : Smart selection of available slot using Longest Common Substring * add usage * remove trailing whitespaces * Use Longest Common Prefix (LCP) instead of LCS * Rename argument	2024-06-08 10:50:31 +03:00
slaren	da799b4189	vulkan : reuse parent extra for views (#7806) * vulkan : reuse parent extra for views * Fix validation error when multiple compute contexts are used in a graph --------- Co-authored-by: 0cc4m <picard12@live.de>	2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng	c00fad71e5	gguf-split : change binary multi-byte units to decimal (#7803)	2024-06-07 15:56:01 +03:00
intelmatt	27615f5ab2	cmake : fix BUILD_SHARED_LIBS=ON build (#7784) common depends on pthreads in Linux	2024-06-07 15:15:07 +03:00
Johannes Gäßler	7027b27d76	server: update cache_prompt documentation [no ci] (#7745)	2024-06-07 11:15:49 +02:00
woodx	a5cabd7649	server : do not get prompt in infill mode (#7286) * avoid to get prompt in infill mode and embedding mode * remove embedding mode * refactor format --------- Co-authored-by: wudexiang <wudexiang@bytedance.com>	2024-06-07 10:09:45 +03:00
pengxin99	d5c938cd77	[SYCL] fix softmax r2r result wrong issue (#7811)	2024-06-07 14:28:26 +08:00
slaren	c9ee7118d5	check for nans in imatrix and quantize (#7807) * imatrix : detect nan/inf values * quantize : check imatrix for nan/inf values	2024-06-07 09:01:29 +03:00
Georgi Gerganov	ee459f40f6	server : fix --threads-http arg (#7801)	2024-06-06 19:19:59 +03:00
Georgi Gerganov	f83351f9a6	imatrix : migrate to gpt_params (#7771) * imatrix : migrate to gpt_params ggml-ci * imatrix : add --save-frequency cli arg * common : fix --no-ppl	2024-06-06 16:30:58 +03:00
Clint Herron	ad675e1c67	Added support for . (any character) token in grammar engine. (#6467) * Added support for . (any characer) token in grammar engine. * Add integration tests for any-character symbol.	2024-06-06 06:08:52 -07:00
Mattheus Chediak	a143c04375	README minor fixes (#7798) [no ci] derievatives --> derivatives	2024-06-06 22:17:54 +10:00
ochafik	b6b6a6caee	Update json-schema-to-grammar.cpp	2024-06-06 10:14:28 +01:00
ochafik	431edb8e7b	json: fix bounds tests	2024-06-06 10:14:28 +01:00
ochafik	5a86c6f0e2	json: integration test for schemas	2024-06-06 10:14:28 +01:00
ochafik	f8db47814b	json: proper paren fix	2024-06-06 10:14:28 +01:00
ochafik	a381deb1b6	json: fix missing paren min/max bug	2024-06-06 10:14:28 +01:00
ochafik	af63f4fb27	json: handle negative min / max integer bounds	2024-06-06 10:14:28 +01:00
ochafik	c37c484029	json: min + max integer constraints	2024-06-06 10:14:28 +01:00
ochafik	d69ccb06a4	json: fix min 0	2024-06-06 10:14:28 +01:00
ochafik	057bbdc1f3	json: support minimum for positive integer values	2024-06-06 10:14:28 +01:00
Olivier Chafik	55b2d0849d	grammars: x{min,max} repetition operator (#6640) * grammars: x{min,max} repetition operator + tweak +//? to avoid duplication of original over alternates grammars: handle `x{n}` and fix `x{n,n}` * grammars: document new repetition operators * grammars: uniform use of int for min & max * grammars: refactor parser test * grammar: parsing tests w/ natural pretty print of updated expectations * grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all) * grammars: improve test pretty print again * grammars: pretty print rules and chars * grammars: fix copy rule skipping * grammars: disallow `a{,}` (not allowed in regexps) * Update common/grammar-parser.cpp Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: fix copy rule skipping (again) & display of expectations * grammars: more test cases * grammars: update reps parsing to bring ? / * / + closer to before * json: use new GBNF repetitions{m,n} syntax * grammars: update performance gotchas w/ repetition advice * Update examples/json_schema_to_grammar.py Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: comment on rule repetitions * grammars: ensure unambiguous number alternatives * grammar: nit typo switched error msgs * grammar: nit numbering in comment * json: update numeric rule to be unambiguous * Apply suggestions from code review Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * json: fix integral-part * grammar: add repetition tests --------- Co-authored-by: Clint Herron <hanclinto@gmail.com>	2024-06-06 10:07:06 +01:00
Joan Fontanals	f5d7b268ec	llama : add jina v2 base code (#7596) * feat: add changes to handle jina v2 base code * fix: do not complicate things * fix: fix the usage of the code model * fix: fix comments * fix: fix linting issues * fix: remove ollama patches * style : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-06 10:22:41 +03:00
slaren	2d08b7fbb4	docker : build only main and server in their images (#7782) * add openmp lib to dockerfiles * build only main and server in their docker images	2024-06-06 08:19:49 +03:00
slaren	d67caea0d6	docker : add openmp lib (#7780)	2024-06-06 08:17:21 +03:00
Galunid	7672adeec7	Fix encoding in python scripts (#7733)	2024-06-06 03:07:24 +10:00
Johannes Gäßler	7d1a378b8f	CUDA: refactor mmq, dmmv, mmvq (#7716) * CUDA: refactor mmq, dmmv, mmvq * fix out-of-bounds write * struct for qk, qr, qi * fix cmake build * mmq_type_traits	2024-06-05 16:53:00 +02:00
Georgi Gerganov	2b3389677a	ggml : refactor rope norm/neox (#7634) * ggml : unify rope norm/neox (CPU) * ggml : fix compile warning * ggml : remove GLM rope mode ggml-ci * metal : better rope implementation ggml-ci * cuda : better rope implementation ggml-ci * naming : n_orig_ctx -> n_ctx_orig ggml-ci * dev : add reminders to update backends ggml-ci * vulkan : fix ggml_rope_ext() usage * cuda : fix array size + indents ggml-ci	2024-06-05 11:29:20 +03:00
arch-btw	9973e81c5c	readme : remove -ins (#7759) -ins and --instruct were moved in https://github.com/ggerganov/llama.cpp/pull/7675 I have adjusted the README accordingly. There was no trace of --chatml in the README.	2024-06-05 09:40:49 +03:00
jaime-m-p	c90dbe026b	Fix per token atrributes bits (#7749)	2024-06-05 01:26:14 +02:00
agray3	b90dc566c1	Allow number of nodes in CUDA graph to change (#7738) Previously the code would have failed to cope in the case that the number of nodes changes in an existing CUDA graph. This fixes the issue by removing an unnecessary conditional.	2024-06-04 22:06:49 +02:00
Georgi Gerganov	1442677f92	common : refactor cli arg parsing (#7675) * common : gpt_params_parse do not print usage * common : rework usage print (wip) * common : valign * common : rework print_usage * infill : remove cfg support * common : reorder args * server : deduplicate parameters ggml-ci * common : add missing header ggml-ci * common : remote --random-prompt usages ggml-ci * examples : migrate to gpt_params ggml-ci * batched-bench : migrate to gpt_params * retrieval : migrate to gpt_params * common : change defaults for escape and n_ctx * common : remove chatml and instruct params ggml-ci * common : passkey use gpt_params	2024-06-04 21:23:39 +03:00
Georgi Gerganov	554c247caf	ggml : remove OpenCL (#7735) ggml-ci	2024-06-04 21:23:20 +03:00
Georgi Gerganov	0cd6bd3483	llama : remove beam search (#7736)	2024-06-04 21:23:05 +03:00
Georgi Gerganov	5ca0944a15	readme : remove obsolete Zig instructions (#7471)	2024-06-04 19:43:01 +03:00
slaren	adc9ff3841	llama-bench : allow using a different printer for stderr with -oe (#7722) compare-commits.sh : hide stdout, use -oe to print markdown	2024-06-04 14:32:42 +02:00
Daniele	987d743d6b	Improve hipBLAS support in CMake (#7696) * Improve hipBLAS support in CMake This improves the detection of the correct CMAKE_PREFIX_PATH when using different distributions or a self-built ROCm SDK. * Set ROCM_PATH correctly	2024-06-04 14:09:15 +02:00
zhouwg	b226c1227b	refine .gitignore (#7688) This adds tags and android ndk into the git ignore list	2024-06-04 21:21:26 +10:00
jaime-m-p	3b38d48609	Per token attributes (#7685) * Add per token attributes enum * Using phi-3 for testing 'rstrip' * Using jina-v2 for testing 'lstrip' * Brute force test for 'lstrip' and 'rstrip' * Implement 'rstrip' and 'lstrip' * Update phi-3 GGUF file (obsolete since `917dc8c`) * Replace llama_token_type with llama_token_attribs	2024-06-04 09:17:17 +02:00


3129 commits