llama.cpp

Author	SHA1	Message	Date
Christian Zhou-Zheng	83e4a3f5cc	make pathlib explicit	2024-06-06 09:00:59 -04:00
Christian Zhou-Zheng	2037eabb64	move kv keys to constants.py	2024-06-06 08:49:46 -04:00
Christian Zhou-Zheng	1cbab22225	type consistency in format_n_bytes_to_str	2024-06-06 08:43:26 -04:00
Christian Zhou-Zheng	3328b0a991	Shard dataclass and un-negative dont_add_architecture	2024-06-06 08:37:35 -04:00
Christian Zhou-Zheng	6a05183b97	GGUFWriter compatibility fix Co-authored-by: compilade <git@compilade.net>	2024-06-06 08:28:10 -04:00
Christian Zhou-Zheng	706bd69023	re-add type hint Co-authored-by: compilade <git@compilade.net>	2024-06-06 08:27:25 -04:00
Mattheus Chediak	a143c04375	README minor fixes (#7798) [no ci] derievatives --> derivatives	2024-06-06 22:17:54 +10:00
Olivier Chafik	55b2d0849d	grammars: x{min,max} repetition operator (#6640) * grammars: x{min,max} repetition operator + tweak +//? to avoid duplication of original over alternates grammars: handle `x{n}` and fix `x{n,n}` * grammars: document new repetition operators * grammars: uniform use of int for min & max * grammars: refactor parser test * grammar: parsing tests w/ natural pretty print of updated expectations * grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all) * grammars: improve test pretty print again * grammars: pretty print rules and chars * grammars: fix copy rule skipping * grammars: disallow `a{,}` (not allowed in regexps) * Update common/grammar-parser.cpp Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: fix copy rule skipping (again) & display of expectations * grammars: more test cases * grammars: update reps parsing to bring ? / * / + closer to before * json: use new GBNF repetitions{m,n} syntax * grammars: update performance gotchas w/ repetition advice * Update examples/json_schema_to_grammar.py Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * grammars: comment on rule repetitions * grammars: ensure unambiguous number alternatives * grammar: nit typo switched error msgs * grammar: nit numbering in comment * json: update numeric rule to be unambiguous * Apply suggestions from code review Co-authored-by: Clint Herron <hanclinto@gmail.com> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <hanclinto@gmail.com> * json: fix integral-part * grammar: add repetition tests --------- Co-authored-by: Clint Herron <hanclinto@gmail.com>	2024-06-06 10:07:06 +01:00
Joan Fontanals	f5d7b268ec	llama : add jina v2 base code (#7596) * feat: add changes to handle jina v2 base code * fix: do not complicate things * fix: fix the usage of the code model * fix: fix comments * fix: fix linting issues * fix: remove ollama patches * style : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-06 10:22:41 +03:00
slaren	2d08b7fbb4	docker : build only main and server in their images (#7782) * add openmp lib to dockerfiles * build only main and server in their docker images	2024-06-06 08:19:49 +03:00
slaren	d67caea0d6	docker : add openmp lib (#7780)	2024-06-06 08:17:21 +03:00
Christian Zhou-Zheng	ce7e6985d2	form shards while adding tensors, SHA256 sums agree with master	2024-06-05 18:29:39 -04:00
Christian Zhou-Zheng	5ad397d610	reduce diffs with master	2024-06-05 13:49:20 -04:00
Galunid	7672adeec7	Fix encoding in python scripts (#7733)	2024-06-06 03:07:24 +10:00
Christian Zhou-Zheng	bb5ee02096	simplify even further and standardize with GGUFWriter	2024-06-05 12:49:08 -04:00
Christian Zhou-Zheng	f6fd3ea4e9	further simplify GGUFManager	2024-06-05 12:28:40 -04:00
Johannes Gäßler	7d1a378b8f	CUDA: refactor mmq, dmmv, mmvq (#7716) * CUDA: refactor mmq, dmmv, mmvq * fix out-of-bounds write * struct for qk, qr, qi * fix cmake build * mmq_type_traits	2024-06-05 16:53:00 +02:00
Christian Zhou-Zheng	3e9430df33	reduce duplicated code from gguf_writer	2024-06-05 09:29:33 -04:00
Georgi Gerganov	2b3389677a	ggml : refactor rope norm/neox (#7634) * ggml : unify rope norm/neox (CPU) * ggml : fix compile warning * ggml : remove GLM rope mode ggml-ci * metal : better rope implementation ggml-ci * cuda : better rope implementation ggml-ci * naming : n_orig_ctx -> n_ctx_orig ggml-ci * dev : add reminders to update backends ggml-ci * vulkan : fix ggml_rope_ext() usage * cuda : fix array size + indents ggml-ci	2024-06-05 11:29:20 +03:00
arch-btw	9973e81c5c	readme : remove -ins (#7759) -ins and --instruct were moved in https://github.com/ggerganov/llama.cpp/pull/7675 I have adjusted the README accordingly. There was no trace of --chatml in the README.	2024-06-05 09:40:49 +03:00
jaime-m-p	c90dbe026b	Fix per token atrributes bits (#7749)	2024-06-05 01:26:14 +02:00
agray3	b90dc566c1	Allow number of nodes in CUDA graph to change (#7738) Previously the code would have failed to cope in the case that the number of nodes changes in an existing CUDA graph. This fixes the issue by removing an unnecessary conditional.	2024-06-04 22:06:49 +02:00
Georgi Gerganov	1442677f92	common : refactor cli arg parsing (#7675) * common : gpt_params_parse do not print usage * common : rework usage print (wip) * common : valign * common : rework print_usage * infill : remove cfg support * common : reorder args * server : deduplicate parameters ggml-ci * common : add missing header ggml-ci * common : remote --random-prompt usages ggml-ci * examples : migrate to gpt_params ggml-ci * batched-bench : migrate to gpt_params * retrieval : migrate to gpt_params * common : change defaults for escape and n_ctx * common : remove chatml and instruct params ggml-ci * common : passkey use gpt_params	2024-06-04 21:23:39 +03:00
Georgi Gerganov	554c247caf	ggml : remove OpenCL (#7735) ggml-ci	2024-06-04 21:23:20 +03:00
Georgi Gerganov	0cd6bd3483	llama : remove beam search (#7736)	2024-06-04 21:23:05 +03:00
Georgi Gerganov	5ca0944a15	readme : remove obsolete Zig instructions (#7471)	2024-06-04 19:43:01 +03:00
slaren	adc9ff3841	llama-bench : allow using a different printer for stderr with -oe (#7722) compare-commits.sh : hide stdout, use -oe to print markdown	2024-06-04 14:32:42 +02:00
Daniele	987d743d6b	Improve hipBLAS support in CMake (#7696) * Improve hipBLAS support in CMake This improves the detection of the correct CMAKE_PREFIX_PATH when using different distributions or a self-built ROCm SDK. * Set ROCM_PATH correctly	2024-06-04 14:09:15 +02:00
zhouwg	b226c1227b	refine .gitignore (#7688) This adds tags and android ndk into the git ignore list	2024-06-04 21:21:26 +10:00
jaime-m-p	3b38d48609	Per token attributes (#7685) * Add per token attributes enum * Using phi-3 for testing 'rstrip' * Using jina-v2 for testing 'lstrip' * Brute force test for 'lstrip' and 'rstrip' * Implement 'rstrip' and 'lstrip' * Update phi-3 GGUF file (obsolete since `917dc8c`) * Replace llama_token_type with llama_token_attribs	2024-06-04 09:17:17 +02:00
Georgi Gerganov	6d1616944d	ggml : prevent builds with -ffinite-math-only (#7726) This enforces a check that -fno-finite-math-only was set and that the operating compiling mode is not in finite maths mode. This is because during rewriting of silu and softmax for cpu #7154 there emerged an issue where the result that was observed when >1 slot was nondeterministic as found by @JohannesGaessler. @LostRuins narrowed the problem down to -ffinite-math-only which was theorised to be due to SiLU, instead of flushing small values to 0, returns NaN or some other garbage. @jart proposed a fix that @ggerganov then implemented in this fix ref https://github.com/ggerganov/llama.cpp/pull/7154#issuecomment-2145661825	2024-06-04 17:01:09 +10:00
Christian Zhou-Zheng	c8ecbc67e2	oops, actually fix gguf_writer placement	2024-06-03 19:34:37 -04:00
Christian Zhou-Zheng	efead0408c	fix gguf_writer placement and remove comments	2024-06-03 19:34:01 -04:00
Radoslav Gerganov	bde7cd3cd9	llama : offload to RPC in addition to other backends (#7640) * llama : offload to RPC in addition to other backends * - fix copy_tensor being called on the src buffer instead of the dst buffer - always initialize views in the view_src buffer - add RPC backend to Makefile build - add endpoint to all RPC object names * add rpc-server to Makefile * Update llama.cpp Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-03 20:03:26 +03:00
Masaya, Kato	a5735e4426	ggml : use OpenMP as a thread pool (#7606) * ggml: Added OpenMP for multi-threads processing * ggml : Limit the number of threads used to avoid deadlock * update shared state n_threads in parallel region * clear numa affinity for main thread even with openmp * enable openmp by default * fix msvc build * disable openmp on macos * ci : disable openmp with thread sanitizer * Update ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-03 17:14:15 +02:00
Johannes Gäßler	0b832d53ba	make: fix debug options not being applied to NVCC (#7714)	2024-06-03 16:28:58 +02:00
Christian Zhou-Zheng	a9c7703c12	fix final? merge issue	2024-06-03 09:18:19 -04:00
Christian Zhou-Zheng	140eb52f3f	Merge branch 'master' into convert-split	2024-06-03 09:07:23 -04:00
Christian Zhou-Zheng	240243e63f	remove unnecessary imports in gguf_manager	2024-06-03 09:01:42 -04:00
Christian Zhou-Zheng	09baf2f3b5	fix Q8 quantization	2024-06-03 08:58:29 -04:00
0cc4m	3d7ebf6312	Vulkan Mixture of Experts (MoE) support (#7628) * Finish Vulkan mul_mat_id implementation * Add Vulkan sum_rows and div ops * Fix MUL_MAT_ID matrix matrix shader * Fix MUL_MAT_ID matrix vector shader dispatch size * Fix MUL_MAT_ID matrix vector shader and dispatch code * Update Vulkan CPU offload for MUL_MAT_ID * Fix crash when using split mode none and setting a main GPU	2024-06-03 10:59:14 +02:00
Andy Tai	a10cda58d3	cmake : add pkg-config spec file for llama.cpp (#7702)	2024-06-03 11:06:24 +03:00
zhangkaihuo	6f28a333c1	llama : MiniCPM support tied embeddings (#7664) * support lm_head * remove the code block --------- Co-authored-by: zhangkaihuo <zhangkaihuo@modelbest.cn>	2024-06-03 10:49:30 +03:00
Georgi Gerganov	549279d804	llama : avoid double token-to-piece cache (#7654) ggml-ci	2024-06-03 08:34:43 +03:00
woachk	9e405b6e2e	kompute : implement op_getrows_f32 (#6403) op_getrows_f32 is required since https://github.com/ggerganov/llama.cpp/pull/6122 for the Vulkan w/ Kompute backend to be functional. As such, implement this op to make this backend functional again.	2024-06-03 08:32:16 +03:00
Dave Airlie	3413ae2193	fix bug introduced in using calloc (#7701) compilade pointed this out on the previous MR	2024-06-02 17:59:54 -04:00
Georgi Gerganov	1669810d7c	flake.lock: Update (#7686) Flake lock file updates: • Updated input 'flake-parts': 'github:hercules-ci/flake-parts/8dc45382d5206bd292f9c2768b8058a8fd8311d9?narHash=sha256-/GJvTdTpuDjNn84j82cU6bXztE0MSkdnTWClUCRub78%3D' (2024-05-16) → 'github:hercules-ci/flake-parts/2a55567fcf15b1b1c7ed712a2c6fadaec7412ea8?narHash=sha256-iKzJcpdXih14qYVcZ9QC9XuZYnPc6T8YImb6dX166kw%3D' (2024-06-01) • Updated input 'flake-parts/nixpkgs-lib': 'https://github.com/NixOS/nixpkgs/archive/50eb7ecf4cd0a5756d7275c8ba36790e5bd53e33.tar.gz?narHash=sha256-QBx10%2Bk6JWz6u7VsohfSw8g8hjdBZEf8CFzXH1/1Z94%3D' (2024-05-02) → 'https://github.com/NixOS/nixpkgs/archive/eb9ceca17df2ea50a250b6b27f7bf6ab0186f198.tar.gz?narHash=sha256-lIbdfCsf8LMFloheeE6N31%2BBMIeixqyQWbSr2vk79EQ%3D' (2024-06-01) • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/bfb7a882678e518398ce9a31a881538679f6f092?narHash=sha256-4zSIhSRRIoEBwjbPm3YiGtbd8HDWzFxJjw5DYSDy1n8%3D' (2024-05-24) → 'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-06-02 14:13:12 -07:00
Austin	7c4e5b7eae	chore : add ignore rule for generated server themes (#7689)	2024-06-02 20:39:08 +03:00
nickp27	9422c5e34b	[SYCL] Update rpc-server.cpp to include SYCL backend (#7682) * Update rpc-server.cpp to include SYCL backend Draft PR to address inclusion of SYCL backend for RPC server * Update rpc-server.cpp	2024-06-02 12:13:54 +03:00
Johannes Gäßler	e141ce624a	Fix FlashAttention debug test, FP32 assert (#7684)	2024-06-01 23:26:10 +02:00

3225 commits