Commit graph

3337 commits

Author SHA1 Message Date
hongruichen
7cbc4fbd8c add mul 2024-07-12 23:26:38 +08:00
hongruichen
e3aa43adbd suppress warning 2024-07-12 23:26:11 +08:00
hongruichen
0eb595cc6e use table to simplify the op mapping 2024-07-12 23:22:29 +08:00
hongruichen
f0894d897a wip 2024-07-12 19:57:34 +08:00
hongruichen
be3aa9631f use template function directly 2024-07-11 11:18:06 +08:00
hongruichen
8932135fdb add sqrt and mul ops 2024-07-11 00:08:08 +08:00
hongruichen
7ea28a6fac add helper function for binary op 2024-07-10 23:39:03 +08:00
hongruichen
b6f29273f0 add function to get graph from cache 2024-07-10 23:08:32 +08:00
hongruichen
80051cfc4d remove unused variables 2024-07-10 19:57:47 +08:00
hongruichen
b49b501e26 fix sprintf type 2024-07-10 19:48:57 +08:00
hongruichen
3feb574bf0 merge register_rpc_mem into alloc_rpc_mem 2024-07-10 19:40:02 +08:00
hongruichen
e97d3a6c48 fix tensor buffer allocation
add log

commit qnn buffer after change

add log

register_rpc_mem 2 times

update input tensors before graph finalize

default to QNN_TENSORMEMTYPE_RAW

set new tensors at execute

move write input tensors to exec

check if mem is registered before actually registering

register rpc mem once allocated
2024-07-10 19:32:39 +08:00
hongruichen
dc7d83e121 add log 2024-07-10 00:33:23 +08:00
hongruichen
9add256efe use helper function instead 2024-07-10 00:31:39 +08:00
hongruichen
a7be0693ba add log 2024-07-10 00:29:43 +08:00
hongruichen
af869fd636 fix compiling error in debug build 2024-07-10 00:23:51 +08:00
Hongrui Chen
5f2e3918f6 refactoring ggml_qnn_tensor 2024-07-09 19:58:46 +08:00
Hongrui Chen
874216b9c8 remove unused members 2024-07-07 22:32:43 +08:00
hongruichen
263ffa962e small opt of the qnn graph config init 2024-07-05 23:07:27 +08:00
hongruichen
4b0f6b0cd6 add helper function to get Qnn_TensorType_t from ggml_tensor 2024-07-05 19:37:58 +08:00
hongruichen
0f2e68713c move tensor related function to utils 2024-07-05 19:02:38 +08:00
hongruichen
58cec14092 reformat 2024-07-05 17:38:54 +08:00
hongruichen
13dc3a02c3 use qnn graph inside add and mul ops 2024-07-05 13:27:16 +08:00
hongruichen
a688ed324b add op param to add_nodes 2024-07-05 13:07:48 +08:00
hongruichen
4b2ee61f62 move graph map to backend object 2024-07-05 11:58:47 +08:00
hongruichen
ca0d999c2a add ggml_qnn_graph 2024-07-05 11:35:18 +08:00
hongruichen
000240cf62 add clang format file and reformating 2024-07-04 23:29:31 +08:00
hongruichen
38f88d5fb1 fix compiling error after merge latest master 2024-07-03 00:13:53 +08:00
hongruichen
8b677d1b2f move qnn backend into sub folder 2024-07-02 19:42:14 +08:00
hongruichen
3808a4c1e0 Merge branch 'master' into dev-refactoring 2024-07-01 22:52:08 +08:00
Roni
0ddeff1023
readme : update tool list (#8209)
* Added gppm to Tool list in README

* Update README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 15:48:16 +03:00
Michael Francis
3840b6f593
nix : enable curl (#8043)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 14:47:04 +03:00
Georgi Gerganov
257f8e41e2
nix : remove OpenCL remnants (#8235)
* nix : remove OpenCL remnants

* minor : remove parentheses
2024-07-01 14:46:18 +03:00
iacore
694c59cb42
Document BERT support. (#8205)
* Update README.md

document BERT support

* Update README.md
2024-07-01 13:40:58 +02:00
zhentaoyu
197fe6c1d7
[SYCL] Update SYCL-Rope op and Refactor (#8157)
* align with rope.cu and move sycl-op to a single file
2024-07-01 19:39:06 +08:00
Georgi Gerganov
d0a7145ba9
flake.lock: Update (#8218) 2024-06-30 16:09:34 -07:00
Xuan Son Nguyen
9ef0780062
Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203)
* preserve new line llama_chat_format_single

* disable chat template if in-prefix/suffix is set

* remove redundant change
2024-06-30 20:27:13 +02:00
Andrei
1c5eba6f8e
llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197)
* Add attention and final logit softcapping.

* fix

* Add custom add_ functions

* Disable flash attention for Gemma2

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add default value for attention and final logit softcap value

* Add custom kq scaling from Gemma2Attention

* Remove custom pre attention scaling and use computed value instead.

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-29 23:44:08 -04:00
Xuan Son Nguyen
72272b83a3
fix code typo in llama-cli (#8198) 2024-06-29 00:14:20 +02:00
Olivier Chafik
8748d8ac6f
json: attempt to skip slow tests when running under emulator (#8189) 2024-06-28 18:02:05 +01:00
Xuan Son Nguyen
26a39bbd6b
Add MiniCPM, Deepseek V2 chat template + clean up llama_chat_apply_template_internal (#8172)
* tmp_contains

* minicpm chat template

* add DeepSeek Lite template

* change deepseek-lite to deepseek2

* correct code comment

* correct code from master branch
2024-06-28 15:11:44 +02:00
Sigbjørn Skjæret
38373cfbab
Add SPM infill support (#8016)
* add --spm-infill option

* support --spm-infill

* support --spm-infill
2024-06-28 12:53:43 +02:00
slaren
b851b3fba0
cmake : allow user to override default options (#8178) 2024-06-28 12:37:45 +02:00
Olivier Chafik
139cc621e9
json: restore default additionalProperties to false, fix some pattern escapes (#8180)
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset

* json: revert default of additionalProperties to false

* Update README.md
2024-06-28 09:26:45 +01:00
pculliton
e57dc62057
llama: Add support for Gemma2ForCausalLM (#8156)
* Inference support for Gemma 2 model family

* Update convert-hf-to-gguf.py, constants, and tensor mappings

* cleanup

* format fix

* Fix special token vocab bug

* Don't add space prefix

* fix deleted lines

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add model type names

* Add control vector

* Fix model type identification

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-27 21:00:43 -07:00
Xuan Son Nguyen
a27aa50ab7
Add missing items in makefile (#8177) 2024-06-28 02:19:11 +02:00
Olivier Chafik
cb0b06a8a6
json: update grammars/README w/ examples & note about additionalProperties (#8132)
* json: update grammars/README

* mention broken prefixItems

* add mention to llama-gbnf-validator

* json: explicit type: object for nested items object in cli example
2024-06-27 22:08:42 +01:00
loonerin
558f44bf83
CI: fix release build (Ubuntu+Mac) (#8170)
* CI: fix release build (Ubuntu)

PR #8006 changes defaults to build shared libs. However, CI for releases
expects static builds.

* CI: fix release build (Mac)

---------

Co-authored-by: loonerin <loonerin@users.noreply.github.com>
2024-06-27 21:01:23 +02:00
slaren
8172ee9da9
cmake : fix deprecated option names not working (#8171)
* cmake : fix deprecated option names not working

* remove LLAMA_OPENMP
2024-06-27 20:04:39 +02:00
Xuan Son Nguyen
16791b8f0b
Add chatml fallback for cpp llama_chat_apply_template (#8160)
* add chatml fallback for cpp `llama_chat_apply_template`

* remove redundant code
2024-06-27 18:14:19 +02:00