Commit graph

3337 commits

Author SHA1 Message Date
hongruichen
7cbc4fbd8c add mul 2024-07-12 23:26:38 +08:00
hongruichen
e3aa43adbd suppress warning 2024-07-12 23:26:11 +08:00
hongruichen
0eb595cc6e use table to simplify the op mapping 2024-07-12 23:22:29 +08:00
hongruichen
f0894d897a wip 2024-07-12 19:57:34 +08:00
hongruichen
be3aa9631f use template function directly 2024-07-11 11:18:06 +08:00
hongruichen
8932135fdb add sqrt and mul ops 2024-07-11 00:08:08 +08:00
hongruichen
7ea28a6fac add helper function for binary op 2024-07-10 23:39:03 +08:00
hongruichen
b6f29273f0 add function to get graph from cache 2024-07-10 23:08:32 +08:00
hongruichen
80051cfc4d remove unused variables 2024-07-10 19:57:47 +08:00
hongruichen
b49b501e26 fix sprintf type 2024-07-10 19:48:57 +08:00
hongruichen
3feb574bf0 merge register_rpc_mem into alloc_rpc_mem 2024-07-10 19:40:02 +08:00
hongruichen
e97d3a6c48 fix tensor buffer allocation
add log

commit qnn buffer after change

add log

register_rpc_mem 2 times

update input tensors before graph finalize

default to QNN_TENSORMEMTYPE_RAW

set new tensors at execute

move write input tensors to exec

check if mem is registered before actually registering

register rpc mem once allocated
2024-07-10 19:32:39 +08:00
hongruichen
dc7d83e121 add log 2024-07-10 00:33:23 +08:00
hongruichen
9add256efe use helper function instead 2024-07-10 00:31:39 +08:00
hongruichen
a7be0693ba add log 2024-07-10 00:29:43 +08:00
hongruichen
af869fd636 fix compiling error in debug build 2024-07-10 00:23:51 +08:00
Hongrui Chen
5f2e3918f6 refactoring ggml_qnn_tensor 2024-07-09 19:58:46 +08:00
Hongrui Chen
874216b9c8 remove unused members 2024-07-07 22:32:43 +08:00
hongruichen
263ffa962e small opt of the qnn graph config init 2024-07-05 23:07:27 +08:00
hongruichen
4b0f6b0cd6 add helper function to get Qnn_TensorType_t from ggml_tensor 2024-07-05 19:37:58 +08:00
hongruichen
0f2e68713c move tensor related function to utils 2024-07-05 19:02:38 +08:00
hongruichen
58cec14092 reformat 2024-07-05 17:38:54 +08:00
hongruichen
13dc3a02c3 use qnn graph inside add and mul ops 2024-07-05 13:27:16 +08:00
hongruichen
a688ed324b add op param to add_nodes 2024-07-05 13:07:48 +08:00
hongruichen
4b2ee61f62 move graph map to backend object 2024-07-05 11:58:47 +08:00
hongruichen
ca0d999c2a add ggml_qnn_graph 2024-07-05 11:35:18 +08:00
hongruichen
000240cf62 add clang format file and reformating 2024-07-04 23:29:31 +08:00
hongruichen
38f88d5fb1 fix compiling error after merge latest master 2024-07-03 00:13:53 +08:00
hongruichen
8b677d1b2f move qnn backend into sub folder 2024-07-02 19:42:14 +08:00
hongruichen
3808a4c1e0 Merge branch 'master' into dev-refactoring 2024-07-01 22:52:08 +08:00
Roni
0ddeff1023
readme : update tool list (#8209)
* Added gppm to Tool list in README

* Update README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 15:48:16 +03:00
Michael Francis
3840b6f593
nix : enable curl (#8043)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 14:47:04 +03:00
Georgi Gerganov
257f8e41e2
nix : remove OpenCL remnants (#8235)
* nix : remove OpenCL remnants

* minor : remove parentheses
2024-07-01 14:46:18 +03:00
iacore
694c59cb42
Document BERT support. (#8205)
* Update README.md

document BERT support

* Update README.md
2024-07-01 13:40:58 +02:00
zhentaoyu
197fe6c1d7
[SYCL] Update SYCL-Rope op and Refactor (#8157)
* align with rope.cu and move sycl-op to a single file
2024-07-01 19:39:06 +08:00
Georgi Gerganov
d0a7145ba9
flake.lock: Update (#8218) 2024-06-30 16:09:34 -07:00
Xuan Son Nguyen
9ef0780062
Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203)
* preserve new line llama_chat_format_single

* disable chat template if in-prefix/suffix is set

* remove redundant change
2024-06-30 20:27:13 +02:00
Andrei
1c5eba6f8e
llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197)
* Add attention and final logit softcapping.

* fix

* Add custom add_ functions

* Disable flash attention for Gemma2

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add default value for attention and final logit softcap value

* Add custom kq scaling from Gemma2Attention

* Remove custom pre attention scaling and use computed value instead.

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-29 23:44:08 -04:00
Xuan Son Nguyen
72272b83a3
fix code typo in llama-cli (#8198) 2024-06-29 00:14:20 +02:00
Olivier Chafik
8748d8ac6f
json: attempt to skip slow tests when running under emulator (#8189) 2024-06-28 18:02:05 +01:00
Xuan Son Nguyen
26a39bbd6b
Add MiniCPM, Deepseek V2 chat template + clean up llama_chat_apply_template_internal (#8172)
* tmp_contains

* minicpm chat template

* add DeepSeek Lite template

* change deepseek-lite to deepseek2

* correct code comment

* correct code from master branch
2024-06-28 15:11:44 +02:00
Sigbjørn Skjæret
38373cfbab
Add SPM infill support (#8016)
* add --spm-infill option

* support --spm-infill

* support --spm-infill
2024-06-28 12:53:43 +02:00
slaren
b851b3fba0
cmake : allow user to override default options (#8178) 2024-06-28 12:37:45 +02:00
Olivier Chafik
139cc621e9
json: restore default additionalProperties to false, fix some pattern escapes (#8180)
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset

* json: revert default of additionalProperties to false

* Update README.md
2024-06-28 09:26:45 +01:00
pculliton
e57dc62057
llama: Add support for Gemma2ForCausalLM (#8156)
* Inference support for Gemma 2 model family

* Update convert-hf-to-gguf.py, constants, and tensor mappings

* cleanup

* format fix

* Fix special token vocab bug

* Don't add space prefix

* fix deleted lines

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add model type names

* Add control vector

* Fix model type identification

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-27 21:00:43 -07:00
Xuan Son Nguyen
a27aa50ab7
Add missing items in makefile (#8177) 2024-06-28 02:19:11 +02:00
Olivier Chafik
cb0b06a8a6
json: update grammars/README w/ examples & note about additionalProperties (#8132)
* json: update grammars/README

* mention broken prefixItems

* add mention to llama-gbnf-validator

* json: explicit type: object for nested items object in cli example
2024-06-27 22:08:42 +01:00
loonerin
558f44bf83
CI: fix release build (Ubuntu+Mac) (#8170)
* CI: fix release build (Ubuntu)

PR #8006 changes defaults to build shared libs. However, CI for releases
expects static builds.

* CI: fix release build (Mac)

---------

Co-authored-by: loonerin <loonerin@users.noreply.github.com>
2024-06-27 21:01:23 +02:00
slaren
8172ee9da9
cmake : fix deprecated option names not working (#8171)
* cmake : fix deprecated option names not working

* remove LLAMA_OPENMP
2024-06-27 20:04:39 +02:00
Xuan Son Nguyen
16791b8f0b
Add chatml fallback for cpp llama_chat_apply_template (#8160)
* add chatml fallback for cpp `llama_chat_apply_template`

* remove redundant code
2024-06-27 18:14:19 +02:00