hongruichen
7cbc4fbd8c
add mul
2024-07-12 23:26:38 +08:00
hongruichen
e3aa43adbd
suppress warning
2024-07-12 23:26:11 +08:00
hongruichen
0eb595cc6e
use table to simplify the op mapping
2024-07-12 23:22:29 +08:00
hongruichen
f0894d897a
wip
wip
2024-07-12 19:57:34 +08:00
hongruichen
be3aa9631f
use template function directly
2024-07-11 11:18:06 +08:00
hongruichen
8932135fdb
add sqrt and mul ops
2024-07-11 00:08:08 +08:00
hongruichen
7ea28a6fac
add helper function for binary op
2024-07-10 23:39:03 +08:00
hongruichen
b6f29273f0
add function to get graph from cache
2024-07-10 23:08:32 +08:00
hongruichen
80051cfc4d
remove unused variables
2024-07-10 19:57:47 +08:00
hongruichen
b49b501e26
fix sprintf type
2024-07-10 19:48:57 +08:00
hongruichen
3feb574bf0
merge register_rpc_mem into alloc_rpc_mem
2024-07-10 19:40:02 +08:00
hongruichen
e97d3a6c48
fix tensor buffer allocation
add log
commit qnn buffer after changed
add log
register_rpc_mem 2 times
update input tensors before graph finalize
default to QNN_TENSORMEMTYPE_RAW
set new tensors at execute
move write input tensors to exec
check if mem registered before actual do
register rpc mem once allocated
2024-07-10 19:32:39 +08:00
hongruichen
dc7d83e121
add log
2024-07-10 00:33:23 +08:00
hongruichen
9add256efe
use helper function instead
2024-07-10 00:31:39 +08:00
hongruichen
a7be0693ba
add log
2024-07-10 00:29:43 +08:00
hongruichen
af869fd636
fix compiling error in debug build
2024-07-10 00:23:51 +08:00
Hongrui Chen
5f2e3918f6
refactoring ggml_qnn_tensor
2024-07-09 19:58:46 +08:00
Hongrui Chen
874216b9c8
remove unused members
2024-07-07 22:32:43 +08:00
hongruichen
263ffa962e
small opt of the qnn graph config init
2024-07-05 23:07:27 +08:00
hongruichen
4b0f6b0cd6
add helper function to get Qnn_TensorType_t from ggml_tensor
2024-07-05 19:37:58 +08:00
hongruichen
0f2e68713c
move tensor related function to utils
2024-07-05 19:02:38 +08:00
hongruichen
58cec14092
reformat
2024-07-05 17:38:54 +08:00
hongruichen
13dc3a02c3
use qnn graph inside add and mul ops
2024-07-05 13:27:16 +08:00
hongruichen
a688ed324b
add op param to add_nodes
2024-07-05 13:07:48 +08:00
hongruichen
4b2ee61f62
move graph map to backend object
2024-07-05 11:58:47 +08:00
hongruichen
ca0d999c2a
add ggml_qnn_graph
2024-07-05 11:35:18 +08:00
hongruichen
000240cf62
add clang format file and reformatting
2024-07-04 23:29:31 +08:00
hongruichen
38f88d5fb1
fix compiling error after merge latest master
2024-07-03 00:13:53 +08:00
hongruichen
8b677d1b2f
move qnn backend into sub folder
2024-07-02 19:42:14 +08:00
hongruichen
3808a4c1e0
Merge branch 'master' into dev-refactoring
2024-07-01 22:52:08 +08:00
Roni
0ddeff1023
readme : update tool list ( #8209 )
* Added gppm to Tool list in README
* Update README.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 15:48:16 +03:00
Michael Francis
3840b6f593
nix : enable curl ( #8043 )
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 14:47:04 +03:00
Georgi Gerganov
257f8e41e2
nix : remove OpenCL remnants ( #8235 )
* nix : remove OpenCL remnants
* minor : remove parentheses
2024-07-01 14:46:18 +03:00
iacore
694c59cb42
Document BERT support. ( #8205 )
* Update README.md
document BERT support
* Update README.md
2024-07-01 13:40:58 +02:00
zhentaoyu
197fe6c1d7
[SYCL] Update SYCL-Rope op and Refactor ( #8157 )
* align with rope.cu and move sycl-op to a single file
2024-07-01 19:39:06 +08:00
Georgi Gerganov
d0a7145ba9
flake.lock: Update ( #8218 )
2024-06-30 16:09:34 -07:00
Xuan Son Nguyen
9ef0780062
Fix new line issue with chat template, disable template when in-prefix/suffix is set ( #8203 )
* preserve new line llama_chat_format_single
* disable chat template if in-prefix/suffix is set
* remove redundant change
2024-06-30 20:27:13 +02:00
Andrei
1c5eba6f8e
llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 ( #8197 )
* Add attention and final logit softcapping.
* fix
* Add custom add_ functions
* Disable flash attention for Gemma2
* Update src/llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* Add default value for attention and final logit softcap value
* Add custom kq scaling from Gemma2Attention
* Remove custom pre attention scaling and use computed value instead.
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-29 23:44:08 -04:00
Xuan Son Nguyen
72272b83a3
fix code typo in llama-cli ( #8198 )
2024-06-29 00:14:20 +02:00
Olivier Chafik
8748d8ac6f
json: attempt to skip slow tests when running under emulator ( #8189 )
2024-06-28 18:02:05 +01:00
Xuan Son Nguyen
26a39bbd6b
Add MiniCPM, Deepseek V2 chat template + clean up llama_chat_apply_template_internal ( #8172 )
* tmp_contains
* minicpm chat template
* add DeepSeek Lite template
* change deepseek-lite to deepseek2
* correct code comment
* correct code from master branch
2024-06-28 15:11:44 +02:00
Sigbjørn Skjæret
38373cfbab
Add SPM infill support ( #8016 )
* add --spm-infill option
* support --spm-infill
* support --spm-infill
2024-06-28 12:53:43 +02:00
slaren
b851b3fba0
cmake : allow user to override default options ( #8178 )
2024-06-28 12:37:45 +02:00
Olivier Chafik
139cc621e9
json
: restore default additionalProperties to false, fix some pattern escapes (#8180 )
...
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset
* json: revert default of additionalProperties to false
* Update README.md
2024-06-28 09:26:45 +01:00
pculliton
e57dc62057
llama: Add support for Gemma2ForCausalLM ( #8156 )
* Inference support for Gemma 2 model family
* Update convert-hf-to-gguf.py, constants, and tensor mappings
* cleanup
* format fix
* Fix special token vocab bug
* Don't add space prefix
* fix deleted lines
* Update src/llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* Add model type names
* Add control vector
* Fix model type identification
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-27 21:00:43 -07:00
Xuan Son Nguyen
a27aa50ab7
Add missing items in makefile ( #8177 )
2024-06-28 02:19:11 +02:00
Olivier Chafik
cb0b06a8a6
json : update grammars/README w/ examples & note about additionalProperties ( #8132 )
* json: update grammars/README
* mention broken prefixItems
* add mention to llama-gbnf-validator
* json: explicit type: object for nested items object in cli example
2024-06-27 22:08:42 +01:00
loonerin
558f44bf83
CI: fix release build (Ubuntu+Mac) ( #8170 )
* CI: fix release build (Ubuntu)
PR #8006 changes defaults to build shared libs. However, CI for releases
expects static builds.
* CI: fix release build (Mac)
---------
Co-authored-by: loonerin <loonerin@users.noreply.github.com>
2024-06-27 21:01:23 +02:00
slaren
8172ee9da9
cmake : fix deprecated option names not working ( #8171 )
* cmake : fix deprecated option names not working
* remove LlAMA_OPENMP
2024-06-27 20:04:39 +02:00
Xuan Son Nguyen
16791b8f0b
Add chatml fallback for cpp llama_chat_apply_template ( #8160 )
* add chatml fallback for cpp `llama_chat_apply_template`
* remove redundant code
2024-06-27 18:14:19 +02:00