hongruichen
80051cfc4d
remove unused variables
2024-07-10 19:57:47 +08:00
hongruichen
b49b501e26
fix sprintf type
2024-07-10 19:48:57 +08:00
hongruichen
3feb574bf0
merge register_rpc_mem into alloc_rpc_mem
2024-07-10 19:40:02 +08:00
hongruichen
e97d3a6c48
fix tensor buffer allocation
...
add log
commit qnn buffer after changes
add log
call register_rpc_mem twice
update input tensors before graph finalize
default to QNN_TENSORMEMTYPE_RAW
set new tensors at execute
move writing of input tensors to execute
check if mem is already registered before registering
register rpc mem once allocated
2024-07-10 19:32:39 +08:00
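The "register rpc mem once allocated" / "check if mem is already registered" bullets above describe a register-once guard. A minimal sketch of that idiom, with hypothetical names (`registered_mems`, the placeholder registration call) since the real call is QNN-specific:

```cpp
#include <unordered_set>

// Hypothetical bookkeeping: buffers that have already been registered.
static std::unordered_set<void *> registered_mems;

// Register a freshly allocated RPC buffer exactly once; repeated calls
// for the same pointer become no-ops instead of double-registering.
static void register_rpc_mem_once(void * buf) {
    if (registered_mems.count(buf) > 0) {
        return; // already registered, skip
    }
    // backend_register(buf); // placeholder for the QNN registration call
    registered_mems.insert(buf);
}
```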
hongruichen
dc7d83e121
add log
2024-07-10 00:33:23 +08:00
hongruichen
9add256efe
use helper function instead
2024-07-10 00:31:39 +08:00
hongruichen
a7be0693ba
add log
2024-07-10 00:29:43 +08:00
hongruichen
af869fd636
fix compiling error in debug build
2024-07-10 00:23:51 +08:00
Hongrui Chen
5f2e3918f6
refactoring ggml_qnn_tensor
2024-07-09 19:58:46 +08:00
Hongrui Chen
874216b9c8
remove unused members
2024-07-07 22:32:43 +08:00
hongruichen
263ffa962e
small opt of the qnn graph config init
2024-07-05 23:07:27 +08:00
hongruichen
4b0f6b0cd6
add helper function to get Qnn_TensorType_t from ggml_tensor
2024-07-05 19:37:58 +08:00
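A hedged sketch of what such a helper might look like; the enum values come from the QNN SDK's QnnTypes.h, but the mapping logic here is an assumption, not the repo's exact code:

```cpp
#include <QnnTypes.h>
#include "ggml.h"

// Assumed mapping: graph inputs are written by the app, outputs are read
// back by the app, everything else stays native to the QNN graph.
static Qnn_TensorType_t qnn_tensor_type_from_ggml(const ggml_tensor * t, bool is_output) {
    if (t->op == GGML_OP_NONE && t->view_src == nullptr) {
        return QNN_TENSOR_TYPE_APP_WRITE; // graph input
    }
    if (is_output) {
        return QNN_TENSOR_TYPE_APP_READ;  // graph output
    }
    return QNN_TENSOR_TYPE_NATIVE;        // intermediate tensor
}
```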
hongruichen
0f2e68713c
move tensor related function to utils
2024-07-05 19:02:38 +08:00
hongruichen
58cec14092
reformat
2024-07-05 17:38:54 +08:00
hongruichen
13dc3a02c3
use qnn graph inside add and mul ops
2024-07-05 13:27:16 +08:00
hongruichen
a688ed324b
add op param to add_nodes
2024-07-05 13:07:48 +08:00
hongruichen
4b2ee61f62
move graph map to backend object
2024-07-05 11:58:47 +08:00
hongruichen
ca0d999c2a
add ggml_qnn_graph
2024-07-05 11:35:18 +08:00
hongruichen
000240cf62
add clang format file and reformatting
2024-07-04 23:29:31 +08:00
hongruichen
38f88d5fb1
fix compiling error after merging latest master
2024-07-03 00:13:53 +08:00
hongruichen
8b677d1b2f
move qnn backend into sub folder
2024-07-02 19:42:14 +08:00
hongruichen
3808a4c1e0
Merge branch 'master' into dev-refactoring
2024-07-01 22:52:08 +08:00
Roni
0ddeff1023
readme : update tool list ( #8209 )
...
* Added gppm to Tool list in README
* Update README.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 15:48:16 +03:00
Michael Francis
3840b6f593
nix : enable curl ( #8043 )
...
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 14:47:04 +03:00
Georgi Gerganov
257f8e41e2
nix : remove OpenCL remnants ( #8235 )
...
* nix : remove OpenCL remnants
* minor : remove parentheses
2024-07-01 14:46:18 +03:00
iacore
694c59cb42
Document BERT support. ( #8205 )
...
* Update README.md
document BERT support
* Update README.md
2024-07-01 13:40:58 +02:00
zhentaoyu
197fe6c1d7
[SYCL] Update SYCL-Rope op and Refactor ( #8157 )
...
* align with rope.cu and move sycl-op to a single file
2024-07-01 19:39:06 +08:00
Georgi Gerganov
d0a7145ba9
flake.lock: Update ( #8218 )
2024-06-30 16:09:34 -07:00
Xuan Son Nguyen
9ef0780062
Fix new line issue with chat template, disable template when in-prefix/suffix is set ( #8203 )
...
* preserve new line in llama_chat_format_single
* disable chat template if in-prefix/suffix is set
* remove redundant change
2024-06-30 20:27:13 +02:00
Andrei
1c5eba6f8e
llama: Add attention and final logit soft-capping, update scaling factor for Gemma2 ( #8197 )
...
* Add attention and final logit softcapping.
* fix
* Add custom add_ functions
* Disable flash attention for Gemma2
* Update src/llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* Add default value for attention and final logit softcap value
* Add custom kq scaling from Gemma2Attention
* Remove custom pre-attention scaling and use computed value instead.
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-29 23:44:08 -04:00
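Soft-capping as introduced for Gemma2 squashes attention scores and final logits into (-cap, cap) with tanh rather than hard clipping. A minimal sketch of the formula, not the actual ggml graph code (the cap values come from the model's hyperparameters):

```cpp
#include <cmath>
#include <vector>

// logits <- cap * tanh(logits / cap): smooth saturation at +/- cap.
static void soft_cap(std::vector<float> & logits, float cap) {
    for (float & x : logits) {
        x = cap * std::tanh(x / cap);
    }
}
```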
Xuan Son Nguyen
72272b83a3
fix code typo in llama-cli ( #8198 )
2024-06-29 00:14:20 +02:00
Olivier Chafik
8748d8ac6f
json: attempt to skip slow tests when running under emulator ( #8189 )
2024-06-28 18:02:05 +01:00
Xuan Son Nguyen
26a39bbd6b
Add MiniCPM, Deepseek V2 chat template + clean up llama_chat_apply_template_internal ( #8172 )
...
* tmp_contains
* minicpm chat template
* add DeepSeek Lite template
* change deepseek-lite to deepseek2
* correct code comment
* correct code from master branch
2024-06-28 15:11:44 +02:00
Sigbjørn Skjæret
38373cfbab
Add SPM infill support ( #8016 )
...
* add --spm-infill option
* support --spm-infill
* support --spm-infill
2024-06-28 12:53:43 +02:00
slaren
b851b3fba0
cmake : allow user to override default options ( #8178 )
2024-06-28 12:37:45 +02:00
Olivier Chafik
139cc621e9
json : restore default additionalProperties to false, fix some pattern escapes ( #8180 )
...
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset
* json: revert default of additionalProperties to false
* Update README.md
2024-06-28 09:26:45 +01:00
pculliton
e57dc62057
llama: Add support for Gemma2ForCausalLM ( #8156 )
...
* Inference support for Gemma 2 model family
* Update convert-hf-to-gguf.py, constants, and tensor mappings
* cleanup
* format fix
* Fix special token vocab bug
* Don't add space prefix
* fix deleted lines
* Update src/llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* Add model type names
* Add control vector
* Fix model type identification
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-27 21:00:43 -07:00
Xuan Son Nguyen
a27aa50ab7
Add missing items in makefile ( #8177 )
2024-06-28 02:19:11 +02:00
Olivier Chafik
cb0b06a8a6
json : update grammars/README w/ examples & note about additionalProperties ( #8132 )
...
* json: update grammars/README
* mention broken prefixItems
* add mention to llama-gbnf-validator
* json: explicit type: object for nested items object in cli example
2024-06-27 22:08:42 +01:00
loonerin
558f44bf83
CI: fix release build (Ubuntu+Mac) ( #8170 )
...
* CI: fix release build (Ubuntu)
PR #8006 changes defaults to build shared libs. However, CI for releases
expects static builds.
* CI: fix release build (Mac)
---------
Co-authored-by: loonerin <loonerin@users.noreply.github.com>
2024-06-27 21:01:23 +02:00
slaren
8172ee9da9
cmake : fix deprecated option names not working ( #8171 )
...
* cmake : fix deprecated option names not working
* remove LlAMA_OPENMP
2024-06-27 20:04:39 +02:00
Xuan Son Nguyen
16791b8f0b
Add chatml fallback for cpp llama_chat_apply_template ( #8160 )
...
* add chatml fallback for cpp `llama_chat_apply_template`
* remove redundant code
2024-06-27 18:14:19 +02:00
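The fallback described here retries with the known-good "chatml" template when the model's built-in template is not recognized. A hedged sketch of the idea (llama.h signature as of this period; buffer sizing and error handling simplified for brevity):

```cpp
#include "llama.h"
#include <string>
#include <vector>

// Apply the model's chat template; if it is unsupported (negative return),
// fall back to "chatml".
static std::string apply_template_with_chatml_fallback(
        const llama_model * model,
        const std::vector<llama_chat_message> & chat,
        bool add_ass) {
    std::vector<char> buf(4096);
    int32_t n = llama_chat_apply_template(model, nullptr /* model's template */,
                                          chat.data(), chat.size(), add_ass,
                                          buf.data(), (int32_t) buf.size());
    if (n < 0) {
        n = llama_chat_apply_template(model, "chatml", chat.data(), chat.size(),
                                      add_ass, buf.data(), (int32_t) buf.size());
    }
    return n >= 0 ? std::string(buf.data(), n) : std::string();
}
```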
Georgi Gerganov
ab3679112d
flake.lock: Update ( #8071 )
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/e9ee548d90ff586a6471b4ae80ae9cfcbceb3420?narHash=sha256-4Zu0RYRcAY/VWuu6awwq4opuiD//ahpc2aFHg2CWqFY%3D' (2024-06-13)
→ 'github:NixOS/nixpkgs/d603719ec6e294f034936c0d0dc06f689d91b6c3?narHash=sha256-k3JqJrkdoYwE3fHE6xGDY676AYmyh4U2Zw%2B0Bwe5DLU%3D' (2024-06-20)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Philip Taron <philip.taron@gmail.com>
2024-06-27 08:37:29 -07:00
jukofyork
97877eb10b
Control vector loading fixes ( #8137 )
...
* Fix leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow
* refactored `llama_control_vector_load_one()`
* allow multiple directions for same layer in same file
* llama_control_vector_load_one() and llama_control_vector_load() now break on error
* removed unnecessary ggml_free() call
2024-06-27 16:48:07 +02:00
Raj Hammeer Singh Hada
387952651a
Delete examples/llama.android/llama/CMakeLists.txt ( #8165 )
...
* Delete examples/llama.android/llama/CMakeLists.txt
https://github.com/ggerganov/llama.cpp/pull/8145#issuecomment-2194534244
This file is not being used for building on Android. `llama.cpp/examples/llama.android/llama/src/main/cpp/CMakeLists.txt` is being used instead.
* Update CMakeLists.txt
Pick local llama.cpp files instead of fetching content from git
2024-06-27 16:39:29 +02:00
Sigbjørn Skjæret
6030c61281
Add Qwen2MoE 57B-A14B model identifier ( #8158 )
...
* Add Qwen2MoE 57B-A14B
* Add Qwen2MoE 57B-A14B
2024-06-27 16:27:41 +02:00
Johannes Gäßler
85a267daaa
CUDA: fix MMQ stream-k for --split-mode row ( #8167 )
2024-06-27 16:26:05 +02:00
kustaaya
f675b20a3b
Added support for Viking pre-tokenizer ( #8135 )
...
Co-authored-by: kustaaya <kustaaya@protonmail.com>
2024-06-27 10:58:54 +02:00
Sigbjørn Skjæret
911e35bb8b
llama : fix CodeLlama FIM token checks ( #8144 )
...
* account for space prefix character
* use find instead
2024-06-27 10:46:41 +03:00
Raj Hammeer Singh Hada
ac146628e4
Fix llama-android.cpp for error - "common/common.h not found" ( #8145 )
...
- The path to the common.h header in llama-android.cpp seems to be wrong. Fixing the path so the Android build doesn't fail with the error "There is no file common/common.h"
2024-06-27 03:57:57 +02:00