Commit graph

3509 commits

Author SHA1 Message Date
hongruichen
47735cb589 fix: try fix error in 2nd run by appending dimension into graph key 2024-07-26 23:04:53 +08:00
hongruichen
ee305cc171 refactoring: split qnn rpc buffer into dedicated class 2024-07-26 22:52:23 +08:00
hongruichen
f843e5aaf5 fix: 1. free up rpc memory at destruct
2. unbind tensor
2024-07-25 23:45:04 +08:00
hongruichen
706793f078 fix: back to qnn tensor v1 to fix the create tensor error 2024-07-22 23:08:38 +08:00
hongruichen
3b47056c97 refactoring: change the tensor binding mode between qnn tensor and ggml tensor 2024-07-22 23:08:38 +08:00
hongruichen
b173c4e061 feat: update tensor name when bind to graph 2024-07-20 17:31:40 +08:00
hongruichen
5f3b1ae3b0 fix: try to fix graph cache by appending the tensor names 2024-07-20 16:39:06 +08:00
hongruichen
51f95d6980 fix: dimension could be wrong for tensors like 1x1x8 2024-07-20 16:11:35 +08:00
hongruichen
27299463ae fix: try fix tensor type error 2024-07-20 15:13:10 +08:00
hongruichen
28a00e5e6c fix: try fix QNN_GRAPH_ERROR_INVALID_OP_CONFIG 2024-07-20 14:11:58 +08:00
hongruichen
1679dcf47e fix: check all dimensions in can offload 2024-07-20 13:29:01 +08:00
hongruichen
b1b5cc10b1 add function to convert qnn error into string 2024-07-19 22:51:17 +08:00
hongruichen
a607995f95 Reapply "tried fix the add node error 6005"
This reverts commit f45fbec8f4.
2024-07-19 15:35:55 +08:00
hongruichen
0153a23d3f fix support ops
This reverts commit f45fbec8f4.
2024-07-19 15:31:29 +08:00
hongruichen
f45fbec8f4 Revert "tried fix the add node error 6005"
This reverts commit ce3d09e5f2.
2024-07-19 12:59:38 +08:00
hongruichen
ce3d09e5f2 tried fix the add node error 6005 2024-07-19 12:59:21 +08:00
hongruichen
665f823748 fix op checker 2024-07-18 22:26:53 +08:00
hongruichen
15f5cc450c bug: fix allocation size overflow at log 2024-07-18 19:44:05 +08:00
hongruichen
d82b3a0bdb feat: add GGML_UNARY_OP_GELU 2024-07-18 11:15:48 +08:00
hongruichen
ce199b2de7 refactoring: downgrade some log to debug level 2024-07-17 23:49:47 +08:00
hongruichen
c76fc9aa2f fix warnings 2024-07-17 23:32:13 +08:00
hongruichen
6457a68bd7 disable qnn profiling in release build 2024-07-17 23:24:29 +08:00
hongruichen
b7d781ec81 remove qnn dedicated unit tests since we're now using the test-backend-ops to cross-validate backend ops 2024-07-17 23:08:16 +08:00
hongruichen
2502b57203 fix warnings 2024-07-17 22:10:12 +08:00
hongruichen
454deef83c register qnn backend 2024-07-17 21:25:55 +08:00
hongruichen
eed960575f add build step of QNN backend at ggml 2024-07-17 19:43:01 +08:00
hongruichen
861bb9c580 Merge tag 'b3405' into dev-refactoring 2024-07-17 17:13:55 +08:00
hongruichen
bb13795dce refactoring: remove unused functions and variables 2024-07-17 14:17:35 +08:00
hongruichen
63dc587dff refactoring: make the buffer alloc and free stay in same class 2024-07-17 14:08:31 +08:00
hongruichen
b1ef302991 refactoring: remove dependency on dlsym in utils.hpp 2024-07-17 12:21:33 +08:00
Johannes Gäßler
5e116e8dd5
make/cmake: add missing force MMQ/cuBLAS for HIP (#8515) 2024-07-16 21:20:59 +02:00
hongruichen
0301b500cd refactoring: prevent leaking QNN_INTERFACE_VER_TYPE and QNN_SYSTEM_INTERFACE_VER_TYPE outside of qnn.hpp 2024-07-17 00:18:38 +08:00
Brian
1666f92dcd
gguf-hash : update clib.json to point to original xxhash repo (#8491)
* Update clib.json to point to Cyan4973 original xxhash

Convinced Cyan4973 to add clib.json directly to his repo, so the clib package can now point directly to it. It previously pointed to my fork, which carried the clib.json package metadata.

https://github.com/Cyan4973/xxHash/pull/954

* gguf-hash: readme update to point to Cyan4973 xxHash repo [no ci]
2024-07-16 10:14:16 +03:00
Steve Bonds
37b12f92ab
export-lora : handle help argument (#8497)
The --help option on export-lora isn't accepted as valid. The help still gets displayed by default, but the script exits with an error message and nonzero status.
2024-07-16 10:04:45 +03:00
Georgi Gerganov
0efec57787
llama : valign + remove unused ftype (#8502) 2024-07-16 10:00:30 +03:00
compilade
7acfd4e8d5
convert_hf : faster lazy safetensors (#8482)
* convert_hf : faster lazy safetensors

This makes '--dry-run' much, much faster.

* convert_hf : fix memory leak in lazy MoE conversion

The '_lazy' queue was sometimes self-referential,
which caused reference cycles of objects old enough
to avoid garbage collection until potential memory exhaustion.
2024-07-15 23:13:10 -04:00
Xuan Son Nguyen
97bdd26eee
Refactor lora adapter support (#8332)
* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <slarengh@gmail.com>

* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-15 20:50:47 +02:00
Xuan Son Nguyen
4db8f60fe7
fix ci (#8494) 2024-07-15 19:23:10 +02:00
hongruichen
ff601abc1c add todo 2024-07-16 00:05:40 +08:00
Daniel Bevenius
8fac431b06
ggml : suppress unknown pragma 'GCC' on windows (#8460)
This commit adds a macro guard to pragma GCC to avoid the following
warning on windows:

```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\llama.cpp\build\ggml\src\ggml.vcxproj]
```
2024-07-15 15:48:17 +03:00
M-A
f17f39ff9c
server: update README.md with llama-server --help output [no ci] (#8472)
The README.md had stale information. In particular, the --ctx-size
"defaults to 512" confused me and I had to check the code to confirm
this was false. Since the server is evolving rapidly, it's probably
better to keep the source of truth at a single place (in the source) and
generate the README.md based on that.

Did:

    make llama-server
    ./llama-server --help > t.txt
    vimdiff t.txt examples/server/README.md

I copied the content inside a backquote block. I would have preferred
proper text but it would require a fair amount of surgery to make the
current output compatible with markdown. A follow up could be to
automate this process with a script.

No functional change.
2024-07-15 15:04:56 +03:00
Georgi Gerganov
9104bc20ed
common : add --no-cont-batching arg (#6358) 2024-07-15 14:54:58 +03:00
NikolaiLyssogor
fc690b018e
docs: fix links in development docs [no ci] (#8481)
Fixes a few links within the repo that were broken in the reorganization of the
documentation in #8325.
2024-07-15 14:46:39 +03:00
Meng, Hengyu
16bdfa42ac
[SYCL] add concat through dim 1/2 (#8483)
* add concat through dim 1/2
2024-07-15 19:32:15 +08:00
Georgi Gerganov
3dfda05956
llama : de-duplicate deepseek2 norm 2024-07-15 14:10:39 +03:00
0cc4m
bda62d7999
Vulkan MMQ Fix (#8479)
* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error
2024-07-15 09:38:52 +02:00
hongruichen
f32327e2b2 remove multiple declarations of log in unit test 2024-07-15 12:06:12 +08:00
hongruichen
cd5a7331f7 add cpu backend as cross reference 2024-07-15 10:55:17 +08:00
hongruichen
4410fd6563 format with clang-format 2024-07-15 10:30:57 +08:00
hongruichen
c46b4deea9 [unit test] init all tensor by one function 2024-07-15 10:23:19 +08:00