Commit graph

4244 commits

Author SHA1 Message Date
HimariO  b24ab8634f  add m-rope testcase to test-backend-ops  2024-12-09 22:13:11 +08:00
HimariO  3ba7664de9  minor updates on debug util, bug fixs  2024-12-09 22:12:30 +08:00
HimariO  12f17f754d  rename mrope related function, params  2024-12-08 01:32:19 +08:00
HimariO  ac2089c378  add mrope unit test, fix few compiler warnings  2024-12-08 00:47:48 +08:00
HimariO  6c39aa38f5  add makefile entry, update speical image padding token  2024-12-07 21:59:54 +08:00
HimariO  cbd08b4204  resolve linter, test errors  2024-11-29 22:18:15 +08:00
HimariO  fac034530f  update to keep up stream changes  2024-11-29 17:55:18 +08:00
HimariO  07553cfb0f  update llama_hparams  2024-11-29 17:53:45 +08:00
HimariO  241bb45714  fix rope op mode switching, out dated func args  2024-11-29 17:53:45 +08:00
HimariO  f1fa60f84c  add GGML_ROPE_TYPE_MROPE, GGML_ROPE_TYPE_VISION  2024-11-29 17:53:45 +08:00
HimariO  201f7043c3  add fp16 support for qwen2vl and m-rope  2024-11-29 17:53:44 +08:00
HimariO  3237bb4614  add fp32 mrope, vision rope kernel  2024-11-29 17:53:44 +08:00
HimariO  0882f57612  cuda-gdb cmake preset  2024-11-29 17:53:44 +08:00
HimariO  53480d2bdb  replace variable size array with vector  2024-11-29 17:52:47 +08:00
HimariO  3d19dd44b6  add arg parser to qwen2vl_surgery  2024-11-29 17:52:47 +08:00
HimariO  023f0076e0  correcting vision-rope behavior, add the missing last layer back to ViT  2024-11-29 17:52:47 +08:00
HimariO  bcd49f5984  [WIP] create inference workflow, gguf convert script but fix  2024-11-29 17:52:47 +08:00
HimariO  7e9fc7202e  make batch and clip utils compatible with qwen2vl  2024-11-29 17:52:47 +08:00
HimariO  c13edfed59  [WIP] qwen2vl vision model  2024-11-29 17:52:47 +08:00
HimariO  3c3691e10f  update 5D tensor op workaround  2024-11-29 17:52:47 +08:00
HimariO  f661483ea7  update qwen2vl cli tool  2024-11-29 17:52:47 +08:00
HimariO  9d389a051b  Add vl-rope/2d-rope support for qwen2vl ViT  2024-11-29 17:52:47 +08:00
HimariO  35411963d2  Verify m-rope output  2024-11-29 17:52:46 +08:00
HimariO  b24bd89e77  [WIP] add qwen2vl arch  2024-11-29 17:52:46 +08:00
HimariO  7c6f793492  Add Qwen2VL cli entrypoint  2024-11-29 17:52:46 +08:00
HimariO  c17546fffa  Barebone Qwen2VL LLM convertor  2024-11-29 17:52:46 +08:00
Chenguang Li  938f608742  CANN: RoPE operator optimization (#10563)  2024-11-29 14:46:55 +08:00
    * [cann] RoPE operator optimization
    * [CANN] Code Formatting
    Co-authored-by: noemotiovon <noemotiovon@gmail.com>
Jeff Bolz  f095a649ec  vulkan: get the first command buffer submitted sooner (#10499)  2024-11-29 07:18:02 +01:00
    This is an incremental improvement over #9118 to get work to the GPU a bit sooner. The first part is to start with a smaller number of nodes before the first submit, and ramp it up to the current 100 nodes/submit. The second part is to reduce the dryrun overhead for all the nodes that just need to request descriptor space.
    With these changes I get around 1-2% speedup on RTX 4070 combined with my old Haswell-era CPU.
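The ramp-up described in the commit message above can be sketched as follows. This is a hypothetical illustration, not the actual Vulkan backend code: the function name `next_submit_count`, the starting batch size of 20, and the doubling schedule are all assumptions; only the steady-state 100 nodes/submit comes from the commit message.

```cpp
#include <algorithm>

// Hypothetical sketch: start with a small batch of graph nodes before the
// first submit, then grow the batch toward the steady-state 100 nodes/submit
// so the GPU starts working sooner.
int next_submit_count(int nodes_submitted_so_far) {
    const int initial = 20;   // assumed starting batch size
    const int steady  = 100;  // steady-state nodes/submit from the commit message
    int count = initial;
    int done  = 0;
    // Double the batch after each completed submit until reaching steady state.
    while (done < nodes_submitted_so_far) {
        done += count;
        count = std::min(count * 2, steady);
    }
    return count;
}
```

The idea is simply to trade a little per-submit overhead early on for lower latency to first GPU work.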
Ting Lou  678d7994f4  llava: return false instead of exit (#10546)  2024-11-29 01:09:46 +01:00
Georgi Gerganov  dc22344088  ggml : remove redundant copyright notice + update authors  2024-11-28 20:46:40 +02:00
Georgi Gerganov  4c0a95b107  llama : add missing model types  2024-11-28 20:45:07 +02:00
Xuan Son Nguyen  6c59567689  server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568)  2024-11-28 19:17:49 +01:00
    * server : (tests) don't use thread for capturing stdout/stderr
    * test: bump openai to 1.55.2
    * bump openai to 1.55.3
Johannes Gäßler  890719311b  common: fix warning message when no GPU found (#10564)  2024-11-28 18:15:25 +01:00
Random Fly  7281cf13ad  docs: fix outdated usage of llama-simple (#10565)  2024-11-28 16:03:11 +01:00
Diego Devesa  e90688edd0  ci : fix tag name in cuda and hip releases (#10566)  2024-11-28 15:58:54 +01:00
Georgi Gerganov  76b27d29c2  ggml : fix row condition for i8mm kernels (#10561)  2024-11-28 14:56:37 +02:00
    ggml-ci
Georgi Gerganov  eea986f215  cmake : fix ARM feature detection (#10543)  2024-11-28 14:56:23 +02:00
    ggml-ci
Shupei Fan  c202cef168  ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541)  2024-11-28 13:52:03 +01:00
    * ggml-cpu: support IQ4_NL_4_4 by runtime repack
    * ggml-cpu: add __ARM_FEATURE_DOTPROD guard
Sergio López  2025fa67e9  kompute : improve backend to pass test_backend_ops (#10542)  2024-11-28 12:51:38 +01:00
    * kompute: op_unary: reject unsupported parameters
    * kompute: softmax: implement ALiBi support
    * kompute: rope: implement neox and phi3 support
    * kompute: op_mul_mat_q4_k permutted support
    * kompute: op_mul_mat_[q4_0|q4_1|q8_0] permutted support
    * kompute: op_mul_mat_f16 permutted support
    * kompute: op_mul_mat_q6_k permutted support
    Signed-off-by: Sergio Lopez <slp@redhat.com>
Ruixin Huang  c6bc73951e  CANN: Update cann.md to display correctly in CLion (#10538)  2024-11-28 15:27:11 +08:00
leo-pony  605fa66c50  CANN: Fix SOC_TYPE compile bug (#10519)  2024-11-28 15:25:24 +08:00
    * CANN: Fix the bug build fail on Ascend310P under two cases: 1) Manual specify SOC_TYPE 2) Under some unusual compile environment
    * Update the cann backend News content: Support F16 and F32 data type model for Ascend 310P NPU.
    * fix CANN compile fail bug: the assert in ascend kernel function doesn't supportted on some CANN version
Chenguang Li  b7420131bf  CANN: ROPE operator optimization (#10540)  2024-11-28 14:24:46 +08:00
    * [cann] ROPE operator optimization
    Co-authored-by: noemotiovon <noemotiovon@gmail.com>
Xuan Son Nguyen  9f912511bc  common : fix duplicated file name with hf_repo and hf_file (#10550)  2024-11-27 22:30:52 +01:00
uvos  3ad5451f3b  Add some minimal optimizations for CDNA (#10498)  2024-11-27 17:10:08 +01:00
    * Add some minimal optimizations for CDNA
    * ggml_cuda: set launch bounds also for GCN as it helps there too
Diego Devesa  46c69e0e75  ci : faster CUDA toolkit installation method and use ccache (#10537)  2024-11-27 11:03:25 +01:00
    * ci : faster CUDA toolkit installation method and use ccache
    * remove fetch-depth
    * only pack CUDA runtime on master
Georgi Gerganov  9e2301f4a4  metal : fix group_norm support condition (#0)  2024-11-27 11:22:14 +02:00
Georgi Gerganov  fee824a1a1  sync : ggml  2024-11-27 11:10:42 +02:00
Frankie Robertson  9150f8fef9  Do not include arm_neon.h when compiling CUDA code (ggml/1028)  2024-11-27 11:10:27 +02:00
Jeff Bolz  c31ed2abfc  vulkan: define all quant data structures in types.comp (#10440)  2024-11-27 08:32:54 +01:00
Jeff Bolz  5b3466bedf  vulkan: Handle GPUs with less shared memory (#10468)  2024-11-27 08:30:27 +01:00
    There have been reports of failure to compile on systems with <= 32KB of shared memory (e.g. #10037). This change makes the large tile size fall back to a smaller size if necessary, and makes mul_mat_id fall back to CPU if there's only 16KB of shared memory.
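The fallback policy described in that last commit can be sketched as follows. This is a hypothetical illustration, not the actual Vulkan backend code: the struct, the function name `choose_tile`, and the concrete tile sizes 128/64 are assumptions; only the 32KB tile-size threshold and the 16KB CPU fallback for mul_mat_id come from the commit message.

```cpp
// Hypothetical sketch: pick a matmul tile size that fits the device's shared
// memory, and route mul_mat_id to the CPU when shared memory is too small.
struct TileChoice {
    int  tile;               // chosen tile dimension (assumed values)
    bool mul_mat_id_on_gpu;  // false -> fall back to CPU
};

TileChoice choose_tile(int shared_mem_bytes) {
    TileChoice c;
    // Large tiles only when more than 32KB of shared memory is available.
    c.tile = shared_mem_bytes > 32 * 1024 ? 128 : 64;
    // With only 16KB, mul_mat_id falls back to the CPU.
    c.mul_mat_id_on_gpu = shared_mem_bytes > 16 * 1024;
    return c;
}
```

The design choice is to degrade gracefully: shrink the tile first, and only move the operation off the GPU when even the small configuration cannot fit.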