Georgi Gerganov
6acce39710
readme : update the usage section with examples ( #10596 )
...
* readme : update the usage section with examples
* readme : more examples
2024-12-01 11:25:17 +02:00
Wang Qin
43957ef203
build: update Makefile comments for C++ version change ( #10598 )
2024-12-01 04:19:44 +01:00
Adrien Gallouët
0c39f44d70
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() ( #10567 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2024-11-30 09:13:18 -08:00
Georgi Gerganov
3e0ba0e604
readme : remove old badge
2024-11-30 10:09:21 +02:00
Georgi Gerganov
abadba05be
readme : refresh ( #10587 )
...
* readme : refresh
* readme : move section [no ci]
* readme : clarify [no ci]
* readme : fixes [no ci]
* readme : more fixes [no ci]
* readme : simplify [no ci]
* readme : clarify GGUF
2024-11-30 09:47:07 +02:00
Eve
0533e7fb38
vulkan: Dynamic subgroup size support for Q6_K mat_vec ( #10536 )
...
* subgroup 64 version with subgroup add. 15% faster
scalable version
tested for subgroup sizes 16-128
* check for subgroup multiple of 16 and greater than 16
* subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45 )
* force 16 sequential threads per block
* make 16 subgroup size a constant
2024-11-30 08:00:02 +01:00
Diego Devesa
7cc2d2c889
ggml : move AMX to the CPU backend ( #10570 )
...
* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-29 21:54:58 +01:00
Xuan Son Nguyen
b782e5c7d4
server : add more test cases ( #10569 )
...
* server : add split model test
* add test speculative
* add invalid cases
2024-11-29 21:48:56 +01:00
Robert Collins
3a8e9af402
imatrix : support combine-only ( #10492 )
...
* imatrix-combine-only idea
* ensured that behavior is consistent with log
2024-11-29 19:21:37 +02:00
Diego Devesa
a3a3048e7a
cleanup UI link list ( #10577 )
...
* cleanup UI link list
* sort list alphabetically
* add missing licenses
2024-11-29 17:45:08 +01:00
Georgi Gerganov
f0678c5ff4
ggml : fix I8MM Q4_1 scaling factor conversion ( #10562 )
...
ggml-ci
2024-11-29 16:25:39 +02:00
HimariO
cbd08b4204
resolve linter, test errors
2024-11-29 22:18:15 +08:00
Shupei Fan
4b3242bbea
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 ( #10580 )
2024-11-29 14:49:02 +01:00
Alberto Cabrera Pérez
0f77aae560
sycl : offload of get_rows set to 0 ( #10432 )
2024-11-29 20:38:45 +08:00
HimariO
fac034530f
update to keep up with upstream changes
2024-11-29 17:55:18 +08:00
HimariO
07553cfb0f
update llama_hparams
2024-11-29 17:53:45 +08:00
HimariO
241bb45714
fix rope op mode switching, outdated func args
2024-11-29 17:53:45 +08:00
HimariO
f1fa60f84c
add GGML_ROPE_TYPE_MROPE, GGML_ROPE_TYPE_VISION
2024-11-29 17:53:45 +08:00
HimariO
201f7043c3
add fp16 support for qwen2vl and m-rope
2024-11-29 17:53:44 +08:00
HimariO
3237bb4614
add fp32 mrope, vision rope kernel
2024-11-29 17:53:44 +08:00
HimariO
0882f57612
cuda-gdb cmake preset
2024-11-29 17:53:44 +08:00
HimariO
53480d2bdb
replace variable size array with vector
2024-11-29 17:52:47 +08:00
HimariO
3d19dd44b6
add arg parser to qwen2vl_surgery
2024-11-29 17:52:47 +08:00
HimariO
023f0076e0
correcting vision-rope behavior, add the missing last layer back to ViT
2024-11-29 17:52:47 +08:00
HimariO
bcd49f5984
[WIP] create inference workflow, gguf convert script but fix
2024-11-29 17:52:47 +08:00
HimariO
7e9fc7202e
make batch and clip utils compatible with qwen2vl
2024-11-29 17:52:47 +08:00
HimariO
c13edfed59
[WIP] qwen2vl vision model
2024-11-29 17:52:47 +08:00
HimariO
3c3691e10f
update 5D tensor op workaround
2024-11-29 17:52:47 +08:00
HimariO
f661483ea7
update qwen2vl cli tool
2024-11-29 17:52:47 +08:00
HimariO
9d389a051b
Add vl-rope/2d-rope support for qwen2vl ViT
2024-11-29 17:52:47 +08:00
HimariO
35411963d2
Verify m-rope output
2024-11-29 17:52:46 +08:00
HimariO
b24bd89e77
[WIP] add qwen2vl arch
2024-11-29 17:52:46 +08:00
HimariO
7c6f793492
Add Qwen2VL cli entrypoint
2024-11-29 17:52:46 +08:00
HimariO
c17546fffa
Bare-bones Qwen2VL LLM converter
2024-11-29 17:52:46 +08:00
Alberto Cabrera Pérez
266b8519ee
sycl : Reroute permuted mul_mats through oneMKL ( #10408 )
...
This PR fixes the failing MUL_MAT tests for the sycl backend.
2024-11-29 09:49:43 +00:00
Chenguang Li
938f608742
CANN: RoPE operator optimization ( #10563 )
...
* [cann] RoPE operator optimization
* [CANN] Code Formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-29 14:46:55 +08:00
Jeff Bolz
f095a649ec
vulkan: get the first command buffer submitted sooner ( #10499 )
...
This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.
With these changes I get around 1-2% speedup on RTX 4070 combined with my
old Haswell-era CPU.
2024-11-29 07:18:02 +01:00
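The ramp-up described in the commit above (a small first submit, growing to the steady 100 nodes/submit) can be illustrated with a minimal sketch. This is not the backend's code: the helper name `plan_submits`, the initial batch size of 10, and the doubling rule are all assumptions chosen for illustration; only the 100 nodes/submit ceiling comes from the commit message.

```cpp
#include <algorithm>
#include <vector>

// Illustrative only: plan_submits, first_batch and the doubling rule are
// assumptions for this sketch, not names or values from the Vulkan backend.
// Returns the index of the last node in each submitted batch.
std::vector<int> plan_submits(int n_nodes, int first_batch = 10, int max_batch = 100) {
    std::vector<int> submit_points;
    int batch = first_batch;   // nodes to accumulate before the next submit
    int pending = 0;           // nodes recorded since the last submit
    for (int i = 0; i < n_nodes; ++i) {
        ++pending;
        const bool last_node = (i == n_nodes - 1);
        if (pending >= batch || last_node) {
            submit_points.push_back(i);  // submit everything recorded so far
            pending = 0;
            batch = std::min(batch * 2, max_batch);  // ramp up, capped at max_batch
        }
    }
    return submit_points;
}
```

Under these assumed values, a 250-node graph would be submitted after nodes 9, 29, 69, 149 and 249, so the GPU receives its first work after 10 nodes rather than 100.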
Ting Lou
678d7994f4
llava: return false instead of exit ( #10546 )
2024-11-29 01:09:46 +01:00
Georgi Gerganov
dc22344088
ggml : remove redundant copyright notice + update authors
2024-11-28 20:46:40 +02:00
Georgi Gerganov
4c0a95b107
llama : add missing model types
2024-11-28 20:45:07 +02:00
Xuan Son Nguyen
6c59567689
server : (tests) don't use thread for capturing stdout/stderr, bump openai client library ( #10568 )
...
* server : (tests) don't use thread for capturing stdout/stderr
* test: bump openai to 1.55.2
* bump openai to 1.55.3
2024-11-28 19:17:49 +01:00
Johannes Gäßler
890719311b
common: fix warning message when no GPU found ( #10564 )
2024-11-28 18:15:25 +01:00
Random Fly
7281cf13ad
docs: fix outdated usage of llama-simple ( #10565 )
2024-11-28 16:03:11 +01:00
Diego Devesa
e90688edd0
ci : fix tag name in cuda and hip releases ( #10566 )
2024-11-28 15:58:54 +01:00
Georgi Gerganov
76b27d29c2
ggml : fix row condition for i8mm kernels ( #10561 )
...
ggml-ci
2024-11-28 14:56:37 +02:00
Georgi Gerganov
eea986f215
cmake : fix ARM feature detection ( #10543 )
...
ggml-ci
2024-11-28 14:56:23 +02:00
Shupei Fan
c202cef168
ggml-cpu: support IQ4_NL_4_4 by runtime repack ( #10541 )
...
* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard
2024-11-28 13:52:03 +01:00
Sergio López
2025fa67e9
kompute : improve backend to pass test_backend_ops ( #10542 )
...
* kompute: op_unary: reject unsupported parameters
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: softmax: implement ALiBi support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: rope: implement neox and phi3 support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_q4_k permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_f16 permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_q6_k permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
---------
Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-11-28 12:51:38 +01:00
Ruixin Huang
c6bc73951e
CANN: Update cann.md to display correctly in CLion ( #10538 )
2024-11-28 15:27:11 +08:00
leo-pony
605fa66c50
CANN: Fix SOC_TYPE compile bug ( #10519 )
...
* CANN: Fix the build failure on Ascend 310P in two cases:
1) SOC_TYPE is specified manually
2) Some unusual compile environments
* Update the CANN backend news: support F16 and F32 data type models on the Ascend 310P NPU.
* fix CANN compile failure: assert in Ascend kernel functions is not supported on some CANN versions
2024-11-28 15:25:24 +08:00