Commit graph

4068 commits

Author SHA1 Message Date
Eve
a847973656 16 bit add for q4_0 only 2024-11-12 18:47:23 -05:00
Eve
13dfe631cb Merge https://github.com/ggerganov/llama.cpp into avx_opt 2024-11-07 20:37:46 -05:00
Eve
54e6c887ac Merge branch 'avx_opt' of https://github.com/netrunnereve/llama.cpp into avx_opt 2024-11-07 20:37:28 -05:00
Xuan Son Nguyen
76c6e7f105
server : minor UI fix (#10207) 2024-11-07 18:44:38 -04:00
Xuan Son Nguyen
a71d81cf8c
server : revamp chat UI with vuejs and daisyui (#10175)
* server : simple chat UI with vuejs and daisyui

* move old files to legacy folder

* embed deps into binary

* basic markdown support

* add conversation history, save to localStorage

* fix bg-base classes

* save theme preferences

* fix tests

* regenerate, edit, copy buttons

* small fixes

* docs: how to use legacy ui

* better error handling

* make CORS preflight more explicit

* add GET method for CORS

* fix tests

* clean up a bit

* better auto scroll

* small fixes

* use collapse-arrow

* fix closeAndSaveConfigDialog

* small fix

* remove console.log

* fix style for <pre> element

* lighter bubble color (less distracting when reading)
2024-11-07 17:31:10 -04:00
Georgi Gerganov
eec4d71737
scripts : add amx to sync-ggml.sh [no ci] 2024-11-07 23:11:36 +02:00
Georgi Gerganov
3b08828674
sync : ggml 2024-11-07 23:08:24 +02:00
Georgi Gerganov
a2c6fd747c
scripts : sync update 2024-11-07 23:07:55 +02:00
Diego Devesa
97404c4a03
ggml : add ggml-cpu.h to the public headers (#10204) 2024-11-07 18:16:08 +01:00
Faisal Zaghloul
60e17ce23c
Remove identical wte/etw logic for jais (#10203) 2024-11-07 08:46:12 -08:00
wwoodsTM
5107e8cea3
DRY: Fixes clone functionality (#10192) 2024-11-07 16:20:25 +01:00
snadampal
2319126a70
fix q4_0_8_8 format for corrupted tokens issue (#10198)
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-62-167.us-west-2.compute.internal>
2024-11-07 09:02:08 +01:00
Zhiyuan Li
3bcd40b3c5
Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133)
* rwkv6: rename to wkv6

* rwkv6: support avx2 avx512 armv8 armv9

* rwkv6: update cuda file name

* rwkv6: rename params

* wkv on sycl

* sycl: add some ops

* sycl: Enhance OP support judgment

* wkv6: drop armv9 and transfer to GGML style

ggml-ci

* sync : ggml

* update the function to use appropriate types

* fix define error

* Update ggml/src/ggml-cpu.c

* add appropriate asserts

* move element-wise functions outside

* put the declaration outside the loop

* rewrite to be more in line with the common pattern for distributing threads

* use recommended way GGML_TENSOR_LOCALS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Plamen Minev <pacominev@gmail.com>
Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-11-07 15:19:10 +08:00
Georgi Gerganov
5c333e0140
metal : add BF16 support (#8439)
* ggml : add initial BF16 support

ggml-ci

* metal : add mul_mat_id BF16 support

ggml-ci

* metal : check for bfloat support on the Metal device

ggml-ci

* metal : better var names [no ci]

* metal : do not build bfloat kernels when not supported

ggml-ci

* metal : try to fix BF16 support check

ggml-ci

* metal : this should correctly check bfloat support
2024-11-06 19:53:51 +02:00
Georgi Gerganov
b11f9ba9b8
server : remove hack for extra parallel slot (#10187)
ggml-ci
2024-11-06 13:29:01 +02:00
Diego Devesa
94d8cb8be1
metal : fix from ptr buffer name (#10189) 2024-11-06 12:10:07 +01:00
Georgi Gerganov
1dc04b2dee
ggml : adjust is_first_call init value (#10193)
ggml-ci
2024-11-06 11:20:10 +02:00
Georgi Gerganov
a1eaf6a960
metal : add quantized FA support (#10149)
* metal : add quantized FA (vec) support

ggml-ci

* metal : add quantized FA (non-vec) support

* metal : fix support check

ggml-ci

* metal : clean-up

* metal : clean-up (cont)

* metal : fix shared memory calc + reduce smem + comments

* metal : float-correctness

* metal : minor [no ci]
2024-11-06 10:24:23 +02:00
Eve
8c29230848
Merge branch 'ggerganov:master' into avx_opt 2024-11-05 17:24:14 +00:00
Gabe Goodhart
b8deef0ec0
llama : add <|tool_call|> formatting to Granite template (#10177)
Branch: GraniteToolCallTemplate

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2024-11-05 14:23:04 +02:00
Eve
ec6366f7d0 Merge https://github.com/ggerganov/llama.cpp into avx_opt 2024-11-04 17:17:22 -05:00
Diego Devesa
a9e8a9a030
ggml : fix arch check in bf16_to_fp32 (#10164) 2024-11-04 23:17:01 +01:00
Eve
3407364776
Q6_K AVX improvements (#10118)
* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesn't work now. 13 is still x86
2024-11-04 23:06:31 +01:00
Eve
b0e9b96e5d rebase to master 2024-11-04 14:25:06 -05:00
Diego Devesa
d5a409e57f
ggml : fix gelu tables initialization (#10172) 2024-11-04 20:06:58 +01:00
Diego Devesa
401558b7ba
ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167) 2024-11-04 17:34:08 +01:00
Xuan Son Nguyen
9e0ecfb697
server : clarify /slots endpoint, add is_processing (#10162)
* server : clarify /slots endpoint, add is_processing

* fix tests
2024-11-04 16:33:29 +01:00
snadampal
6a066b9978
fix build break on arm64 linux (#10166)
This fixes the build break from the recent changes
to move the CPU backend to separate files
https://github.com/ggerganov/llama.cpp/pull/10144
2024-11-04 16:08:33 +01:00
Diego Devesa
ea02c753eb
cuda : clear error after changing peer access (#10153) 2024-11-04 13:10:23 +01:00
Georgi Gerganov
05697f670b
metal : simplify f16 and f32 dequant kernels (#0) 2024-11-04 13:49:34 +02:00
Georgi Gerganov
f8e58135cf
metal : move dequantize templates to beginning of MSL source (#0) 2024-11-04 13:44:06 +02:00
leo-pony
329ed914c9
CANN: adjust backend registry refactor. (#10158)
remove buffer->iface.get_name that was used in cann as it was removed in the backend registry refactor PR.
2024-11-04 19:08:22 +08:00
Georgi Gerganov
ce027adfb3
sync : ggml 2024-11-04 10:33:37 +02:00
Yuri Khrustalev
284e5b0275
cmake : make it possible linking ggml as external lib (ggml/1003) 2024-11-04 10:33:11 +02:00
Plamen Minev
e2292aaa17
metal : fix minor string leaks (ggml/1004) 2024-11-04 10:33:10 +02:00
Diego Devesa
9f40989351
ggml : move CPU backend to a separate file (#10144) 2024-11-03 19:34:08 +01:00
Georgi Gerganov
08828a6d7d
metal : minor fixup in FA kernel (#10143)
* metal : minor fixup in FA kernel

ggml-ci

* metal : use the unrolled loop variable

* metal : remove unused var
2024-11-03 15:18:40 +02:00
Georgi Gerganov
1839f69130
flake.lock: Update (#10146) 2024-11-03 05:14:15 -08:00
Eve
a83ac00565
Merge branch 'ggerganov:master' into avx_opt 2024-11-03 01:53:49 +00:00
Eve
6a4c080824 fix potential overflow (performance reduced) 2024-11-02 21:27:35 -04:00
Eve
6667edeaec Q8_0 and IQ4_NL, 5-7% faster 2024-11-02 20:44:40 -04:00
Christian Köhnenkamp
9830b6923b
Add apple arm to presets (#10134)
* Add apple arm to presets

* Add final new line
2024-11-02 15:35:31 -07:00
Eve
b8d592fe2c split to functions 2024-11-02 18:08:13 -04:00
Eve
7de0bdc2db faster with madd 2024-11-02 17:12:23 -04:00
Eve
629befc729 revert f16 2024-11-02 16:58:40 -04:00
Eve
1335c78639 256b version, also slow. i tried :) 2024-11-02 16:36:56 -04:00
Eve
f8dd133ce4 slower f16c version, kept for reference 2024-11-02 16:30:03 -04:00
sasha0552
42cadc74bd
server : fix slot selection by lru (#10126)
* server : fix slot selection by lru, migrate lcs to `size_t`

* minor debug log fix
2024-11-02 18:34:56 +02:00
Georgi Gerganov
45950415ed
server : fix endpoint checks (#10135)
ggml-ci
2024-11-02 18:34:00 +02:00
Eve
fffe7e6204 +7% tg +5% pp compared to master 2024-11-02 11:04:58 -04:00