Commit graph

4061 commits

Author SHA1 Message Date
Zhiyuan Li
623db3b06f update lint 2024-11-05 13:31:41 +11:00
Zhiyuan Li
e264c35fc9 remove some codes 2024-11-05 03:03:38 +11:00
Zhiyuan Li
acb1b9d22c
Merge branch 'ggerganov:master' into master 2024-11-05 02:58:54 +11:00
Zhiyuan Li
4574795cd5 use recommended way GGML_TENSOR_LOCALS 2024-11-05 02:57:08 +11:00
Zhiyuan Li
4693b4611f rewrite to be more inline with the common pattern for distributing threads 2024-11-05 02:49:22 +11:00
Zhiyuan Li
a749ba7701 put the declaration outside the loop 2024-11-05 02:45:40 +11:00
Zhiyuan Li
6a1e977e34
Update ggml/src/ggml-sycl/concat.cpp
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-11-05 02:41:55 +11:00
Zhiyuan Li
35a1a2dfa9 move element-wise functions outside 2024-11-05 02:40:11 +11:00
Xuan Son Nguyen
9e0ecfb697
server : clarify /slots endpoint, add is_processing (#10162)
* server : clarify /slots endpoint, add is_processing

* fix tests
2024-11-04 16:33:29 +01:00
snadampal
6a066b9978
fix build break on arm64 linux (#10166)
This fixes the build break from the recent changes
to move the CPU backend to separate files
https://github.com/ggerganov/llama.cpp/pull/10144
2024-11-04 16:08:33 +01:00
Zhiyuan Li
72e4432577 add appropriate asserts 2024-11-05 01:20:52 +11:00
Zhiyuan Li
b81602477b
Update ggml/src/ggml-cpu.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-05 01:14:27 +11:00
Zhiyuan Li
a878502f43 fix define error 2024-11-05 01:07:33 +11:00
Zhiyuan Li
81cb301224 update the function to use appropriate types 2024-11-05 00:55:59 +11:00
Zhiyuan Li
bb0685fad5
Update ggml/src/ggml-sycl/wkv6.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-05 00:42:37 +11:00
Zhiyuan Li
8c7b4ec22a
Update ggml/src/ggml-sycl/outprod.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-05 00:42:31 +11:00
Zhiyuan Li
9ea34a78cb fix: add default 2024-11-04 23:28:26 +11:00
Diego Devesa
ea02c753eb
cuda : clear error after changing peer access (#10153) 2024-11-04 13:10:23 +01:00
Georgi Gerganov
05697f670b
metal : simplify f16 and f32 dequant kernels (#0) 2024-11-04 13:49:34 +02:00
Georgi Gerganov
f8e58135cf
metal : move dequantize templates to beginning of MSL source (#0) 2024-11-04 13:44:06 +02:00
Zhiyuan Li
61c665b7f1 fix: update changes to upstream 2024-11-04 22:17:12 +11:00
Zhiyuan Li
5f792141c5
Merge branch 'ggerganov:master' into master 2024-11-04 22:12:31 +11:00
Georgi Gerganov
153251f761 sync : ggml 2024-11-04 22:10:53 +11:00
Yuri Khrustalev
eb5711c496 cmake : make it possible linking ggml as external lib (ggml/1003) 2024-11-04 22:10:53 +11:00
Plamen Minev
8050d021ab metal : fix minor string leaks (ggml/1004) 2024-11-04 22:10:53 +11:00
Diego Devesa
89812b157a ggml : move CPU backend to a separate file (#10144) 2024-11-04 22:10:53 +11:00
Georgi Gerganov
b18963085b metal : minor fixup in FA kernel (#10143)
* metal : minor fixup in FA kernel

ggml-ci

* metal : use the unrolled loop variable

* metal : remove unused var
2024-11-04 22:09:57 +11:00
Georgi Gerganov
4d266310f5 flake.lock: Update (#10146) 2024-11-04 22:09:57 +11:00
leo-pony
329ed914c9
CANN: adjust backend registry refactor. (#10158)
remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR.
2024-11-04 19:08:22 +08:00
Georgi Gerganov
ce027adfb3
sync : ggml 2024-11-04 10:33:37 +02:00
Yuri Khrustalev
284e5b0275
cmake : make it possible linking ggml as external lib (ggml/1003) 2024-11-04 10:33:11 +02:00
Plamen Minev
e2292aaa17
metal : fix minor string leaks (ggml/1004) 2024-11-04 10:33:10 +02:00
Diego Devesa
9f40989351
ggml : move CPU backend to a separate file (#10144) 2024-11-03 19:34:08 +01:00
Georgi Gerganov
08828a6d7d
metal : minor fixup in FA kernel (#10143)
* metal : minor fixup in FA kernel

ggml-ci

* metal : use the unrolled loop variable

* metal : remove unused var
2024-11-03 15:18:40 +02:00
Georgi Gerganov
1839f69130
flake.lock: Update (#10146) 2024-11-03 05:14:15 -08:00
Zhiyuan Li
811aa872d6 wkv6: drop armv9 and transfer to GGML style 2024-11-03 23:54:57 +11:00
Zhiyuan Li
042c3e0fd3
Merge branch 'ggerganov:master' into master 2024-11-03 17:30:25 +11:00
Zhiyuan Li
1c58096f6f sycl: Enhance OP support judgment 2024-11-03 16:43:17 +11:00
Zhiyuan Li
bee1cec7d2 sycl: add some ops 2024-11-03 16:43:17 +11:00
Zhiyuan Li
2fc42b6a82 wkv on sycl 2024-11-03 16:43:17 +11:00
Zhiyuan Li
3f75f12114 rwkv6: rename params 2024-11-03 16:43:17 +11:00
Zhiyuan Li
e198f7b9df rwkv6: update cuda file name 2024-11-03 16:43:17 +11:00
Zhiyuan Li
b4254c5550 rwkv6: support avx2 avx512 armv8 armv9 2024-11-03 16:43:17 +11:00
Zhiyuan Li
f66c75a495 rwkv6: rename to wkv6 2024-11-03 16:43:17 +11:00
Christian Köhnenkamp
9830b6923b
Add apple arm to presets (#10134)
* Add apple arm to presets

* Add final new line
2024-11-02 15:35:31 -07:00
sasha0552
42cadc74bd
server : fix slot selection by lru (#10126)
* server : fix slot selection by lru, migrate lcs to `size_t`

* minor debug log fix
2024-11-02 18:34:56 +02:00
Georgi Gerganov
45950415ed
server : fix endpoint checks (#10135)
ggml-ci
2024-11-02 18:34:00 +02:00
Georgi Gerganov
1926d6e39d
llama : adjust default context size + print warnings (#10136)
* llama : adjust default context size + print warnings

ggml-ci

* ggml-ci : add missing gpu-layers + adjust context sizes
2024-11-02 15:18:56 +02:00
Diego Devesa
b634f8a26f
simple-chat : only add bos on first prompt (#10129) 2024-11-02 13:08:53 +01:00
Xuan Son Nguyen
7554aa4655
convert-lora : make --base optional (#10110)
* convert-lora : make `--base` optional

* lint

* handle case where base_model_name_or_path is invalid

* do not include metadata from base model

* clarify unspecified --base

* add small comment [no ci]

* trigger ci
2024-11-02 12:53:17 +01:00