Commit graph

4680 commits

Author SHA1 Message Date
Akarshan Biswas
efb5773bc2
ggml-sycl: hide matrix engine info for now from print sycl devices 2025-02-05 09:02:06 +05:30
Akarshan Biswas
0b602f0ecd
Final touches 2025-02-05 09:02:06 +05:30
Akarshan Biswas
52b0652601
conv: add space before eof 2025-02-05 09:02:05 +05:30
Akarshan Biswas
e5926374a5
Add remaining SYCL exception handler to kernel and refactor 2025-02-05 09:02:05 +05:30
Akarshan Biswas
7369e54b33
Add back ggml_sycl_set_device to kernels 2025-02-05 09:02:05 +05:30
Akarshan Biswas
0ae9a07cf8
ggml_sycl_op_argmax)Add debug logs to ggml_sycl_mul_ma0 2025-02-05 09:02:05 +05:30
Akarshan Biswas
18d706ab0e
gemm.hpp: remove unused include 2025-02-05 09:02:04 +05:30
Akarshan Biswas
539b0c662e
ggml-sycl: sort includes 2025-02-05 09:02:04 +05:30
Akarshan Biswas
6eb30d9403
Adjust EOF spaces and unused variable 2025-02-05 09:02:04 +05:30
Akarshan Biswas
a6a239cf39
norm: add a space at the end of file 2025-02-05 09:02:04 +05:30
Akarshan Biswas
6dbb7ac827
softmax: handle SYCL exceptions and add debug logs 2025-02-05 09:02:04 +05:30
Akarshan Biswas
bba4b66a81
concat: Handle SYCL exceptions 2025-02-05 09:02:03 +05:30
Akarshan Biswas
1ccfaaedbb
Add sum to backend hpp 2025-02-05 09:02:03 +05:30
Akarshan Biswas
d31c62d758
norm: add try catch sycl exception 2025-02-05 09:02:03 +05:30
Akarshan Biswas
5c05a3eedc
Move sum and sum rows to a separate file 2025-02-05 09:02:03 +05:30
Akarshan Biswas
eb466d733a
pool2d: move to a separate file 2025-02-05 09:02:03 +05:30
Akarshan Biswas
4db56d6ed2
im2col: add try catch block and move wrapper function from ggml-sycl.cpp 2025-02-05 09:02:02 +05:30
Akarshan Biswas
ba79258a2b
Add spaces to end of files 2025-02-05 09:02:02 +05:30
Akarshan Biswas
ddc5e428f2
clamp: move to a separate file 2025-02-05 09:02:02 +05:30
Akarshan Biswas
0c319bf721
DUP: move to cpy.cpp, set debug logs and adjust include 2025-02-05 09:02:02 +05:30
Akarshan Biswas
927925ffe2
scale: move to a separate file 2025-02-05 09:02:02 +05:30
Akarshan Biswas
7f2d24fdca
rope: add try catch sycl exception and debug log 2025-02-05 09:02:01 +05:30
Akarshan Biswas
8e86732cf2
diagmask: move to a separate file 2025-02-05 09:02:01 +05:30
Akarshan Biswas
98f5fd2fd1
getrows: move to a separate file 2025-02-05 09:02:01 +05:30
Akarshan Biswas
04d8b038b8
Add back split buffer type checks 2025-02-05 09:02:01 +05:30
Akarshan Biswas
7d8d689d39
eltwise: add back split buffer type checks 2025-02-05 09:02:01 +05:30
Akarshan Biswas
ecacff3f6e
CPY: move to a separate file 2025-02-05 09:02:00 +05:30
Akarshan Biswas
a16b6b7681
eltwise: sort includes 2025-02-05 09:02:00 +05:30
Akarshan Biswas
aaf9ed070d
Add spaces 2025-02-05 09:02:00 +05:30
Akarshan Biswas
3a346592b8
argsort: add a space at the end of file 2025-02-05 09:02:00 +05:30
Akarshan Biswas
51bedb847e
argmax: move missing function to file and fix function name 2025-02-05 09:02:00 +05:30
Akarshan Biswas
a153f1972d
ggml_sycl_compute_forward: fixup function calling names and remove comments 2025-02-05 09:01:59 +05:30
Akarshan Biswas
5288bd5896
Argsort: move to a separate file 2025-02-05 09:01:59 +05:30
Akarshan Biswas
95a09ab505
ARGMAX: move to a separate file 2025-02-05 09:01:59 +05:30
Akarshan Biswas
fa7c4d86f3
Fix GGML_SYCL_DEBUG in kernels in other files 2025-02-05 09:01:59 +05:30
Akarshan Biswas
e1326a7897
binbcast: add try catch sycl::exception 2025-02-05 09:01:59 +05:30
Akarshan Biswas
108be39dfe
binbcast: move to a separate file 2025-02-05 09:01:58 +05:30
Akarshan Biswas
957c11b2cf
binbcast: use void pointer to prevent intermediate type conversions 2025-02-05 09:01:58 +05:30
Akarshan Biswas
2d72bd94b0
SYCL: remove ggml_sycl_op_flatten function 2025-02-05 09:01:58 +05:30
Olivier Chafik
9f4cc8f8d3
sync: minja (#11641)
* `sync`: minja

182de30cda

https://github.com/google/minja/pull/46

https://github.com/google/minja/pull/45
2025-02-05 01:00:12 +00:00
Johannes Gäßler
fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
* CUDA: non-contiguous (RMS) norm support

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-04 22:21:42 +01:00
fxzjshm
3ec9fd4b77
HIP: force max threads per block to be 1024 (#11621)
Some old or vendor-forked versions of LLVM still use 256. Explicitly set it to 1024 to align with upstream LLVM.

Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-04 19:18:38 +01:00
Xuan-Son Nguyen
3962fc1a79
server : add try..catch to places not covered by set_exception_handler (#11620)
* server : add try..catch to places not covered by set_exception_handler

* log_server_request: rm try catch, add reminder
2025-02-04 18:25:42 +01:00
Radoslav Gerganov
1bef571f6a
arg : list RPC devices first when using --list-devices (#11655)
List devices in the same order as they appear when evaluating the model
and splitting tensors across devices, i.e. RPC devices come first in the
list.

ref #11435
2025-02-04 18:16:20 +02:00
Olivier Chafik
db288b60cb
tool-call: command r7b fix for normal responses (#11608)
* fix command r7b normal response regex + add to server test

* test multiline non-tool-call responses in test-chat
2025-02-04 15:48:53 +00:00
Shelby Jenkins
106045e7bb
readme : add llm_client Rust crate to readme bindings (#11628)
[This crate](https://github.com/ShelbyJenkins/llm_client) has been in a usable state for quite a while, so I figured now is a fair time to add it.

It installs from crates.io, and automatically downloads the llama.cpp repo and builds it for the target platform - with the goal being the easiest user experience possible.

It also integrates model presets and chooses the largest quant given the target's available VRAM. A user just has to specify one of the presets (I manually add the most popular models), and it will download the model from Hugging Face.

So, it's like a Rust Ollama, but it's not really for chatting. It makes heavy use of llama.cpp's grammar system to do structured output for decision making and control flow tasks.
2025-02-04 13:20:55 +02:00
Jhen-Jie Hong
f117d84b48
swift : fix llama-vocab api usage (#11645)
* swiftui : fix vocab api usage

* batched.swift : fix vocab api usage
2025-02-04 13:15:24 +02:00
Jhen-Jie Hong
534c46b53c
metal : use residency set for other platforms (#11648) 2025-02-04 13:07:18 +02:00
Georgi Gerganov
387a1598ca
authors : update 2025-02-04 13:04:10 +02:00
Georgi Gerganov
7c9e0ca520
sync : ggml 2025-02-04 12:59:21 +02:00