Akarshan Biswas
1ccfaaedbb
Add sum to backend hpp
2025-02-05 09:02:03 +05:30
Akarshan Biswas
d31c62d758
norm: add try catch sycl exception
2025-02-05 09:02:03 +05:30
Akarshan Biswas
5c05a3eedc
Move sum and sum rows to a separate file
2025-02-05 09:02:03 +05:30
Akarshan Biswas
eb466d733a
pool2d: move to a separate file
2025-02-05 09:02:03 +05:30
Akarshan Biswas
4db56d6ed2
im2col: add try catch block and move wrapper function from ggml-sycl.cpp
2025-02-05 09:02:02 +05:30
Akarshan Biswas
ba79258a2b
Add spaces to end of files
2025-02-05 09:02:02 +05:30
Akarshan Biswas
ddc5e428f2
clamp: move to a separate file
2025-02-05 09:02:02 +05:30
Akarshan Biswas
0c319bf721
DUP: move to cpy.cpp, set debug logs and adjust include
2025-02-05 09:02:02 +05:30
Akarshan Biswas
927925ffe2
scale: move to a separate file
2025-02-05 09:02:02 +05:30
Akarshan Biswas
7f2d24fdca
rope: add try catch sycl exception and debug log
2025-02-05 09:02:01 +05:30
Akarshan Biswas
8e86732cf2
diagmask: move to a separate file
2025-02-05 09:02:01 +05:30
Akarshan Biswas
98f5fd2fd1
getrows: move to a separate file
2025-02-05 09:02:01 +05:30
Akarshan Biswas
04d8b038b8
Add back split buffer type checks
2025-02-05 09:02:01 +05:30
Akarshan Biswas
7d8d689d39
eltwise: add back split buffer type checks
2025-02-05 09:02:01 +05:30
Akarshan Biswas
ecacff3f6e
CPY: move to a separate file
2025-02-05 09:02:00 +05:30
Akarshan Biswas
a16b6b7681
eltwise: sort includes
2025-02-05 09:02:00 +05:30
Akarshan Biswas
aaf9ed070d
Add spaces
2025-02-05 09:02:00 +05:30
Akarshan Biswas
3a346592b8
argsort: add a space at the end of file
2025-02-05 09:02:00 +05:30
Akarshan Biswas
51bedb847e
argmax: move missing function to file and fix function name
2025-02-05 09:02:00 +05:30
Akarshan Biswas
a153f1972d
ggml_sycl_compute_forward: fixup function calling names and remove comments
2025-02-05 09:01:59 +05:30
Akarshan Biswas
5288bd5896
Argsort: move to a separate file
2025-02-05 09:01:59 +05:30
Akarshan Biswas
95a09ab505
ARGMAX: move to a separate file
2025-02-05 09:01:59 +05:30
Akarshan Biswas
fa7c4d86f3
Fix GGML_SYCL_DEBUG in kernels in other files
2025-02-05 09:01:59 +05:30
Akarshan Biswas
e1326a7897
binbcast: add try catch sycl::exception
2025-02-05 09:01:59 +05:30
Akarshan Biswas
108be39dfe
binbcast: move to a separate file
2025-02-05 09:01:58 +05:30
Akarshan Biswas
957c11b2cf
binbcast: use void pointer to prevent intermediate type conversions
2025-02-05 09:01:58 +05:30
Akarshan Biswas
2d72bd94b0
SYCL: remove ggml_sycl_op_flatten function
2025-02-05 09:01:58 +05:30
Olivier Chafik
9f4cc8f8d3
sync: minja (#11641)
* `sync`: minja
182de30cda
https://github.com/google/minja/pull/46
https://github.com/google/minja/pull/45
2025-02-05 01:00:12 +00:00
Johannes Gäßler
fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
* CUDA: non-contiguous (RMS) norm support
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-04 22:21:42 +01:00
fxzjshm
3ec9fd4b77
HIP: force max threads per block to be 1024 (#11621)
Some old/vendor-forked versions of llvm still use 256. Explicitly set it to 1024 to align with upstream llvm.
Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-04 19:18:38 +01:00
Xuan-Son Nguyen
3962fc1a79
server : add try..catch to places not covered by set_exception_handler (#11620)
* server : add try..catch to places not covered by set_exception_handler
* log_server_request: rm try catch, add reminder
2025-02-04 18:25:42 +01:00
Radoslav Gerganov
1bef571f6a
arg : list RPC devices first when using --list-devices (#11655)
List devices in the same order as they appear when evaluating the model
and splitting tensors across devices, i.e. RPC devices come first in the
list.
ref #11435
2025-02-04 18:16:20 +02:00
Olivier Chafik
db288b60cb
tool-call: command r7b fix for normal responses (#11608)
* fix command r7b normal response regex + add to server test
* test multiline non-tool-call responses in test-chat
2025-02-04 15:48:53 +00:00
Shelby Jenkins
106045e7bb
readme : add llm_client Rust crate to readme bindings (#11628)
[This crate](https://github.com/ShelbyJenkins/llm_client) has been in a usable state for quite a while, so I figured now is fair to add it.
It installs from crates.io, and automatically downloads the llama.cpp repo and builds it for the target platform - with the goal being the easiest user experience possible.
It also integrates model presets and choosing the largest quant given the target's available VRAM. So a user just has to specify one of the presets (I manually add the most popular models), and it will download from hugging face.
So, it's like a Rust Ollama, but it's not really for chatting. It makes heavy use of llama.cpp's grammar system to do structured output for decision making and control flow tasks.
2025-02-04 13:20:55 +02:00
Jhen-Jie Hong
f117d84b48
swift : fix llama-vocab api usage (#11645)
* swiftui : fix vocab api usage
* batched.swift : fix vocab api usage
2025-02-04 13:15:24 +02:00
Jhen-Jie Hong
534c46b53c
metal : use residency set for other platforms (#11648)
2025-02-04 13:07:18 +02:00
Georgi Gerganov
387a1598ca
authors : update
2025-02-04 13:04:10 +02:00
Georgi Gerganov
7c9e0ca520
sync : ggml
2025-02-04 12:59:21 +02:00
Christian Kastner
8f8290ada9
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
This makes git as a dependency optional, and is useful in the case where
ggml is built not from git, but from a tarball, or a distribution source
package.
This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be
using it, though, so there doesn't seem to be much value in factoring it
out, or even requiring it.
2025-02-04 12:59:15 +02:00
Georgi Gerganov
b34aedd558
ci : do not stale-close roadmap issues
2025-02-04 09:31:01 +02:00
Olivier Chafik
cde3833239
tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616)
* tool-call: allow `--jinja --chat-template chatml`
* fix double bos issue (drop bos/eos tokens from jinja template)
* add missing try catch around jinja parsing to default to chatml
* Simplify default chatml logic
2025-02-03 23:49:27 +00:00
Xuan-Son Nguyen
b3451785ac
server : (webui) revert hacky solution from #11626 (#11634)
2025-02-04 00:10:52 +01:00
Woof Dog
1d1e6a90bc
server : (webui) allow typing and submitting during llm response (#11626)
2025-02-03 23:16:27 +01:00
Daniel Bevenius
5598f475be
server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622)
This commit removes the CPPHTTPLIB_NO_EXCEPTIONS define from the server
code.
The motivation for this is that when using a debug build the server
would crash when an exception was thrown and terminate the server
process, as it was unhandled. When CPPHTTPLIB_NO_EXCEPTIONS is set
cpp_httplib will not call the exception handler, which would normally
return a 500 error to the client. This caused tests to fail when using
a debug build.
Fixes: https://github.com/ggerganov/llama.cpp/issues/11613
2025-02-03 16:45:38 +01:00
Georgi Gerganov
8ec05832fa
sync : ggml
2025-02-03 14:57:08 +02:00
Johannes Gäßler
21c84b5d2d
CUDA: fix Volta FlashAttention logic (#11615)
2025-02-03 14:25:56 +02:00
mashdragon
d92cb67e37
server : (webui) Fix Shift+Enter handling (#11609)
* Fix Shift+Enter handling
`exact` on the Enter handler means the message is not sent when Shift+Enter is pressed anyway
* build index.html.gz
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-02-03 10:42:55 +01:00
Johannes Gäßler
6eecde3cc8
HIP: fix flash_attn_stream_k_fixup warning (#11604)
2025-02-02 23:48:29 +01:00
uvos
396856b400
CUDA/HIP: add support for selectable warp size to mmv (#11519)
CUDA/HIP: add support for selectable warp size to mmv
2025-02-02 22:40:09 +01:00
uvos
4d0598e144
HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (#11601)
This fixes a bug where RDNA1 GPUs other than gfx1010 were not handled correctly
2025-02-02 22:08:05 +01:00