Commit graph

4352 commits

Author SHA1 Message Date
Max Krasnyansky
9697d07b21 opencl: update log message for unsupported GPUs 2024-12-13 11:14:27 -08:00
Li He
dbaa360a55 opencl: check for various requirements, allow deprecated API 2024-12-13 11:14:27 -08:00
Max Krasnyansky
b41b6e679f opencl: fix MSVC builds (string length error) 2024-12-13 11:14:27 -08:00
Max Krasnyansky
b25a4caaf4 opencl: fail gracefully if opencl devices are not available
Also for unsupported GPUs.
2024-12-13 11:14:27 -08:00
Max Krasnyansky
c971a1885d opencl: fix compiler warnings with GCC and Clang
Still getting the warning about clCreateCmdQueue being obsolete.
Will fix that separately.
2024-12-13 11:14:27 -08:00
Li He
3bc085b359 opencl: use pools for tensor_extra 2024-12-13 11:14:27 -08:00
Li He
74a9bafcb9 opencl: remove limits on tensor_extra 2024-12-13 11:14:27 -08:00
Max Krasnyansky
70063c6c0c opencl: replace some more OPENCL2 leftovers 2024-12-13 11:14:27 -08:00
Li He
c64ef0fb5c opencl: remove copyright marker since main license already covers 2024-12-13 11:14:27 -08:00
Li He
e447dbcc01 opencl: rename backend - funcs, structs, etc opencl2 -> opencl 2024-12-13 11:14:27 -08:00
Li He
22411ab58f opencl: make OpenCL required, remove redundant lib and inc directories
* `ggml-base`, `..` and `.` are added by `ggml_add_backend_library`
2024-12-13 11:14:27 -08:00
Li He
97a12703dd opencl: rename kernel files ggml-opencl2 -> ggml-opencl 2024-12-13 11:14:27 -08:00
Li He
34f2fc15ea opencl: rename backend opencl2 -> opencl 2024-12-13 11:14:27 -08:00
Li He
e9a97381f2 opencl: use GGML_LOG_xxx instead of fprintf(stderr, ...) 2024-12-13 11:14:27 -08:00
Max Krasnyansky
9a9d92b0b9 opencl: use cl_ulong for sizes and strides 2024-12-13 11:14:27 -08:00
Max Krasnyansky
c21fc8c5f9 opencl: use cl_ulong for all offsets 2024-12-13 11:14:27 -08:00
Max Krasnyansky
31f305ea01 opencl: use ulong for offsets and strides in ADD kernel 2024-12-13 11:14:27 -08:00
Max Krasnyansky
0451edd936 opencl: cleanup ggml-opencl2 header file 2024-12-13 11:14:27 -08:00
Li He
66d4330377 opencl: Clean up small-alloc in CMake files 2024-12-13 11:14:27 -08:00
Max Krasnyansky
969a00a4b9 opencl: CI workflow fixes 2024-12-13 11:14:27 -08:00
Max Krasnyansky
4bca601be6 opencl: fix embed tool invocation with python3 2024-12-13 11:14:27 -08:00
Max Krasnyansky
9b6540b6f9 opencl-ci: use RUNNER_TEMP instead of github.workspace 2024-12-13 11:14:27 -08:00
Max Krasnyansky
d24b360255 opencl: fixed merge conflict (MUSA added twice in cmake) 2024-12-13 11:14:27 -08:00
Max Krasnyansky
671c7af6b9 opencl: remove small-alloc support and fix build errors for non-opencl platforms 2024-12-13 11:14:27 -08:00
Max Krasnyansky
8ad0bb30df opencl: integrate backend dyn.load interface and fix compiler and format warnings 2024-12-13 11:14:27 -08:00
Li He
c1af4b72b7 [cl][adreno] Fix memory leak for non SMALL_ALLOC path 2024-12-13 11:14:27 -08:00
Li
3571bb6c63 [cl][ci] Add workflow for CL 2024-12-13 11:14:27 -08:00
Li He
f56fb699bc [cl][adreno] Add Adreno GPU support
Add new OpenCL backend to support Adreno GPUs

---------

Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>
Co-authored-by: Alexander Angus <quic_aangus@quicinc.com>
Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com>
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
2024-12-13 11:14:27 -08:00
Eric Curtin
c27ac678dd
Opt class for positional argument handling (#10508)
Added support for positional arguments `model` and `prompt`. Added
functionality to download via strings like:

  llama-run llama3
  llama-run ollama://granite-code
  llama-run ollama://granite-code:8b
  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
  llama-run https://example.com/some-file1.gguf
  llama-run some-file2.gguf
  llama-run file://some-file3.gguf

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-12-13 19:34:25 +01:00
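
The `model` strings listed in this commit message differ mainly by their scheme prefix. A minimal sketch of how such a string could be classified by prefix follows. The names `ModelSource`, `ModelRef`, `strip_prefix` and `classify_model_arg` are hypothetical and do not come from the llama-run source. The commit message does not say how a bare name such as `llama3` or `some-file2.gguf` is resolved, so the sketch only reports that no scheme was given.

```cpp
#include <string>

// Hypothetical classification of the positional `model` argument.
enum class ModelSource { Ollama, HuggingFace, Https, File, NoScheme };

struct ModelRef {
    ModelSource source;  // which scheme prefix matched
    std::string rest;    // the argument with the prefix removed
};

// Returns true and stores the remainder in `rest` when `s` starts with `prefix`.
static bool strip_prefix(const std::string & s, const std::string & prefix, std::string & rest) {
    if (s.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }
    rest = s.substr(prefix.size());
    return true;
}

// Assumption: only the prefixes shown in the commit message are recognized.
ModelRef classify_model_arg(const std::string & arg) {
    std::string rest;
    if (strip_prefix(arg, "ollama://", rest)) {
        return { ModelSource::Ollama, rest };
    }
    if (strip_prefix(arg, "hf://", rest) || strip_prefix(arg, "huggingface://", rest)) {
        return { ModelSource::HuggingFace, rest };
    }
    if (strip_prefix(arg, "https://", rest)) {
        return { ModelSource::Https, rest };
    }
    if (strip_prefix(arg, "file://", rest)) {
        return { ModelSource::File, rest };
    }
    // No scheme: the caller decides what a bare name or path means.
    return { ModelSource::NoScheme, arg };
}
```

With this sketch, `classify_model_arg("ollama://granite-code:8b")` yields the `Ollama` source with `granite-code:8b` as the remainder, while `llama3` is returned unchanged with `NoScheme`.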
Corentin REGAL
11e07fd63b
fix: graceful shutdown for Docker images (#10815) 2024-12-13 18:23:50 +01:00
Jett Janiak
4601a8bb67
gguf-py : numpy 2 newbyteorder fix (#9772) 2024-12-13 16:48:44 +02:00
谢乃闻
9f35e44592
Fix crash caused by ggml_backend_load_all when launching on Android Activity (#10812)
* Fix crash caused by ggml_backend_load_all when launching on AndroidActivity.

Details:
Calling ggml_backend_load_all during initialization in the AndroidActivity project leads to a crash with the error:
terminating with uncaught exception of type std::__ndk1::__fs::filesystem::filesystem_error: filesystem error: in directory_iterator::directory_iterator(...): Permission denied [./].
This issue occurs because AndroidActivity restricts file access due to sandboxing.

Reproduction:
In the example folder, the LlamaAndroid project can reproduce the crash by calling ggml_backend_load_all first in Java_android_llama_cpp_LLamaAndroid_backend_1init.

* Update ggml/src/ggml-backend-reg.cpp

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-12-13 13:56:07 +01:00
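
The crash described in this commit comes from the throwing constructor of `std::filesystem::directory_iterator`. One way to avoid that kind of exception, sketched here under the assumption of a C++17 toolchain, is to use the `std::error_code` overloads. This is an illustration only and not necessarily what the patch to `ggml-backend-reg.cpp` does. The helper name `list_files_noexcept` is hypothetical.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: list regular files in `dir`, returning an empty or
// partial list instead of throwing when the directory cannot be read
// (for example, "Permission denied" inside a sandbox).
std::vector<std::string> list_files_noexcept(const fs::path & dir) {
    std::vector<std::string> files;

    std::error_code ec;
    fs::directory_iterator it(dir, ec);  // non-throwing overload
    if (ec) {
        return files;                    // directory not accessible
    }

    const fs::directory_iterator end;
    while (it != end) {
        std::error_code entry_ec;
        if (it->is_regular_file(entry_ec) && !entry_ec) {
            files.push_back(it->path().string());
        }
        it.increment(ec);                // non-throwing advance
        if (ec) {
            break;                       // stop on the first iteration error
        }
    }
    return files;
}
```

A caller can treat an empty result as "nothing to load" and continue, rather than terminating with an uncaught `filesystem_error`.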
Eve
64ae065511
vulkan: small mul_mat_vec optimizations (#10665)
* double the number of rows per workgroup

* Update ggml-vulkan.cpp

* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* only increase the number of rows for amd and subgroup size 64

* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested

* use subgroup min and max to check for gcn (requires https://github.com/ggerganov/llama.cpp/pull/10721)

* manual merge ggml-vulkan.cpp

* set min and max subgroup size in any case

* Also double the number of rows for Intel GPUs
2024-12-13 09:42:04 +01:00
Akarshan Biswas
83ed24a97b
SYCL: Reduce most of the compiler warnings (#10748)
* Try to reduce some unused and typecast warnings

* Reduce compiler warnings step 2

* add a newline at the end of the file

* Initialize nreduce as size_t

* [SYCL] Remove pragma directives from mmq.cpp

* SYCL: mmq add condition to prevent blocks_per_tile_x_row variable from becoming 0

* SYCL softmax: Initialize nreduce as size_t

* ggml-sycl.cpp: fix some trailing whitespaces

* SYCL: remove the unused variables instead of commenting them out

* SYCL pool2d kernel: set NAN for invalid pooling op

* SYCL gemm.hpp: remove pragma directives

* SYCL gemm.hpp: use const cast to properly support dnnl::memory

* SYCL: wkv6 remove a comment

* SYCL: clean comments step 2

* SYCL: clean comments and variables step 3

* SYCL: Use GGML_UNUSED for unused variables

* SYCL: remove extra empty lines and a comment

* Remove TODO

* cleanup spaces

* add a stdout for unsupported op

* use sycl printf over fprintf

* remove prints for CI

* SYCL ggml-sycl: pool2D use sycl::nan and remove if-else block

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-12-13 12:12:15 +05:30
Karol Kontny
d583cd03f6
ggml : Fix compilation issues on ARM platform when building without fp16 (#10811) 2024-12-13 01:04:19 +01:00
Xuan Son Nguyen
adffa6ffd5
common : improve -ctv -ctk CLI arguments (#10806)
* common : improve ctv ctk cli argument

* regenerate docs

* even better approach

* use std::vector
2024-12-12 22:53:05 +01:00
Xuan Son Nguyen
274ec65af6
contrib : add ngxson as codeowner (#10804) 2024-12-12 20:52:28 +01:00
a3sh
8faa1d4dd4
CUDA: faster non-contiguous concat (#10760)
* faster uncontiguous concat

* Use a lambda to avoid code duplication

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Update ggml/src/ggml-cuda/concat.cu

* add constexpr and static assert

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-12-12 19:09:50 +01:00
Diego Devesa
cb13ef85a4
remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)
other windows build fixes
2024-12-12 19:02:49 +01:00
0cc4m
4064c0e3b6
Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (#10798) 2024-12-12 18:36:00 +01:00
0cc4m
dc5301d565
Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats (#10721)
* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* Fix subgroup size control extension support check

Add accf32 and accf16 checks for coopmats

* Also disable coopmats on amdvlk
2024-12-12 18:35:37 +01:00
Xuan Son Nguyen
9fdb124304
common : add missing env var for speculative (#10801) 2024-12-12 16:57:32 +01:00
CentricStorm
5555c0c1f6
docs: update server streaming mode documentation (#9519)
Provide more documentation for streaming mode.
2024-12-11 23:40:40 +01:00
Georgi Gerganov
973f328b1e
Merge pull request #10788 from ggerganov/gg/gguf-py-0.11.0 2024-12-11 23:14:46 +02:00
Georgi Gerganov
fb18934a97
gguf-py : bump version to 0.11.0 2024-12-11 23:13:31 +02:00
Xuan Son Nguyen
235f6e14bf
server : (UI) add tok/s, get rid of completion.js (#10786)
* get rid of completion.js

* extract chat bubble to a component

* add tok/s info

* sync

* fix BASE_URL

* only extract timings when it's enabled

* fix auto scroll
2024-12-11 20:52:14 +01:00
qingy1337
1a31d0dc00
Update README.md (#10772) 2024-12-11 16:16:32 +01:00
Xuan Son Nguyen
92f77a640f
ci : pin nodejs to 22.11.0 (#10779) 2024-12-11 14:59:41 +01:00
kallewoof
484d2f31ae
bug-fix: snprintf prints NULL in place of the last character (#10419)
* bug-fix: snprintf prints NULL in place of the last character

We need to give snprintf enough space to print the last character and the null character, thus we allocate one extra byte and then ignore it when converting to std::string.

* add comment about extra null-term byte requirement
2024-12-11 14:48:04 +01:00
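
The extra-byte rule in this commit message can be illustrated with a short sketch. The helper `format_double` is hypothetical and is not the code touched by the commit. It assumes the default "C" locale and a fixed `%.2f` format.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: format a double with two decimals into a std::string.
std::string format_double(double value) {
    // First pass: snprintf reports how many characters are needed,
    // not counting the terminating null character.
    const int len = std::snprintf(nullptr, 0, "%.2f", value);
    if (len < 0) {
        return std::string();
    }

    // Allocate one extra byte so the last character and the null
    // terminator both fit. With only `len` bytes, snprintf would
    // drop the last character to make room for the terminator.
    std::vector<char> buf(static_cast<std::size_t>(len) + 1);
    std::snprintf(buf.data(), buf.size(), "%.2f", value);

    // Ignore the extra terminator byte when building the std::string.
    return std::string(buf.data(), static_cast<std::size_t>(len));
}
```

Passing the explicit length to the `std::string` constructor keeps the terminator out of the result, which matches the "ignore it when converting to std::string" step in the message.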
CentricStorm
4b4d92b098
docs: fix server documentation formatting (#10776) 2024-12-11 11:47:43 +01:00