llama.cpp

Author	SHA1	Message	Date
Max Krasnyansky	9697d07b21	opencl: update log message for unsupported GPUs	2024-12-13 11:14:27 -08:00
Li He	dbaa360a55	opencl: check for various requirements, allow deprecated API	2024-12-13 11:14:27 -08:00
Max Krasnyansky	b41b6e679f	opencl: fix MSVC builds (string length error)	2024-12-13 11:14:27 -08:00
Max Krasnyansky	b25a4caaf4	opencl: fail gracefully if opencl devices are not available Also for unsupported GPUs.	2024-12-13 11:14:27 -08:00
Max Krasnyansky	c971a1885d	opencl: fix compiler warnings with GCC and Clang Still getting the warning about clCreateCmdQueue being obsolete. Will fix that separately.	2024-12-13 11:14:27 -08:00
Li He	3bc085b359	opencl: use pools for `tensor_extra`	2024-12-13 11:14:27 -08:00
Li He	74a9bafcb9	opencl: remove limits on `tensor_extra`	2024-12-13 11:14:27 -08:00
Max Krasnyansky	70063c6c0c	opencl: replace some more OPENCL2 leftovers	2024-12-13 11:14:27 -08:00
Li He	c64ef0fb5c	opencl: remove copyright marker since main license already covers	2024-12-13 11:14:27 -08:00
Li He	e447dbcc01	opencl: rename backend - funcs, structs, etc `opencl2` -> `opencl`	2024-12-13 11:14:27 -08:00
Li He	22411ab58f	opencl: make OpenCL required, remove redundant lib and inc directories * `ggml-base`, `..` and `.` are added by `ggml_add_backend_library`	2024-12-13 11:14:27 -08:00
Li He	97a12703dd	opencl: rename kernel files `ggml-opencl2` -> `ggml-opencl`	2024-12-13 11:14:27 -08:00
Li He	34f2fc15ea	opencl: rename backend `opencl2` -> `opencl`	2024-12-13 11:14:27 -08:00
Li He	e9a97381f2	opencl: use `GGML_LOG_xxx` instead of `fprintf(stderr, ...)`	2024-12-13 11:14:27 -08:00
Max Krasnyansky	9a9d92b0b9	opencl: use cl_ulong for sizes and strides	2024-12-13 11:14:27 -08:00
Max Krasnyansky	c21fc8c5f9	opencl: use cl_ulong for all offsets	2024-12-13 11:14:27 -08:00
Max Krasnyansky	31f305ea01	opencl: use ulong for offsets and strides in ADD kernel	2024-12-13 11:14:27 -08:00
Max Krasnyansky	0451edd936	opencl: cleanup ggml-opencl2 header file	2024-12-13 11:14:27 -08:00
Li He	66d4330377	opencl: Clean up small-alloc in CMake files	2024-12-13 11:14:27 -08:00
Max Krasnyansky	969a00a4b9	opencl: CI workflow fixes	2024-12-13 11:14:27 -08:00
Max Krasnyansky	4bca601be6	opencl: fix embed tool invocation with python3	2024-12-13 11:14:27 -08:00
Max Krasnyansky	9b6540b6f9	opencl-ci: use RUNNER_TEMP instead of github.workspace	2024-12-13 11:14:27 -08:00
Max Krasnyansky	d24b360255	opencl: fixed merge conflict (MUSA added twice in cmake)	2024-12-13 11:14:27 -08:00
Max Krasnyansky	671c7af6b9	opencl: remove small-alloc support and fix build errors for non-opencl platforms	2024-12-13 11:14:27 -08:00
Max Krasnyansky	8ad0bb30df	opencl: integrate backend dyn.load interface and fix compiler and format warnings	2024-12-13 11:14:27 -08:00
Li He	c1af4b72b7	[cl][adreno] Fix memory leak for non SMALL_ALLOC path	2024-12-13 11:14:27 -08:00
Li	3571bb6c63	[cl][ci] Add workflow for CL	2024-12-13 11:14:27 -08:00
Li He	f56fb699bc	[cl][adreno] Add Adreno GPU support Add new OpenCL backend to support Adreno GPUs --------- Co-authored-by: Skyler Szot <quic_sszot@quicinc.com> Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com> Co-authored-by: Alexander Angus <quic_aangus@quicinc.com> Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com> Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>	2024-12-13 11:14:27 -08:00
Eric Curtin	c27ac678dd	Opt class for positional argument handling (#10508) Added support for positional arguments `model` and `prompt`. Added functionality to download via strings like: llama-run llama3 llama-run ollama://granite-code llama-run ollama://granite-code:8b llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf llama-run https://example.com/some-file1.gguf llama-run some-file2.gguf llama-run file://some-file3.gguf Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2024-12-13 19:34:25 +01:00
Corentin REGAL	11e07fd63b	fix: graceful shutdown for Docker images (#10815)	2024-12-13 18:23:50 +01:00
Jett Janiak	4601a8bb67	gguf-py : numpy 2 newbyteorder fix (#9772)	2024-12-13 16:48:44 +02:00
谢乃闻	9f35e44592	Fix crash caused by ggml_backend_load_all when launching on Android Activity (#10812) * Fix crash caused by ggml_backend_load_all when launching on AndroidActivity. Details: Calling ggml_backend_load_all during initialization in the AndroidActivity project leads to a crash with the error: terminating with uncaught exception of type std::__ndk1::__fs::filesystem::filesystem_error: filesystem error: in directory_iterator::directory_iterator(...): Permission denied [./]. This issue occurs because AndroidActivity restricts file access due to sandboxing. Reproduction: In the example folder, the LlamaAndroid project can reproduce the crash by calling ggml_backend_load_all first in Java_android_llama_cpp_LLamaAndroid_backend_1init. * Update ggml/src/ggml-backend-reg.cpp --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2024-12-13 13:56:07 +01:00
Eve	64ae065511	vulkan: small mul_mat_vec optimizations (#10665) * double the number of rows per workgroup * Update ggml-vulkan.cpp * Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats * only increase the number of rows for amd and subgroup size 64 * fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested * use subgroup min and max to check for gcn (requires https://github.com/ggerganov/llama.cpp/pull/10721) * manual merge ggml-vulkan.cpp * set min and max subgroup size in any case * Also double the number of rows for Intel GPUs	2024-12-13 09:42:04 +01:00
Akarshan Biswas	83ed24a97b	SYCL: Reduce most of the compiler warnings (#10748) * Try to reduce some unused and typecast warnings * Reduce compiler warnings step 2 * add a newline at the end of the file * Initialize nreduce as size_t * [SYCL] Remove pragma directives from mmq.cpp * SYCL: mmq add condition to prevent blocks_per_tile_x_row variable from becoming 0 * SYCL softmax: Initialize nreduce as size_t * ggml-sycl.cpp: fix some trailing whitespaces * SYCL: remove the unused variables instead of commenting it out * SYCL poo2d kernel: set NAN for invalid pooling op * SYCL gemm.hpp: remove pragma directives * SYCL gemm.hpp: use const cast to properly support dnnl::memory * SYCL: wkv6 remove a comment * SYCL: clean comments step 2 * SYCL: clean comments and variables step 3 * SYCL: Use GGML_UNUSED for unused variables * SYCL: remove extra empty lines and a comment * Remove TODO * cleanup spaces * add a stdout for unsupported op * use sycl printf over fprintf * remove prints for CI * SYCL ggml-sycl: pool2D use sycl::nan and remove if-else block --------- Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-12-13 12:12:15 +05:30
Karol Kontny	d583cd03f6	ggml : Fix compilation issues on ARM platform when building without fp16 (#10811)	2024-12-13 01:04:19 +01:00
Xuan Son Nguyen	adffa6ffd5	common : improve -ctv -ctk CLI arguments (#10806) * common : improve ctv ctk cli argument * regenerate docs * even better approach * use std::vector	2024-12-12 22:53:05 +01:00
Xuan Son Nguyen	274ec65af6	contrib : add ngxson as codeowner (#10804)	2024-12-12 20:52:28 +01:00
a3sh	8faa1d4dd4	CUDA: faster non-contiguous concat (#10760) * faster uncontiguous concat * Use a lambda to avoid code duplication Co-authored-by: Diego Devesa <slarengh@gmail.com> * Update ggml/src/ggml-cuda/concat.cu * add constexpr and static assert --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2024-12-12 19:09:50 +01:00
Diego Devesa	cb13ef85a4	remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) other windows build fixes	2024-12-12 19:02:49 +01:00
0cc4m	4064c0e3b6	Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (#10798)	2024-12-12 18:36:00 +01:00
0cc4m	dc5301d565	Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats (#10721) * Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats * Fix subgroup size control extension support check Add accf32 and accf16 checks for coopmats * Also disable coopmats on amdvlk	2024-12-12 18:35:37 +01:00
Xuan Son Nguyen	9fdb124304	common : add missing env var for speculative (#10801)	2024-12-12 16:57:32 +01:00
CentricStorm	5555c0c1f6	docs: update server streaming mode documentation (#9519) Provide more documentation for streaming mode.	2024-12-11 23:40:40 +01:00
Georgi Gerganov	973f328b1e	Merge pull request #10788 from ggerganov/gg/gguf-py-0.11.0	2024-12-11 23:14:46 +02:00
Georgi Gerganov	fb18934a97	gguf-py : bump version to 0.11.0	2024-12-11 23:13:31 +02:00
Xuan Son Nguyen	235f6e14bf	server : (UI) add tok/s, get rid of completion.js (#10786) * get rid of completion.js * extract chat bubble to a component * add tok/s info * sync * fix BASE_URL * only extract timings when it's enabled * fix auto scroll	2024-12-11 20:52:14 +01:00
qingy1337	1a31d0dc00	Update README.md (#10772)	2024-12-11 16:16:32 +01:00
Xuan Son Nguyen	92f77a640f	ci : pin nodejs to 22.11.0 (#10779)	2024-12-11 14:59:41 +01:00
kallewoof	484d2f31ae	bug-fix: snprintf prints NULL in place of the last character (#10419) * bug-fix: snprintf prints NULL in place of the last character We need to give snprintf enough space to print the last character and the null character, thus we allocate one extra byte and then ignore it when converting to std::string. * add comment about extra null-term byte requirement	2024-12-11 14:48:04 +01:00
CentricStorm	4b4d92b098	docs: fix server documentation formatting (#10776)	2024-12-11 11:47:43 +01:00

4352 commits