Latest commit: Jeximo 8ae63953fe re-word for clarity (method seems to be more correct, instead of alternative in this context) 2024-05-05 18:30:14 -03:00
.devops build(cmake): simplify instructions (cmake -B build && cmake --build build ...) (#6964) 2024-04-29 17:02:45 +01:00
.github convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
ci ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
cmake cmake : MSVC instruction detection (fixed up #809) (#3923) 2023-11-05 10:03:09 +02:00
common Fix Linux /sys cpu path to guess number of cores (#7064) 2024-05-04 15:26:53 +02:00
docs eval-callback: Example how to use eval callback for debugging (#6576) 2024-04-11 14:51:07 +02:00
examples gguf-split: add --no-tensor-first-split (#7072) 2024-05-04 18:56:22 +02:00
ggml-cuda CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) 2024-05-01 14:46:37 +02:00
gguf-py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
grammars JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555) 2024-04-12 19:43:38 +01:00
kompute@4565194ed7 Nomic Vulkan backend (#4456) 2024-01-29 15:50:50 -05:00
kompute-shaders Nomic Vulkan backend (#4456) 2024-01-29 15:50:50 -05:00
media README: add graphic for matrix multiplication (#6881) 2024-04-24 21:29:13 +02:00
models tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
pocs ggml : add mmla kernels for quantized GEMM (#4966) 2024-02-11 15:22:33 +02:00
prompts llama : add Qwen support (#4281) 2023-12-01 20:16:31 +02:00
requirements llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
scripts tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
spm-headers swift : package no longer use ggml dependency (#5465) 2024-02-12 19:54:29 +02:00
tests tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
.clang-tidy cuda : refactor into multiple files (#6269) 2024-03-25 13:50:23 +01:00
.dockerignore docker : ignore Git files (#3314) 2023-10-02 11:53:53 +03:00
.ecrc Nomic Vulkan backend (#4456) 2024-01-29 15:50:50 -05:00
.editorconfig llama.swiftui : add bench functionality (#4483) 2023-12-17 19:38:41 +02:00
.flake8 tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
.gitignore Improve usability of --model-url & related flags (#6930) 2024-04-30 00:52:50 +01:00
.gitmodules Nomic Vulkan backend (#4456) 2024-01-29 15:50:50 -05:00
.pre-commit-config.yaml convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
AUTHORS license : update copyright notice + add AUTHORS (#6405) 2024-04-09 09:23:19 +03:00
build.zig build: generate hex dump of server assets during build (#6661) 2024-04-21 18:48:53 +01:00
CMakeLists.txt cmake : restore LLAMA_LLAMAFILE_DEFAULT 2024-04-25 21:37:27 +03:00
codecov.yml cov : disable comment in PRs (#2989) 2023-09-03 13:19:01 +03:00
convert-hf-to-gguf-update.py tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
convert-hf-to-gguf.py tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
convert-llama-ggml-to-gguf.py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
convert-lora-to-ggml.py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
convert-persimmon-to-gguf.py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
convert.py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
flake.lock flake.lock: Update 2024-04-28 11:12:50 +00:00
flake.nix nix: .#windows: proper cross-compilation set-up 2024-03-28 07:48:27 +00:00
README.md re-word for clarity 2024-05-05 18:30:14 -03:00
ggml-alloc.c ggml : fix calloc argument ordering. (#6820) 2024-04-22 16:05:06 +02:00
ggml-alloc.h llama : add pipeline parallelism support (#6017) 2024-03-13 18:54:21 +01:00
ggml-backend-impl.h backend : offload large batches to GPU (#6083) 2024-03-18 11:03:04 +01:00
ggml-backend.c Reset schedule earlier to allow overlap with ggml graph computation on device (#6933) 2024-04-26 20:08:30 +02:00
ggml-backend.h backend : fix typo in scheduler documentation (ggml/781) 2024-04-06 17:42:26 +03:00
ggml-common.h [SYCL] Disable iqx on windows as WA (#6435) 2024-04-03 10:34:40 +08:00
ggml-cuda.cu ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml-cuda.h backend : offload large batches to GPU (#6083) 2024-03-18 11:03:04 +01:00
ggml-impl.h ggml : fix __MSC_VER -> _MSC_VER (#6977) 2024-04-29 17:55:02 +03:00
ggml-kompute.cpp ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml-kompute.h Nomic Vulkan backend (#4456) 2024-01-29 15:50:50 -05:00
ggml-metal.h metal : add debug capture backend function (ggml/694) 2024-01-30 16:20:25 +02:00
ggml-metal.m switch to using localizedDescription (#7010) 2024-04-30 17:14:02 +02:00
ggml-metal.metal ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml-mpi.c ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) 2023-07-11 19:31:10 +03:00
ggml-mpi.h mpi : add support for distributed inference via MPI (#2099) 2023-07-10 18:49:56 +03:00
ggml-opencl.cpp llama : greatly reduce output buffer memory usage (#6122) 2024-03-26 16:46:41 +02:00
ggml-opencl.h Add OpenCL add kernel (#5151) 2024-01-26 23:07:32 +01:00
ggml-quants.c add basic tensor data validation function (#6884) 2024-04-26 18:39:58 +02:00
ggml-quants.h llama : add Command R Plus support (#6491) 2024-04-09 11:16:13 +03:00
ggml-sycl.cpp ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml-sycl.h [SYCL] offload op (#6217) 2024-03-24 12:04:25 +08:00
ggml-vulkan-shaders.hpp Vulkan k-quant mmq and ggml-backend offload functionality (#6155) 2024-03-29 17:29:21 +01:00
ggml-vulkan.cpp ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml-vulkan.h Vulkan k-quant mmq and ggml-backend offload functionality (#6155) 2024-03-29 17:29:21 +01:00
ggml.c gguf-split: add --no-tensor-first-split (#7072) 2024-05-04 18:56:22 +02:00
ggml.h ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml_vk_generate_shaders.py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
LICENSE license : update copyright notice + add AUTHORS (#6405) 2024-04-09 09:23:19 +03:00
llama.cpp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
llama.h tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
Makefile tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
mypy.ini convert : partially revert PR #4818 (#5041) 2024-01-20 18:14:18 -05:00
Package.swift ggml : add llamafile sgemm (#6414) 2024-04-16 21:55:30 +03:00
README-sycl.md build(cmake): simplify instructions (cmake -B build && cmake --build build ...) (#6964) 2024-04-29 17:02:45 +01:00
requirements.txt llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
SECURITY.md chore: Fix markdown warnings (#6625) 2024-04-12 10:52:36 +02:00
sgemm.cpp llamafile : use 64-bit integers in sgemm (#6928) 2024-04-26 17:05:33 +03:00
sgemm.h llamafile : use 64-bit integers in sgemm (#6928) 2024-04-26 17:05:33 +03:00
unicode-data.cpp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
unicode-data.h tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
unicode.cpp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
unicode.h tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00