| Name | Last commit message | Last commit date |
| --- | --- | --- |
| .devops | build(cmake): simplify instructions (cmake -B build && cmake --build build ... ) (#6964) | 2024-04-29 17:02:45 +01:00 |
| .github | convert.py : add python logging instead of print() (#6511) | 2024-05-03 22:36:41 +03:00 |
| ci | ggml : add Flash Attention (#5021) | 2024-04-30 12:16:08 +03:00 |
| cmake | cmake : MSVC instruction detection (fixed up #809) (#3923) | 2023-11-05 10:03:09 +02:00 |
| common | Fix Linux /sys cpu path to guess number of cores (#7064) | 2024-05-04 15:26:53 +02:00 |
| docs | eval-callback: Example how to use eval callback for debugging (#6576) | 2024-04-11 14:51:07 +02:00 |
| examples | gguf-split: add --no-tensor-first-split (#7072) | 2024-05-04 18:56:22 +02:00 |
| ggml-cuda | CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) | 2024-05-01 14:46:37 +02:00 |
| gguf-py | convert.py : add python logging instead of print() (#6511) | 2024-05-03 22:36:41 +03:00 |
| grammars | JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555) | 2024-04-12 19:43:38 +01:00 |
| kompute@4565194ed7 | Nomic Vulkan backend (#4456) | 2024-01-29 15:50:50 -05:00 |
| kompute-shaders | Nomic Vulkan backend (#4456) | 2024-01-29 15:50:50 -05:00 |
| media | README: add graphic for matrix multiplication (#6881) | 2024-04-24 21:29:13 +02:00 |
| models | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| pocs | ggml : add mmla kernels for quantized GEMM (#4966) | 2024-02-11 15:22:33 +02:00 |
| prompts | llama : add Qwen support (#4281) | 2023-12-01 20:16:31 +02:00 |
| requirements | llama : fix BPE pre-tokenization (#6920) | 2024-04-29 16:58:41 +03:00 |
| scripts | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| spm-headers | swift : package no longer use ggml dependency (#5465) | 2024-02-12 19:54:29 +02:00 |
| tests | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| .clang-tidy | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| .dockerignore | docker : ignore Git files (#3314) | 2023-10-02 11:53:53 +03:00 |
| .ecrc | Nomic Vulkan backend (#4456) | 2024-01-29 15:50:50 -05:00 |
| .editorconfig | llama.swiftui : add bench functionality (#4483) | 2023-12-17 19:38:41 +02:00 |
| .flake8 | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| .gitignore | Improve usability of --model-url & related flags (#6930) | 2024-04-30 00:52:50 +01:00 |
| .gitmodules | Nomic Vulkan backend (#4456) | 2024-01-29 15:50:50 -05:00 |
| .pre-commit-config.yaml | convert.py : add python logging instead of print() (#6511) | 2024-05-03 22:36:41 +03:00 |
| AUTHORS | license : update copyright notice + add AUTHORS (#6405) | 2024-04-09 09:23:19 +03:00 |
| build.zig | build : generate hex dump of server assets during build (#6661) | 2024-04-21 18:48:53 +01:00 |
| CMakeLists.txt | cmake : restore LLAMA_LLAMAFILE_DEFAULT | 2024-04-25 21:37:27 +03:00 |
| codecov.yml | cov : disable comment in PRs (#2989) | 2023-09-03 13:19:01 +03:00 |
| convert-hf-to-gguf-update.py | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| convert-hf-to-gguf.py | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| convert-llama-ggml-to-gguf.py | convert.py : add python logging instead of print() (#6511) | 2024-05-03 22:36:41 +03:00 |
| convert-lora-to-ggml.py | convert.py : add python logging instead of print() (#6511) | 2024-05-03 22:36:41 +03:00 |
| convert-persimmon-to-gguf.py | convert.py : add python logging instead of print() (#6511) | 2024-05-03 22:36:41 +03:00 |
| convert.py | convert.py : add python logging instead of print() (#6511) | 2024-05-03 22:36:41 +03:00 |
| flake.lock | flake.lock: Update | 2024-04-28 11:12:50 +00:00 |
| flake.nix | nix: .#windows: proper cross-compilation set-up | 2024-03-28 07:48:27 +00:00 |
| README.md | Further tidy on Android instructions README.md (Organized required packages per build type) | 2024-05-05 19:47:24 -03:00 |
| ggml-alloc.c | ggml : fix calloc argument ordering. (#6820) | 2024-04-22 16:05:06 +02:00 |
| ggml-alloc.h | llama : add pipeline parallelism support (#6017) | 2024-03-13 18:54:21 +01:00 |
| ggml-backend-impl.h | backend : offload large batches to GPU (#6083) | 2024-03-18 11:03:04 +01:00 |
| ggml-backend.c | Reset schedule earlier to allow overlap with ggml graph computation on device (#6933) | 2024-04-26 20:08:30 +02:00 |
| ggml-backend.h | backend : fix typo in scheduler documentation (ggml/781) | 2024-04-06 17:42:26 +03:00 |
| ggml-common.h | [SYCL] Disable iqx on windows as WA (#6435) | 2024-04-03 10:34:40 +08:00 |
| ggml-cuda.cu | ggml : add Flash Attention (#5021) | 2024-04-30 12:16:08 +03:00 |
| ggml-cuda.h | backend : offload large batches to GPU (#6083) | 2024-03-18 11:03:04 +01:00 |
| ggml-impl.h | ggml : fix __MSC_VER -> _MSC_VER (#6977) | 2024-04-29 17:55:02 +03:00 |
| ggml-kompute.cpp | ggml : add Flash Attention (#5021) | 2024-04-30 12:16:08 +03:00 |
| ggml-kompute.h | Nomic Vulkan backend (#4456) | 2024-01-29 15:50:50 -05:00 |
| ggml-metal.h | metal : add debug capture backend function (ggml/694) | 2024-01-30 16:20:25 +02:00 |
| ggml-metal.m | switch to using localizedDescription (#7010) | 2024-04-30 17:14:02 +02:00 |
| ggml-metal.metal | ggml : add Flash Attention (#5021) | 2024-04-30 12:16:08 +03:00 |
| ggml-mpi.c | ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) | 2023-07-11 19:31:10 +03:00 |
| ggml-mpi.h | mpi : add support for distributed inference via MPI (#2099) | 2023-07-10 18:49:56 +03:00 |
| ggml-opencl.cpp | llama : greatly reduce output buffer memory usage (#6122) | 2024-03-26 16:46:41 +02:00 |
| ggml-opencl.h | Add OpenCL add kernel (#5151) | 2024-01-26 23:07:32 +01:00 |
| ggml-quants.c | add basic tensor data validation function (#6884) | 2024-04-26 18:39:58 +02:00 |
| ggml-quants.h | llama : add Command R Plus support (#6491) | 2024-04-09 11:16:13 +03:00 |
| ggml-sycl.cpp | ggml : add Flash Attention (#5021) | 2024-04-30 12:16:08 +03:00 |
| ggml-sycl.h | [SYCL] offload op (#6217) | 2024-03-24 12:04:25 +08:00 |
| ggml-vulkan-shaders.hpp | Vulkan k-quant mmq and ggml-backend offload functionality (#6155) | 2024-03-29 17:29:21 +01:00 |
| ggml-vulkan.cpp | ggml : add Flash Attention (#5021) | 2024-04-30 12:16:08 +03:00 |
| ggml-vulkan.h | Vulkan k-quant mmq and ggml-backend offload functionality (#6155) | 2024-03-29 17:29:21 +01:00 |
| ggml.c | gguf-split: add --no-tensor-first-split (#7072) | 2024-05-04 18:56:22 +02:00 |
| ggml.h | ggml : add Flash Attention (#5021) | 2024-04-30 12:16:08 +03:00 |
| ggml_vk_generate_shaders.py | convert.py : add python logging instead of print() (#6511) | 2024-05-03 22:36:41 +03:00 |
| LICENSE | license : update copyright notice + add AUTHORS (#6405) | 2024-04-09 09:23:19 +03:00 |
| llama.cpp | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| llama.h | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| Makefile | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| mypy.ini | convert : partially revert PR #4818 (#5041) | 2024-01-20 18:14:18 -05:00 |
| Package.swift | ggml : add llamafile sgemm (#6414) | 2024-04-16 21:55:30 +03:00 |
| README-sycl.md | build(cmake): simplify instructions (cmake -B build && cmake --build build ... ) (#6964) | 2024-04-29 17:02:45 +01:00 |
| requirements.txt | llama : fix BPE pre-tokenization (#6920) | 2024-04-29 16:58:41 +03:00 |
| SECURITY.md | chore: Fix markdown warnings (#6625) | 2024-04-12 10:52:36 +02:00 |
| sgemm.cpp | llamafile : use 64-bit integers in sgemm (#6928) | 2024-04-26 17:05:33 +03:00 |
| sgemm.h | llamafile : use 64-bit integers in sgemm (#6928) | 2024-04-26 17:05:33 +03:00 |
| unicode-data.cpp | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| unicode-data.h | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| unicode.cpp | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |
| unicode.h | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 2024-05-04 08:32:32 +03:00 |