Latest commit: Jeximo 8ae63953fe re-word for clarity (method seems to be more correct, instead of alternative in this context) 2024-05-05 18:30:14 -03:00
.devops build(cmake): simplify instructions (cmake -B build && cmake --build build ...) (#6964) 2024-04-29 17:02:45 +01:00
.github convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
ci ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
cmake cmake : MSVC instruction detection (fixed up #809) (#3923) 2023-11-05 10:03:09 +02:00
common Fix Linux /sys cpu path to guess number of cores (#7064) 2024-05-04 15:26:53 +02:00
docs eval-callback: Example how to use eval callback for debugging (#6576) 2024-04-11 14:51:07 +02:00
examples gguf-split: add --no-tensor-first-split (#7072) 2024-05-04 18:56:22 +02:00
ggml-cuda CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) 2024-05-01 14:46:37 +02:00
gguf-py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
grammars JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555) 2024-04-12 19:43:38 +01:00
kompute@4565194ed7 Nomic Vulkan backend (#4456) 2024-01-29 15:50:50 -05:00
kompute-shaders Nomic Vulkan backend (#4456) 2024-01-29 15:50:50 -05:00
media README: add graphic for matrix multiplication (#6881) 2024-04-24 21:29:13 +02:00
models tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
pocs ggml : add mmla kernels for quantized GEMM (#4966) 2024-02-11 15:22:33 +02:00
prompts llama : add Qwen support (#4281) 2023-12-01 20:16:31 +02:00
requirements llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
scripts tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
spm-headers swift : package no longer use ggml dependency (#5465) 2024-02-12 19:54:29 +02:00
tests tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
.clang-tidy cuda : refactor into multiple files (#6269) 2024-03-25 13:50:23 +01:00
.dockerignore docker : ignore Git files (#3314) 2023-10-02 11:53:53 +03:00
.ecrc Nomic Vulkan backend (#4456) 2024-01-29 15:50:50 -05:00
.editorconfig llama.swiftui : add bench functionality (#4483) 2023-12-17 19:38:41 +02:00
.flake8 tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
.gitignore Improve usability of --model-url & related flags (#6930) 2024-04-30 00:52:50 +01:00
.gitmodules Nomic Vulkan backend (#4456) 2024-01-29 15:50:50 -05:00
.pre-commit-config.yaml convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
AUTHORS license : update copyright notice + add AUTHORS (#6405) 2024-04-09 09:23:19 +03:00
build.zig build: generate hex dump of server assets during build (#6661) 2024-04-21 18:48:53 +01:00
CMakeLists.txt cmake : restore LLAMA_LLAMAFILE_DEFAULT 2024-04-25 21:37:27 +03:00
codecov.yml cov : disable comment in PRs (#2989) 2023-09-03 13:19:01 +03:00
convert-hf-to-gguf-update.py tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
convert-hf-to-gguf.py tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
convert-llama-ggml-to-gguf.py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
convert-lora-to-ggml.py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
convert-persimmon-to-gguf.py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
convert.py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
flake.lock flake.lock: Update 2024-04-28 11:12:50 +00:00
flake.nix nix: .#windows: proper cross-compilation set-up 2024-03-28 07:48:27 +00:00
README.md re-word for clarity 2024-05-05 18:30:14 -03:00
ggml-alloc.c ggml : fix calloc argument ordering. (#6820) 2024-04-22 16:05:06 +02:00
ggml-alloc.h llama : add pipeline parallelism support (#6017) 2024-03-13 18:54:21 +01:00
ggml-backend-impl.h backend : offload large batches to GPU (#6083) 2024-03-18 11:03:04 +01:00
ggml-backend.c Reset schedule earlier to allow overlap with ggml graph computation on device (#6933) 2024-04-26 20:08:30 +02:00
ggml-backend.h backend : fix typo in scheduler documentation (ggml/781) 2024-04-06 17:42:26 +03:00
ggml-common.h [SYCL] Disable iqx on windows as WA (#6435) 2024-04-03 10:34:40 +08:00
ggml-cuda.cu ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml-cuda.h backend : offload large batches to GPU (#6083) 2024-03-18 11:03:04 +01:00
ggml-impl.h ggml : fix __MSC_VER -> _MSC_VER (#6977) 2024-04-29 17:55:02 +03:00
ggml-kompute.cpp ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml-kompute.h Nomic Vulkan backend (#4456) 2024-01-29 15:50:50 -05:00
ggml-metal.h metal : add debug capture backend function (ggml/694) 2024-01-30 16:20:25 +02:00
ggml-metal.m switch to using localizedDescription (#7010) 2024-04-30 17:14:02 +02:00
ggml-metal.metal ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml-mpi.c ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) 2023-07-11 19:31:10 +03:00
ggml-mpi.h mpi : add support for distributed inference via MPI (#2099) 2023-07-10 18:49:56 +03:00
ggml-opencl.cpp llama : greatly reduce output buffer memory usage (#6122) 2024-03-26 16:46:41 +02:00
ggml-opencl.h Add OpenCL add kernel (#5151) 2024-01-26 23:07:32 +01:00
ggml-quants.c add basic tensor data validation function (#6884) 2024-04-26 18:39:58 +02:00
ggml-quants.h llama : add Command R Plus support (#6491) 2024-04-09 11:16:13 +03:00
ggml-sycl.cpp ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml-sycl.h [SYCL] offload op (#6217) 2024-03-24 12:04:25 +08:00
ggml-vulkan-shaders.hpp Vulkan k-quant mmq and ggml-backend offload functionality (#6155) 2024-03-29 17:29:21 +01:00
ggml-vulkan.cpp ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml-vulkan.h Vulkan k-quant mmq and ggml-backend offload functionality (#6155) 2024-03-29 17:29:21 +01:00
ggml.c gguf-split: add --no-tensor-first-split (#7072) 2024-05-04 18:56:22 +02:00
ggml.h ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
ggml_vk_generate_shaders.py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
LICENSE license : update copyright notice + add AUTHORS (#6405) 2024-04-09 09:23:19 +03:00
llama.cpp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
llama.h tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
Makefile tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
mypy.ini convert : partially revert PR #4818 (#5041) 2024-01-20 18:14:18 -05:00
Package.swift ggml : add llamafile sgemm (#6414) 2024-04-16 21:55:30 +03:00
README-sycl.md build(cmake): simplify instructions (cmake -B build && cmake --build build ...) (#6964) 2024-04-29 17:02:45 +01:00
requirements.txt llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
SECURITY.md chore: Fix markdown warnings (#6625) 2024-04-12 10:52:36 +02:00
sgemm.cpp llamafile : use 64-bit integers in sgemm (#6928) 2024-04-26 17:05:33 +03:00
sgemm.h llamafile : use 64-bit integers in sgemm (#6928) 2024-04-26 17:05:33 +03:00
unicode-data.cpp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
unicode-data.h tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
unicode.cpp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
unicode.h tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00