Commit graph

1689 commits

Author SHA1 Message Date
crasm
bdfe4ba85c Add nocleanup special arg 2023-12-21 04:55:28 -05:00
crasm
e86b8cd93a Remove shellcheck installation step from workflow 2023-12-21 04:29:05 -05:00
crasm
c9a6de8f8a Add check-requirements.sh script and GitHub workflow 2023-12-21 04:16:41 -05:00
crasm
b853df4207 Add convert-persimmon-to-gguf.py to new requirements.txt scheme 2023-12-20 03:32:22 -05:00
crasm
ba46057b11 Merge remote-tracking branch 'upstream/master' into cancel-model-load 2023-12-20 00:15:09 -05:00
crasm
ca122dc9e0 Add comment 2023-12-20 00:14:56 -05:00
crasm
a0eab1ea19 Make per-python-script requirements work alone
This doesn't break the main requirements.txt.
2023-12-20 00:10:31 -05:00
crasm
267cfa408b Merge commit 'c50e400163' into cancel-model-load 2023-12-20 00:04:20 -05:00
crasm
293d16fd40 Restructure requirements.txt
Top-level now imports the specific additional requirements for each
python file. Using `pip install -r requirements.txt` will fail if
versions become mismatched in the per-file requirements.
2023-12-20 00:00:08 -05:00
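The restructured scheme described in the commit above can be sketched as follows. The per-script filenames here are assumptions for illustration, not taken from the repository; the point is that the top-level file only aggregates per-script pin files via pip's `-r` include syntax:

```text
# requirements.txt (top level) — pulls in the per-script pins:
-r requirements-convert.txt
-r requirements-convert-persimmon-to-gguf.txt

# Each per-script file pins its own dependencies and can be installed alone.
# Running `pip install -r requirements.txt` resolves all of them together,
# so it fails if the per-file pins have drifted into conflicting versions.
```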
crasm
9a056ed708 Remove venv before creation 2023-12-19 20:56:22 -05:00
crasm
9809314bbf Disable test-model-load-cancel in make 2023-12-19 17:46:36 -05:00
Eric Sommerlade
328b83de23
ggml : fixed check for _MSC_VER (#4535)
Co-authored-by: Eric Sommerlade <ersomme@microsoft.com>
2023-12-19 18:17:01 +02:00
crasm
1e79625910 update requirements.txt 2023-12-19 02:42:07 -05:00
crasm
121b04d121 ci : restrict .github/workflows/build.yml ctest to -L main 2023-12-19 02:20:01 -05:00
crasm
f80ff4dc6a ci : get ci/run.sh working with test-model-load-cancel 2023-12-19 02:18:50 -05:00
arlo-phoenix
a7aee47b98
ggml-cuda: Fix HIP build (#4528)
regression of #4490
Adds defines for two new datatypes: cublasComputeType_t and cudaDataType_t.

Currently using the deprecated hipblasDatatype_t, since the newer types are very recent.
2023-12-18 22:33:45 +01:00
Georgi Gerganov
0e18b2e7d0
llama.swiftui : add tinyllama 1.1B F16 2023-12-18 20:17:43 +02:00
Georgi Gerganov
6ff39b129d
llama.swiftui : add more models 2023-12-18 20:05:12 +02:00
Ebey Abraham
b9e74f9bca
llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)
* phi2 implementation

* fix breaking change

* phi-2 : various fixes

* phi-2 : use layer norm eps

* py : whitespaces

* llama : fix meta KV override bug

* convert : phi don't add BOS token

* convert : revert "added_tokens_decoder" change

* phi-2 : scale Q instead of KQ for better precision

* ggml : fix NeoX rope to rotate just first n_dims

* cuda : less diff in the rope_neox kernel

* ggml : add ggml_mul_mat_set_prec

ggml-ci

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision

* cuda : remove obsolete comment

---------

Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-18 19:27:47 +02:00
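One of the phi-2 fixes above, "scale Q instead of KQ for better precision", can be illustrated with a minimal NumPy sketch (not the actual ggml kernels): applying the 1/sqrt(d) attention scale to Q before the matmul is mathematically equivalent to scaling the Q·Kᵀ product afterwards, but it keeps the intermediate values smaller, which matters when the accumulation runs in reduced precision.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # head dimension (illustrative)
Q = rng.standard_normal((8, d)).astype(np.float32)
K = rng.standard_normal((8, d)).astype(np.float32)

# scale the product KQ afterwards (larger intermediates)
scores_after = (Q @ K.T) / np.sqrt(d)

# scale Q first (same result, smaller intermediates)
scores_before = (Q / np.sqrt(d)) @ K.T

assert np.allclose(scores_after, scores_before, atol=1e-5)
```

In float32 the two orderings agree to rounding error; the precision benefit of pre-scaling shows up when the matmul itself accumulates in float16.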
hankcs
3c04bf6da8
llama : fix try_override for bool_value which always returns true (#4519) 2023-12-18 15:14:58 +02:00
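The bug class fixed above can be sketched abstractly (this is a hypothetical illustration, not the llama.cpp code): an override lookup that returns a truthy "found" sentinel instead of the stored boolean makes every bool override appear to be true.

```python
overrides = {"use_mmap": False}

def try_override_buggy(key):
    if key in overrides:
        return True            # bug: reports "found" instead of the stored value
    return None

def try_override_fixed(key):
    if key in overrides:
        return overrides[key]  # return the actual stored bool
    return None

assert try_override_buggy("use_mmap") is True   # wrong: override was False
assert try_override_fixed("use_mmap") is False  # correct
```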
crasm
aed3cf838c Attempt at writing ctest_with_model 2023-12-18 04:45:39 -05:00
crasm
4b63355f45 ci : ctest uses -L main 2023-12-18 04:23:58 -05:00
crasm
fd9d247dd2 Label all ctest tests 2023-12-18 04:23:20 -05:00
crasm
6bba3410fa Simplify .gitignore for tests, clang-tidy fixes 2023-12-17 22:33:38 -05:00
crasm
fe6a6fb6d1 Revert "Revert "Fail test if model file is missing""
This reverts commit 2796953257.
2023-12-17 22:24:17 -05:00
crasm
068e7c408f Add test-model-load-cancel to Makefile 2023-12-17 22:22:42 -05:00
Jared Van Bortel
2994f0c5a2
decode : fix logits_valid for legacy API (#4516) 2023-12-17 19:39:02 -05:00
crasm
2796953257 Revert "Fail test if model file is missing"
This reverts commit 32ebd525bf.
2023-12-17 14:37:01 -05:00
crasm
cb8a4be5d0 Merge branch 'cancel-model-load' of github.com:crasm/llama.cpp into cancel-model-load 2023-12-17 14:31:49 -05:00
crasm
32ebd525bf Fail test if model file is missing 2023-12-17 14:31:03 -05:00
Georgi Gerganov
1160de38f6
Update llama.cpp
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-12-17 21:25:19 +02:00
Georgi Gerganov
b1306c4394
readme : update hot topics 2023-12-17 20:16:23 +02:00
Georgi Gerganov
800a489e4a
llama.swiftui : add bench functionality (#4483)
* llama.swiftui : add bench button

* llama.swiftui : initial bench functionality

* force to use n_gpu_layers on simulator

* add download buttons & expose llamaState.loadModel

* update project.pbxproj

* comment #Preview & fix editorconfig check

* gitignore : xcode stuff

* llama.swiftui : UX improvements

* llama.swiftui : avoid data copy via "downloadTask"

* llama.swiftui : remove model from project

* llama : remove "mostly" from model infos

* llama.swiftui : improve bench

---------

Co-authored-by: jhen <developer@jhen.me>
2023-12-17 19:38:41 +02:00
Jared Van Bortel
f7f468a97d
gguf-py : fail fast on nonsensical special token IDs (#4489) 2023-12-17 10:45:46 -05:00
Matheus Gabriel Alves Silva
919c40660f
build : Check the ROCm installation location (#4485)
* build : Check the ROCm installation location

* more generic approach

* fixup! It was returning the path instead of the command output

* fixup! Trailing whitespace
2023-12-17 17:23:33 +02:00
slaren
45668633fd
finetune : keep allocs alive until all allocations are done (#4486) 2023-12-17 16:05:56 +01:00
olexiyb
0ffc92d2d2
server : disable llm logs if SERVER_VERBOSE is off (#3792) 2023-12-17 17:02:16 +02:00
AdithyanI
8edd2b40fd
server : fix grammar being ignored (#4494)
Fix bug in identifying the grammar.
2023-12-17 16:57:56 +02:00
Alexey Parfenov
eb16dae7e7
server : fix possible ambiguity in content type charset (#4501) 2023-12-17 16:56:09 +02:00
mzcu
62bd52b7bf
server : allow requests larger than 8K (#4500) 2023-12-17 16:54:37 +02:00
Bach Le
5daa5f54fd
Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506) 2023-12-17 11:57:33 +01:00
slaren
c6c4fc081c
lora : add support for non-llama models (#3333)
* lora : add support for non-llama models

ggml-ci

* avoid leaking ggml_context on failure
cleanup

ggml-ci

* lora : allow 1d tensors

* lora : include embd and output layers in size calculation

* fix style
2023-12-16 18:58:46 +01:00
Jared Van Bortel
8a5be3bd58
llama : sanity checks for access to logits (#4274)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 22:16:15 -05:00
ShadovvBeast
88ae8952b6
server : add optional API Key Authentication example (#4441)
* Add API key authentication for enhanced server-client security

* server : to snake_case

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 13:49:01 +02:00
slaren
ee4725a686
ggml : group mul_mat_id rows by matrix (cpu only) (#4480)
* ggml : group mul_mat_id rows by matrix (cpu only)

* remove mmid parameters from mm forward

* store row groups in wdata and calculate only once in GGML_TASK_INIT

ggml-ci
2023-12-15 12:45:50 +01:00
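The row-grouping idea in the mul_mat_id change above can be illustrated with a small NumPy sketch (not the ggml internals): each input row selects one of several weight matrices, and grouping rows by their selected matrix lets each matrix be applied once to a batch instead of once per row.

```python
import numpy as np

rng = np.random.default_rng(1)
n_mats, rows, d_in, d_out = 3, 8, 4, 5
mats = rng.standard_normal((n_mats, d_in, d_out))
x = rng.standard_normal((rows, d_in))
ids = rng.integers(0, n_mats, size=rows)  # per-row matrix selection

# naive: one matvec per row
naive = np.stack([x[r] @ mats[ids[r]] for r in range(rows)])

# grouped: one matmul per matrix over the rows that selected it
grouped = np.empty((rows, d_out))
for m in range(n_mats):
    sel = np.where(ids == m)[0]
    grouped[sel] = x[sel] @ mats[m]

assert np.allclose(naive, grouped)
```

The commit additionally computes the row groups once in GGML_TASK_INIT rather than per worker, but the grouping itself is the core change.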
crasm
4b1f70cb03 Fix bool return in llama_model_load, remove std::ignore use 2023-12-14 16:29:05 -05:00
slaren
6744dbe924
ggml : use ggml_row_size where possible (#4472)
* ggml : use ggml_row_size where possible

ggml-ci

* ggml : move ggml_nbytes_split to ggml-cuda.cu
2023-12-14 20:05:21 +01:00
slaren
cafcd4f895
ggml : remove n_dims from ggml_tensor (#4469)
ggml-ci
2023-12-14 16:52:08 +01:00
wonjun Jang
c50e400163
py : add protobuf dependency (#4466) 2023-12-14 14:44:49 +02:00
LostRuins
20a68a7030
ggml : add ggml_row_size() (fixes llama out of space) (#4461)
* Fixes "Not enough space in the context's memory pool" errors encountered on certain models, which seem to be caused by imprecision in the automatic casting of floating point values

* do not cast to size_t, instead just use doubles

* ggml : add ggml_row_size(), deprecate ggml_type_sizef()

* ggml : fix row size compute to avoid overflows

* tests : fix sizey -> sizez

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14 14:13:33 +02:00
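The idea behind ggml_row_size() in the commit above can be sketched in a few lines (an illustration, not the actual ggml implementation): compute the byte size of a row of a block-quantized tensor in pure integer arithmetic, dividing by the block size first so no float rounding is involved and the intermediate product stays small. The 256-element/210-byte block parameters below are illustrative.

```python
def row_size(n_elements: int, block_size: int, type_size: int) -> int:
    # integer-only: elements -> blocks -> bytes; dividing first keeps the
    # intermediate small and avoids the float imprecision of a
    # bytes-per-element (ggml_type_sizef-style) computation
    assert n_elements % block_size == 0
    return n_elements // block_size * type_size

# e.g. a 4096-element row with 256-element blocks of 210 bytes each:
assert row_size(4096, 256, 210) == 3360
```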