crasm
e86b8cd93a
Remove shellcheck installation step from workflow
2023-12-21 04:29:05 -05:00
crasm
c9a6de8f8a
Add check-requirements.sh script and GitHub workflow
2023-12-21 04:16:41 -05:00
crasm
b853df4207
Add convert-persimmon-to-gguf.py to new requirements.txt scheme
2023-12-20 03:32:22 -05:00
crasm
ba46057b11
Merge remote-tracking branch 'upstream/master' into cancel-model-load
2023-12-20 00:15:09 -05:00
crasm
ca122dc9e0
Add comment
2023-12-20 00:14:56 -05:00
crasm
a0eab1ea19
Make per-python-script requirements work alone
...
This doesn't break the main requirements.txt.
2023-12-20 00:10:31 -05:00
crasm
267cfa408b
Merge commit 'c50e400163' into cancel-model-load
2023-12-20 00:04:20 -05:00
crasm
293d16fd40
Restructure requirements.txt
...
Top-level now imports the specific additional requirements for each
python file. Using `pip install -r requirements.txt` will fail if
versions become mismatched in the per-file requirements.
2023-12-20 00:00:08 -05:00
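The scheme described above relies on pip's nested `-r` includes: the top-level requirements.txt pulls in one requirements file per Python script, and each per-script file can also be installed on its own. A minimal sketch of the layout, with illustrative file and package names (the exact names here are assumptions):

```
# requirements.txt (top level) — one include per Python script
-r ./requirements/requirements-convert.txt
-r ./requirements/requirements-convert-persimmon-to-gguf.txt

# requirements/requirements-convert-persimmon-to-gguf.txt — also installable alone
-r ./requirements-convert.txt
torch  # illustrative extra dependency for this script
```

If two per-script files pin incompatible versions of the same package, installing the top-level file surfaces the conflict immediately, which is the failure mode the commit message refers to.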
crasm
9a056ed708
Remove venv before creation
2023-12-19 20:56:22 -05:00
crasm
9809314bbf
Disable test-model-load-cancel in make
2023-12-19 17:46:36 -05:00
Eric Sommerlade
328b83de23
ggml : fixed check for _MSC_VER (#4535)
...
Co-authored-by: Eric Sommerlade <ersomme@microsoft.com>
2023-12-19 18:17:01 +02:00
crasm
1e79625910
update requirements.txt
2023-12-19 02:42:07 -05:00
crasm
121b04d121
ci : restrict .github/workflows/build.yml ctest to -L main
2023-12-19 02:20:01 -05:00
crasm
f80ff4dc6a
ci : get ci/run.sh working with test-model-load-cancel
2023-12-19 02:18:50 -05:00
arlo-phoenix
a7aee47b98
ggml-cuda: Fix HIP build (#4528)
...
regression of #4490
Adds defines for two new datatypes: cublasComputeType_t and cudaDataType_t.
Currently using the deprecated hipblasDatatype_t, since the newer ones are very recent.
2023-12-18 22:33:45 +01:00
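A minimal sketch of the compatibility shim this fix describes, assuming it sits in the HIP-specific define block of ggml-cuda.cu: the two CUDA type names introduced by #4490 are aliased to the deprecated HIP type until the newer compute types are broadly available.

```cpp
// Map the CUDA datatype names onto a HIP equivalent. hipblasDatatype_t is
// deprecated, but its replacements are too recent to require, so both CUDA
// names alias it for now.
#if defined(GGML_USE_HIPBLAS)
#define cublasComputeType_t hipblasDatatype_t
#define cudaDataType_t      hipblasDatatype_t
#endif
```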
Georgi Gerganov
0e18b2e7d0
llama.swiftui : add tinyllama 1.1B F16
2023-12-18 20:17:43 +02:00
Georgi Gerganov
6ff39b129d
llama.swiftui : add more models
2023-12-18 20:05:12 +02:00
Ebey Abraham
b9e74f9bca
llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)
...
* phi2 implementation
* fix breaking change
* phi-2 : various fixes
* phi-2 : use layer norm eps
* py : whitespaces
* llama : fix meta KV override bug
* convert : phi don't add BOS token
* convert : revert "added_tokens_decoder" change
* phi-2 : scale Q instead of KQ for better precision
* ggml : fix NeoX rope to rotate just first n_dims
* cuda : less diff in the rope_neox kernel
* ggml : add ggml_mul_mat_set_prec
ggml-ci
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision
* cuda : remove obsolete comment
---------
Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-18 19:27:47 +02:00
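The ggml_mul_mat_set_prec call added in this PR lets a single matmul node opt into F32 accumulation, which is what the "scale Q instead of KQ" precision work builds on. A minimal usage sketch, with ctx, k, and q as placeholder tensors supplied by the caller:

```cpp
#include "ggml.h"

// Build a K*Q matmul node and request F32 precision for it: for phi-2,
// F16 accumulation of this op loses too much precision.
static struct ggml_tensor * mul_mat_f32(struct ggml_context * ctx,
                                        struct ggml_tensor * k,
                                        struct ggml_tensor * q) {
    struct ggml_tensor * kq = ggml_mul_mat(ctx, k, q);
    ggml_mul_mat_set_prec(kq, GGML_PREC_F32);
    return kq;
}
```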
hankcs
3c04bf6da8
llama : fix try_override for bool_value which always returns true (#4519)
2023-12-18 15:14:58 +02:00
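A sketch of the bug pattern behind this fix, with llama.cpp's override machinery reduced to stand-in types (the names and surrounding code here are simplified assumptions; only the return-value logic reflects the report):

```cpp
#include <cstdio>

// Stand-ins for the KV-override machinery (illustrative only).
enum kv_type { KV_OVERRIDE_BOOL };
struct kv_override { kv_type tag; bool bool_value; };

static bool validate_override(kv_type t, const kv_override * ov) {
    return ov != nullptr && ov->tag == t;
}

// Before the fix, the no-override path also returned true, so every bool
// key appeared to have been overridden; returning false fixes that.
static bool try_override(bool & target, const kv_override * ov) {
    if (validate_override(KV_OVERRIDE_BOOL, ov)) {
        target = ov->bool_value;
        return true;
    }
    return false; // was unconditionally `return true`
}

int main() {
    bool flag = false;
    std::printf("overridden: %d\n", try_override(flag, nullptr)); // prints 0
}
```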
crasm
aed3cf838c
Attempt at writing ctest_with_model
2023-12-18 04:45:39 -05:00
crasm
4b63355f45
ci : ctest uses -L main
2023-12-18 04:23:58 -05:00
crasm
fd9d247dd2
Label all ctest tests
2023-12-18 04:23:20 -05:00
crasm
6bba3410fa
Simplify .gitignore for tests, clang-tidy fixes
2023-12-17 22:33:38 -05:00
crasm
fe6a6fb6d1
Revert "Revert "Fail test if model file is missing""
...
This reverts commit 2796953257.
2023-12-17 22:24:17 -05:00
crasm
068e7c408f
Add test-model-load-cancel to Makefile
2023-12-17 22:22:42 -05:00
Jared Van Bortel
2994f0c5a2
decode : fix logits_valid for legacy API (#4516)
2023-12-17 19:39:02 -05:00
crasm
2796953257
Revert "Fail test if model file is missing"
...
This reverts commit 32ebd525bf.
2023-12-17 14:37:01 -05:00
crasm
cb8a4be5d0
Merge branch 'cancel-model-load' of github.com:crasm/llama.cpp into cancel-model-load
2023-12-17 14:31:49 -05:00
crasm
32ebd525bf
Fail test if model file is missing
2023-12-17 14:31:03 -05:00
Georgi Gerganov
1160de38f6
Update llama.cpp
...
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-12-17 21:25:19 +02:00
Georgi Gerganov
b1306c4394
readme : update hot topics
2023-12-17 20:16:23 +02:00
Georgi Gerganov
800a489e4a
llama.swiftui : add bench functionality (#4483)
...
* llama.swiftui : add bench button
* llama.swiftui : initial bench functionality
* force to use n_gpu_layers on simulator
* add download buttons & expose llamaState.loadModel
* update project.pbxproj
* comment #Preview & fix editorconfig check
* gitignore : xcode stuff
* llama.swiftui : UX improvements
* llama.swiftui : avoid data copy via "downloadTask"
* llama.swiftui : remove model from project
* llama : remove "mostly" from model infos
* llama.swiftui : improve bench
---------
Co-authored-by: jhen <developer@jhen.me>
2023-12-17 19:38:41 +02:00
Jared Van Bortel
f7f468a97d
gguf-py : fail fast on nonsensical special token IDs (#4489)
2023-12-17 10:45:46 -05:00
Matheus Gabriel Alves Silva
919c40660f
build : Check the ROCm installation location (#4485)
...
* build : Check the ROCm installation location
* more generic approach
* fixup! It was returning the path instead of the command output
* fixup! Trailing whitespace
2023-12-17 17:23:33 +02:00
slaren
45668633fd
finetune : keep allocs alive until all allocations are done (#4486)
2023-12-17 16:05:56 +01:00
olexiyb
0ffc92d2d2
server : disable llm logs if SERVER_VERBOSE is off (#3792)
2023-12-17 17:02:16 +02:00
AdithyanI
8edd2b40fd
server : fix grammar being ignored (#4494)
...
Fix bug in identifying the grammar.
2023-12-17 16:57:56 +02:00
Alexey Parfenov
eb16dae7e7
server : fix possible ambiguity in content type charset (#4501)
2023-12-17 16:56:09 +02:00
mzcu
62bd52b7bf
server : allow requests larger than 8K (#4500)
2023-12-17 16:54:37 +02:00
Bach Le
5daa5f54fd
Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506)
2023-12-17 11:57:33 +01:00
slaren
c6c4fc081c
lora : add support for non-llama models (#3333)
...
* lora : add support for non-llama models
ggml-ci
* avoid leaking ggml_context on failure
cleanup
ggml-ci
* lora : allow 1d tensors
* lora : include embd and output layers in size calculation
* fix style
2023-12-16 18:58:46 +01:00
Jared Van Bortel
8a5be3bd58
llama : sanity checks for access to logits (#4274)
...
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 22:16:15 -05:00
ShadovvBeast
88ae8952b6
server : add optional API Key Authentication example (#4441)
...
* Add API key authentication for enhanced server-client security
* server : to snake_case
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 13:49:01 +02:00
slaren
ee4725a686
ggml : group mul_mat_id rows by matrix (cpu only) (#4480)
...
* ggml : group mul_mat_id rows by matrix (cpu only)
* remove mmid parameters from mm forward
* store row groups in wdata and calculate only once in GGML_TASK_INIT
ggml-ci
2023-12-15 12:45:50 +01:00
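The grouping described in the last bullet can be pictured as a one-pass bucketing step: record, for each matrix, the indices of all rows routed to it, so every matrix then processes its rows as one contiguous batch. A sketch under those assumptions (in ggml the groups live in wdata and are built once during GGML_TASK_INIT):

```cpp
#include <cstdint>
#include <vector>

// Bucket row indices by the matrix id selected for each row. Assumes every
// id in row_matrix_id is in [0, n_matrices).
static std::vector<std::vector<int64_t>> group_rows_by_matrix(
        const std::vector<int> & row_matrix_id, int n_matrices) {
    std::vector<std::vector<int64_t>> groups(n_matrices);
    for (int64_t row = 0; row < (int64_t) row_matrix_id.size(); ++row) {
        groups[row_matrix_id[row]].push_back(row);
    }
    return groups;
}
```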
crasm
4b1f70cb03
Fix bool return in llama_model_load, remove std::ignore use
2023-12-14 16:29:05 -05:00
slaren
6744dbe924
ggml : use ggml_row_size where possible (#4472)
...
* ggml : use ggml_row_size where possible
ggml-ci
* ggml : move ggml_nbytes_split to ggml-cuda.cu
2023-12-14 20:05:21 +01:00
slaren
cafcd4f895
ggml : remove n_dims from ggml_tensor (#4469)
...
ggml-ci
2023-12-14 16:52:08 +01:00
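With the field removed, the dimension count is derived on demand from the ne[] extents. A minimal sketch of that derivation, mirroring what a helper like ggml's ggml_n_dims can do:

```cpp
#include "ggml.h"

// Report the highest axis with extent > 1; scalars and 1-D tensors both
// count as one dimension.
static int tensor_n_dims(const struct ggml_tensor * t) {
    for (int i = GGML_MAX_DIMS - 1; i >= 1; --i) {
        if (t->ne[i] > 1) {
            return i + 1;
        }
    }
    return 1;
}
```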
wonjun Jang
c50e400163
py : add protobuf dependency (#4466)
2023-12-14 14:44:49 +02:00
LostRuins
20a68a7030
ggml : add ggml_row_size() (fixes llama out of space) (#4461)
...
* Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values
* do not cast to size_t, instead just use doubles
* ggml : add ggml_row_size(), deprecate ggml_type_sizef()
* ggml : fix row size compute to avoid overflows
* tests : fix sizey -> sizez
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14 14:13:33 +02:00
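The fix replaces a float bytes-per-element computation (the deprecated ggml_type_sizef) with integer arithmetic: for quantized types the float path rounds, and multiplying the rounded value by large element counts under- or over-estimates buffer sizes. A minimal sketch of the new pattern, with n_embd as a placeholder element count:

```cpp
#include "ggml.h"

// Exact integer byte count for one row of n_embd elements.
// Old, imprecise pattern: (size_t) (ggml_type_sizef(type) * n_embd)
static size_t row_bytes(enum ggml_type type, int64_t n_embd) {
    return ggml_row_size(type, n_embd);
}
```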
crasm
3425e62745
llama : Add test for model load cancellation
2023-12-14 04:47:54 -05:00