slaren
1398823922
cuda : replace asserts in wrong architecture checks with __trap ( #4556 )
* cuda : replace asserts in wrong architecture checks with __trap
* make bad_arch noreturn, remove returns
2023-12-21 18:02:30 +01:00
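The pattern in this commit can be sketched host-side. This is a hedged analogue, assuming `__builtin_trap()` as a stand-in for CUDA's device-side `__trap()`; the compute-capability threshold below is illustrative, not ggml's actual check.

```cpp
#include <cstdio>
#include <cassert>

// Marking bad_arch noreturn lets callers omit the dummy return statements the
// commit removes. An assert would compile out in release builds and let a
// kernel keep running on a GPU architecture it was never compiled for; a trap
// aborts immediately instead.
[[noreturn]] static void bad_arch(void) {
    fprintf(stderr, "ERROR: compiled without support for this GPU architecture\n");
    __builtin_trap();   // stands in for CUDA's __trap() in this host-side sketch
}

// Illustrative dispatch: the 600 (compute capability 6.0) cutoff is made up.
static int dispatch(int compute_capability) {
    if (compute_capability < 600) {
        bad_arch();     // noreturn: no fallthrough value needed after this call
    }
    return compute_capability;
}
```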
Johannes Gäßler
d3223afdad
llama : disable per-tensor info prints on model load ( #4562 )
2023-12-21 18:34:17 +02:00
Concedo
2378a29bde
better error handling, try to avoid segfault in SillyTavern
2023-12-21 22:58:48 +08:00
Concedo
c05d195583
Merge branch 'concedo' into concedo_experimental
2023-12-21 20:08:54 +08:00
Concedo
ff4c2b18d7
testing workflow for windows cuda builds
(cherry picked from commit e1f013bbf8)
2023-12-21 20:08:11 +08:00
Concedo
96c12cf395
Merge branch 'master' into concedo_experimental
2023-12-21 20:03:21 +08:00
Concedo
e1f013bbf8
testing workflow for windows cuda builds
2023-12-21 19:36:52 +08:00
LoganDark
1d7a1912ce
Fix access violation in ggml_cuda_free_data if tensor->extra is NULL ( #4554 )
2023-12-21 10:59:27 +01:00
Eugene Palmoff
a787ebe7cf
Handle broken pipe error ( #572 )
2023-12-21 17:51:36 +08:00
Johannes Gäßler
799fc22689
CUDA: Faster Mixtral prompt processing ( #4538 )
* CUDA: make MoE tensors contiguous for batch size>1
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-20 15:41:22 +01:00
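The "make MoE tensors contiguous" change above rests on a simple invariant: batched matmul paths expect each dimension's byte stride to equal the size of everything below it, with no gaps. A hedged sketch in the spirit of a 4-D contiguity check (names and layout are illustrative, not ggml's exact API):

```cpp
#include <cstdint>
#include <cassert>

// ne[i] = number of elements along dimension i, nb[i] = byte stride of
// dimension i, elem_size = bytes per element. A tensor is contiguous when
// each stride equals the accumulated size of the dimensions beneath it.
bool is_contiguous(const int64_t ne[4], const int64_t nb[4], int64_t elem_size) {
    int64_t expected = elem_size;
    for (int i = 0; i < 4; ++i) {
        if (nb[i] != expected) return false;  // a gap or permutation breaks it
        expected *= ne[i];
    }
    return true;
}
```

Gathering the rows an expert sees into such a layout lets a single GEMM cover the whole batch instead of one matmul per row.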
Eric Sommerlade
328b83de23
ggml : fixed check for _MSC_VER ( #4535 )
Co-authored-by: Eric Sommerlade <ersomme@microsoft.com>
2023-12-19 18:17:01 +02:00
Concedo
3f863eed72
add presence penalty
2023-12-19 23:18:56 +08:00
Concedo
da2db0302c
Added support for ssl cert and key
2023-12-19 22:23:19 +08:00
Concedo
49a5dfc604
Merge branch 'master' into concedo_experimental
# Conflicts:
# Makefile
# README.md
2023-12-19 16:07:48 +08:00
Concedo
1f77d2ad73
move multiprocessing import into function scope
2023-12-19 15:56:58 +08:00
ebolam
6948da5a0d
Fix for windows model unloading not releasing memory ( #569 )
* Run the model in a separate process so it can be killed on unload, releasing memory on Windows
* Fix from Henky
2023-12-19 15:55:41 +08:00
Concedo
4c274dc2fd
fix tools compilation
2023-12-19 15:53:22 +08:00
arlo-phoenix
a7aee47b98
ggml-cuda: Fix HIP build ( #4528 )
Fixes a regression from #4490.
Adds defines for two new datatypes: cublasComputeType_t and cudaDataType_t.
Currently uses the deprecated hipblasDatatype_t, since the newer types are too recent.
2023-12-18 22:33:45 +01:00
Georgi Gerganov
0e18b2e7d0
llama.swiftui : add tinyllama 1.1B F16
2023-12-18 20:17:43 +02:00
Georgi Gerganov
6ff39b129d
llama.swiftui : add more models
2023-12-18 20:05:12 +02:00
Ebey Abraham
b9e74f9bca
llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec ( #4490 )
* phi2 implementation
* fix breaking change
* phi-2 : various fixes
* phi-2 : use layer norm eps
* py : whitespaces
* llama : fix meta KV override bug
* convert : phi don't add BOS token
* convert : revert "added_tokens_decoder" change
* phi-2 : scale Q instead of KQ for better precision
* ggml : fix NeoX rope to rotate just first n_dims
* cuda : less diff in the rope_neox kernel
* ggml : add ggml_mul_mat_set_prec
ggml-ci
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision
* cuda : remove obsolete comment
---------
Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-18 19:27:47 +02:00
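The "fix NeoX rope to rotate just first n_dims" item above can be sketched as follows. This is a hedged illustration, not ggml's kernel: function name, the theta schedule, and the base value are assumptions; the point is that only the first `n_dims` elements of each head are rotated (NeoX pairs element `i` with element `i + n_dims/2`), while the tail passes through untouched.

```cpp
#include <cmath>
#include <vector>
#include <cassert>

std::vector<float> rope_neox_partial(const std::vector<float> &x,
                                     int n_dims, int pos,
                                     float theta_base = 10000.0f) {
    std::vector<float> out = x;            // elements beyond n_dims copied as-is
    const int half = n_dims / 2;
    for (int i = 0; i < half; ++i) {
        const float theta = pos * std::pow(theta_base, -2.0f * i / n_dims);
        const float c = std::cos(theta), s = std::sin(theta);
        const float x0 = x[i], x1 = x[i + half];  // NeoX pairing: (i, i + n_dims/2)
        out[i]        = x0 * c - x1 * s;
        out[i + half] = x0 * s + x1 * c;
    }
    return out;
}
```

At `pos == 0` every angle is zero, so the transform is the identity; for any position, elements past `n_dims` are unchanged.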
hankcs
3c04bf6da8
llama : fix try_override for bool_value which always returns true ( #4519 )
2023-12-18 15:14:58 +02:00
Jared Van Bortel
2994f0c5a2
decode : fix logits_valid for legacy API ( #4516 )
2023-12-17 19:39:02 -05:00
Georgi Gerganov
b1306c4394
readme : update hot topics
2023-12-17 20:16:23 +02:00
Georgi Gerganov
800a489e4a
llama.swiftui : add bench functionality ( #4483 )
* llama.swiftui : add bench button
* llama.swiftui : initial bench functionality
* force to use n_gpu_layers on simulator
* add download buttons & expose llamaState.loadModel
* update project.pbxproj
* comment #Preview & fix editorconfig check
* gitignore : xcode stuff
* llama.swiftui : UX improvements
* llama.swiftui : avoid data copy via "downloadTask"
* llama.swiftui : remove model from project
* llama : remove "mostly" from model infos
* llama.swiftui : improve bench
---------
Co-authored-by: jhen <developer@jhen.me>
2023-12-17 19:38:41 +02:00
Jared Van Bortel
f7f468a97d
gguf-py : fail fast on nonsensical special token IDs ( #4489 )
2023-12-17 10:45:46 -05:00
Matheus Gabriel Alves Silva
919c40660f
build : Check the ROCm installation location ( #4485 )
* build : Check the ROCm installation location
* more generic approach
* fixup! It was returning the path instead of the command output
* fixup! Trailing whitespace
2023-12-17 17:23:33 +02:00
slaren
45668633fd
finetune : keep allocs alive until all allocations are done ( #4486 )
2023-12-17 16:05:56 +01:00
olexiyb
0ffc92d2d2
server : disable llm logs if SERVER_VERBOSE is off ( #3792 )
2023-12-17 17:02:16 +02:00
AdithyanI
8edd2b40fd
server : fix grammar being ignored ( #4494 )
Fix bug in identifying the grammar.
2023-12-17 16:57:56 +02:00
Alexey Parfenov
eb16dae7e7
server : fix possible ambiguity in content type charset ( #4501 )
2023-12-17 16:56:09 +02:00
mzcu
62bd52b7bf
server : allow requests larger than 8K ( #4500 )
2023-12-17 16:54:37 +02:00
Bach Le
5daa5f54fd
Link to cublas dynamically on Windows even with LLAMA_STATIC ( #4506 )
2023-12-17 11:57:33 +01:00
Concedo
ec05230703
updated lite, up ver
2023-12-17 14:38:39 +08:00
Concedo
e8cf7f6ed3
Merge remote-tracking branch 'origin/master' into concedo_experimental
2023-12-17 14:37:14 +08:00
slaren
c6c4fc081c
lora : add support for non-llama models ( #3333 )
* lora : add support for non-llama models
ggml-ci
* avoid leaking ggml_context on failure
cleanup
ggml-ci
* lora : allow 1d tensors
* lora : include embd and output layers in size calculation
* fix style
2023-12-16 18:58:46 +01:00
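The math a LoRA apply performs, independent of model architecture, can be sketched like this. A hedged illustration only, assuming row-major flat matrices and a made-up `lora_merge` name; ggml applies the delta per-tensor through its graph API rather than with raw loops.

```cpp
#include <vector>
#include <cstddef>
#include <cassert>

// W (n_out x n_in) is updated in place with the low-rank delta scale * (B x A),
// where A is r x n_in and B is n_out x r. Because r is small, the adapter
// stores far fewer weights than a full fine-tune of W would.
void lora_merge(std::vector<float> &W, const std::vector<float> &A,
                const std::vector<float> &B, std::size_t n_out,
                std::size_t n_in, std::size_t r, float scale) {
    for (std::size_t i = 0; i < n_out; ++i) {
        for (std::size_t j = 0; j < n_in; ++j) {
            float delta = 0.0f;
            for (std::size_t k = 0; k < r; ++k) {
                delta += B[i * r + k] * A[k * n_in + j];  // (B x A)[i][j]
            }
            W[i * n_in + j] += scale * delta;
        }
    }
}
```

Supporting non-llama models is then mostly a matter of mapping adapter tensor names onto the base model's tensors, which the commit generalizes.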
Concedo
76a3ba42eb
Merge branch 'master' into concedo_experimental
# Conflicts:
# ggml.c
# ggml.h
# requirements.txt
# tests/test-quantize-perf.cpp
2023-12-16 22:58:53 +08:00
Jared Van Bortel
8a5be3bd58
llama : sanity checks for access to logits ( #4274 )
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 22:16:15 -05:00
ShadovvBeast
88ae8952b6
server : add optional API Key Authentication example ( #4441 )
* Add API key authentication for enhanced server-client security
* server : to snake_case
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 13:49:01 +02:00
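The check the API-key example adds amounts to comparing the request's `Authorization: Bearer <key>` header against a key supplied at server startup. A hedged, simplified sketch (the function name and the header handling are illustrative, not the server's exact code):

```cpp
#include <string>
#include <cassert>

// Empty api_key means authentication is disabled; otherwise the header must be
// exactly "Bearer " followed by the configured key.
bool is_authorized(const std::string &auth_header, const std::string &api_key) {
    if (api_key.empty()) return true;               // auth disabled
    const std::string prefix = "Bearer ";
    if (auth_header.compare(0, prefix.size(), prefix) != 0) return false;
    return auth_header.substr(prefix.size()) == api_key;
}
```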
slaren
ee4725a686
ggml : group mul_mat_id rows by matrix (cpu only) ( #4480 )
* ggml : group mul_mat_id rows by matrix (cpu only)
* remove mmid parameters from mm forward
* store row groups in wdata and calculate only once in GGML_TASK_INIT
ggml-ci
2023-12-15 12:45:50 +01:00
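The grouping idea above can be sketched simply: instead of dispatching each row's matmul individually, collect the row indices that select the same expert matrix, so each matrix is applied once to its whole group. Names here are illustrative; the commit stores the groups in `wdata` and computes them once during `GGML_TASK_INIT`.

```cpp
#include <vector>
#include <map>
#include <cassert>

// row_to_matrix[row] = index of the expert matrix that row selects.
// Returns, per matrix, the rows it must process, in original row order.
std::map<int, std::vector<int>> group_rows_by_matrix(
        const std::vector<int> &row_to_matrix) {
    std::map<int, std::vector<int>> groups;
    for (int row = 0; row < (int) row_to_matrix.size(); ++row) {
        groups[row_to_matrix[row]].push_back(row);  // stable within each group
    }
    return groups;
}
```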
slaren
6744dbe924
ggml : use ggml_row_size where possible ( #4472 )
* ggml : use ggml_row_size where possible
ggml-ci
* ggml : move ggml_nbytes_split to ggml-cuda.cu
2023-12-14 20:05:21 +01:00
slaren
cafcd4f895
ggml : remove n_dims from ggml_tensor ( #4469 )
ggml-ci
2023-12-14 16:52:08 +01:00
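Once `n_dims` is removed from the tensor struct, the effective dimensionality can be recovered on demand from the shape array `ne[]`: the highest axis with extent greater than 1, with a minimum of 1. A hedged sketch of that idea (ggml's exact semantics may differ):

```cpp
#include <cstdint>
#include <cassert>

static const int kMaxDims = 4;

int n_dims_from_shape(const int64_t ne[kMaxDims]) {
    for (int i = kMaxDims - 1; i >= 1; --i) {
        if (ne[i] > 1) return i + 1;    // highest non-trivial axis decides
    }
    return 1;                           // scalars and vectors both report 1
}
```

Dropping the cached field removes one piece of state that could fall out of sync with the shape itself.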
wonjun Jang
c50e400163
py : add protobuf dependency ( #4466 )
2023-12-14 14:44:49 +02:00
LostRuins
20a68a7030
ggml : add ggml_row_size() (fixes llama out of space) ( #4461 )
* Fixes "Not enough space in the context's memory pool" encountered on certain models; this seems to be caused by imprecision from the automatic casting of floating-point values
* do not cast to size_t, instead just use doubles
* ggml : add ggml_row_size(), deprecate ggml_type_sizef()
* ggml : fix row size compute to avoid overflows
* tests : fix sizey -> sizez
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14 14:13:33 +02:00
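The fix above boils down to computing row sizes in integer arithmetic instead of multiplying by a float "bytes per element" (the old `ggml_type_sizef` path), which loses precision on large tensors. A hedged sketch with a Q4_0-like layout assumed for illustration (32 elements packed into 18 bytes; constants and names are not ggml's exact values or API):

```cpp
#include <cstdint>
#include <cassert>

static const int64_t kBlockSize = 32;   // elements per quantization block
static const int64_t kTypeSize  = 18;   // bytes per block

// Integer path: exact for any row length that is a whole number of blocks.
int64_t row_size_exact(int64_t ne) {
    assert(ne % kBlockSize == 0);
    return kTypeSize * (ne / kBlockSize);
}

// Old-style float path: a fractional bytes-per-element factor invites
// rounding drift when the products get large.
double row_size_float(int64_t ne) {
    const float bytes_per_elem = (float) kTypeSize / kBlockSize;
    return (double) bytes_per_elem * (double) ne;
}
```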
Concedo
7798587990
Workflow Build from experimental branch
2023-12-14 19:17:19 +08:00
Concedo
ae3d829d0c
manual workflow for generating builds instead
2023-12-14 19:00:58 +08:00
Concedo
aac7f0b944
Merge branch 'master' into concedo_experimental
# Conflicts:
# ggml.c
2023-12-14 17:24:42 +08:00
Concedo
f0de4953ae
fixed length exceeding max ctx
2023-12-14 16:58:41 +08:00
Concedo
04bd895311
Revert "Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values"
This reverts commit 34b3dac66d.
2023-12-14 16:46:29 +08:00
Concedo
53bbd1ee43
Merge branch 'pr_fix_buf_resize_type' into concedo_experimental
2023-12-14 16:45:18 +08:00