Eugene Palmoff
a787ebe7cf
Handle broken pipe error ( #572 )
2023-12-21 17:51:36 +08:00
Johannes Gäßler
799fc22689
CUDA: Faster Mixtral prompt processing ( #4538 )
* CUDA: make MoE tensors contiguous for batch size>1
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-20 15:41:22 +01:00
Eric Sommerlade
328b83de23
ggml : fixed check for _MSC_VER ( #4535 )
Co-authored-by: Eric Sommerlade <ersomme@microsoft.com>
2023-12-19 18:17:01 +02:00
Concedo
3f863eed72
add presence penalty
2023-12-19 23:18:56 +08:00
Concedo
da2db0302c
Added support for ssl cert and key
2023-12-19 22:23:19 +08:00
Concedo
49a5dfc604
Merge branch 'master' into concedo_experimental
# Conflicts:
# Makefile
# README.md
2023-12-19 16:07:48 +08:00
Concedo
1f77d2ad73
move multiprocessing import into function scope
2023-12-19 15:56:58 +08:00
ebolam
6948da5a0d
Fix for windows model unloading not releasing memory ( #569 )
* Add in model processes as a separate process so it can be killed when unloading to release memory on windows
* Fix from Henky
2023-12-19 15:55:41 +08:00
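The fix in #569 uses a general pattern worth noting: run the model in a child process, so that "unloading" is simply killing that process, which forces the OS (Windows included) to reclaim all of its memory at once. A minimal Python sketch of the pattern, with a stand-in object instead of a real model and hypothetical names (`ModelHost`, `_serve_model`):

```python
import multiprocessing as mp

def _serve_model(conn):
    # hypothetical worker: load the model here so every byte it
    # allocates belongs to this child process
    model = {"weights": bytearray(1 << 20)}  # stand-in for a real model
    while True:
        prompt = conn.recv()
        conn.send(f"echo:{prompt}")  # stand-in for real generation

class ModelHost:
    """Host the model in a child process; unload == kill the process."""

    def load(self):
        self._conn, child = mp.Pipe()
        self._proc = mp.Process(target=_serve_model, args=(child,), daemon=True)
        self._proc.start()

    def generate(self, prompt):
        self._conn.send(prompt)
        return self._conn.recv()

    def unload(self):
        # terminating the process returns its memory to the OS, even
        # where in-process frees would not release it promptly
        self._proc.terminate()
        self._proc.join()

if __name__ == "__main__":
    host = ModelHost()
    host.load()
    print(host.generate("hello"))  # echo:hello
    host.unload()
```

The cost of this design is IPC overhead on every request, which is why it is typically applied only where in-process unloading is unreliable.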
Concedo
4c274dc2fd
fix tools compilation
2023-12-19 15:53:22 +08:00
arlo-phoenix
a7aee47b98
ggml-cuda: Fix HIP build ( #4528 )
regression of #4490
Adds defines for two new datatypes: cublasComputeType_t and cudaDataType_t.
Currently using the deprecated hipblasDatatype_t, since the newer ones are very recent.
2023-12-18 22:33:45 +01:00
Georgi Gerganov
0e18b2e7d0
llama.swiftui : add tinyllama 1.1B F16
2023-12-18 20:17:43 +02:00
Georgi Gerganov
6ff39b129d
llama.swiftui : add more models
2023-12-18 20:05:12 +02:00
Ebey Abraham
b9e74f9bca
llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec ( #4490 )
* phi2 implementation
* fix breaking change
* phi-2 : various fixes
* phi-2 : use layer norm eps
* py : whitespaces
* llama : fix meta KV override bug
* convert : phi don't add BOS token
* convert : revert "added_tokens_decoder" change
* phi-2 : scale Q instead of KQ for better precision
* ggml : fix NeoX rope to rotate just first n_dims
* cuda : less diff in the rope_neox kernel
* ggml : add ggml_mul_mat_set_prec
ggml-ci
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision
* cuda : remove obsolete comment
---------
Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-18 19:27:47 +02:00
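The bullet "scale Q instead of KQ for better precision" above relies on a simple identity: softmax((Q/√d)Kᵀ) = softmax((QKᵀ)/√d), but applying the 1/√d factor to Q before the matmul keeps the intermediate product smaller, which matters when it is accumulated in reduced precision. A quick numpy check of the equivalence (illustrative only, not the ggml code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 64
Q = rng.standard_normal((n, d)).astype(np.float32)
K = rng.standard_normal((n, d)).astype(np.float32)
scale = 1.0 / np.sqrt(d)

scores_pre = (Q * scale) @ K.T   # scale Q before the matmul (the fix)
scores_post = (Q @ K.T) * scale  # scale the attention scores after

# mathematically identical; any difference is just rounding noise
assert np.allclose(scores_pre, scores_post, atol=1e-5)
```

In float32 the two agree to rounding error; the benefit of the early scaling only shows up when the QKᵀ intermediate is held in fp16, where the unscaled product can overflow or lose precision.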
hankcs
3c04bf6da8
llama : fix try_override for bool_value, which always returns true ( #4519 )
2023-12-18 15:14:58 +02:00
Jared Van Bortel
2994f0c5a2
decode : fix logits_valid for legacy API ( #4516 )
2023-12-17 19:39:02 -05:00
Georgi Gerganov
b1306c4394
readme : update hot topics
2023-12-17 20:16:23 +02:00
Georgi Gerganov
800a489e4a
llama.swiftui : add bench functionality ( #4483 )
* llama.swiftui : add bench button
* llama.swiftui : initial bench functionality
* force to use n_gpu_layers on simulator
* add download buttons & expose llamaState.loadModel
* update project.pbxproj
* comment #Preview & fix editorconfig check
* gitignore : xcode stuff
* llama.swiftui : UX improvements
* llama.swiftui : avoid data copy via "downloadTask"
* llama.swiftui : remove model from project
* llama : remove "mostly" from model infos
* llama.swiftui : improve bench
---------
Co-authored-by: jhen <developer@jhen.me>
2023-12-17 19:38:41 +02:00
Jared Van Bortel
f7f468a97d
gguf-py : fail fast on nonsensical special token IDs ( #4489 )
2023-12-17 10:45:46 -05:00
Matheus Gabriel Alves Silva
919c40660f
build : Check the ROCm installation location ( #4485 )
* build : Check the ROCm installation location
* more generic approach
* fixup! It was returning the path instead of the command output
* fixup! Trailing whitespace
2023-12-17 17:23:33 +02:00
slaren
45668633fd
finetune : keep allocs alive until all allocations are done ( #4486 )
2023-12-17 16:05:56 +01:00
olexiyb
0ffc92d2d2
server : disable llm logs if SERVER_VERBOSE is off ( #3792 )
2023-12-17 17:02:16 +02:00
AdithyanI
8edd2b40fd
server : fix grammar being ignored ( #4494 )
Fix bug in identifying the grammar.
2023-12-17 16:57:56 +02:00
Alexey Parfenov
eb16dae7e7
server : fix possible ambiguity in content type charset ( #4501 )
2023-12-17 16:56:09 +02:00
mzcu
62bd52b7bf
server : allow requests larger than 8K ( #4500 )
2023-12-17 16:54:37 +02:00
Bach Le
5daa5f54fd
Link to cublas dynamically on Windows even with LLAMA_STATIC ( #4506 )
2023-12-17 11:57:33 +01:00
Concedo
ec05230703
updated lite, up ver
2023-12-17 14:38:39 +08:00
Concedo
e8cf7f6ed3
Merge remote-tracking branch 'origin/master' into concedo_experimental
2023-12-17 14:37:14 +08:00
slaren
c6c4fc081c
lora : add support for non-llama models ( #3333 )
* lora : add support for non-llama models
ggml-ci
* avoid leaking ggml_context on failure
cleanup
ggml-ci
* lora : allow 1d tensors
* lora : include embd and output layers in size calculation
* fix style
2023-12-16 18:58:46 +01:00
Concedo
76a3ba42eb
Merge branch 'master' into concedo_experimental
# Conflicts:
# ggml.c
# ggml.h
# requirements.txt
# tests/test-quantize-perf.cpp
2023-12-16 22:58:53 +08:00
Jared Van Bortel
8a5be3bd58
llama : sanity checks for access to logits ( #4274 )
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 22:16:15 -05:00
ShadovvBeast
88ae8952b6
server : add optional API Key Authentication example ( #4441 )
* Add API key authentication for enhanced server-client security
* server : to snake_case
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 13:49:01 +02:00
slaren
ee4725a686
ggml : group mul_mat_id rows by matrix (cpu only) ( #4480 )
* ggml : group mul_mat_id rows by matrix (cpu only)
* remove mmid parameters from mm forward
* store row groups in wdata and calculate only once in GGML_TASK_INIT
ggml-ci
2023-12-15 12:45:50 +01:00
slaren
6744dbe924
ggml : use ggml_row_size where possible ( #4472 )
* ggml : use ggml_row_size where possible
ggml-ci
* ggml : move ggml_nbytes_split to ggml-cuda.cu
2023-12-14 20:05:21 +01:00
slaren
cafcd4f895
ggml : remove n_dims from ggml_tensor ( #4469 )
ggml-ci
2023-12-14 16:52:08 +01:00
wonjun Jang
c50e400163
py : add protobuf dependency ( #4466 )
2023-12-14 14:44:49 +02:00
LostRuins
20a68a7030
ggml : add ggml_row_size() (fixes llama out of space) ( #4461 )
* Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values
* do not cast to size_t, instead just use doubles
* ggml : add ggml_row_size(), deprecate ggml_type_sizef()
* ggml : fix row size compute to avoid overflows
* tests : fix sizey -> sizez
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14 14:13:33 +02:00
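The root cause fixed by ggml_row_size() in #4461 is easy to reproduce: computing a row's byte size as `ne * bytes_per_element` with a 32-bit float loses low-order bits once the counts get large, while integer arithmetic stays exact. A toy reproduction (the 32-element, 18-byte block matches Q4_0, but the function names here are illustrative, not the ggml API):

```python
import numpy as np

BLOCK_ELEMS = 32   # elements per quantization block (Q4_0)
BLOCK_BYTES = 18   # bytes per block: 2-byte scale + 16 bytes of nibbles

def row_size_float(ne: int) -> int:
    # old ggml_type_sizef-style math: bytes-per-element as a 32-bit float
    return int(np.float32(ne) * np.float32(BLOCK_BYTES / BLOCK_ELEMS))

def row_size_exact(ne: int) -> int:
    # ggml_row_size-style math: pure integer arithmetic, no rounding
    return ne // BLOCK_ELEMS * BLOCK_BYTES

ne = 32 * 10_000_001  # a large but plausible element count
print(row_size_exact(ne))   # 180000018
print(row_size_float(ne))   # a few bytes short at this scale
```

Computing with 64-bit doubles (the interim "just use doubles" fix in this commit chain) pushes the failure point far beyond realistic tensor sizes; the integer-based ggml_row_size() removes it entirely.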
Concedo
7798587990
Workflow Build from experimental branch
2023-12-14 19:17:19 +08:00
Concedo
ae3d829d0c
manual workflow for generating builds instead
2023-12-14 19:00:58 +08:00
Concedo
aac7f0b944
Merge branch 'master' into concedo_experimental
# Conflicts:
# ggml.c
2023-12-14 17:24:42 +08:00
Concedo
f0de4953ae
fixed length exceeding max ctx
2023-12-14 16:58:41 +08:00
Concedo
04bd895311
Revert "Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values"
This reverts commit 34b3dac66d.
2023-12-14 16:46:29 +08:00
Concedo
53bbd1ee43
Merge branch 'pr_fix_buf_resize_type' into concedo_experimental
2023-12-14 16:45:18 +08:00
Concedo
05f7db4b29
do not cast to size_t, instead just use doubles
2023-12-14 16:43:34 +08:00
Georgi Gerganov
55e87c3749
ggml : fix OpenCL broadcast requirement for ggml_mul ( close #4453 )
2023-12-14 10:35:29 +02:00
Concedo
c88fc19d59
Merge branch 'master' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
2023-12-14 16:32:42 +08:00
Concedo
34b3dac66d
Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values
(cherry picked from commit 1ad8f0d80e)
2023-12-14 16:16:25 +08:00
wonjun Jang
873637afc7
convert : support loading vocab from fast tokenizer config ( #3633 )
* Add HFVocab into convert.py
* Update convert.py
* Update convert.py
* add bytes_to_unicode function
* change add_meta_vocab function
* remove debug code
* remove byte_encoder
* Add newline between classes
* Check tokenizer.json when tokenizer.model does not exist.
* Move transformers dependency to local code
* Add error context with 'raise from'
* Add fast tokenizer option to BpeVocab
* Update convert.py
* Add VocabLoader and remove *Vocab class
* Add transformers dependency
* remove added tokens and check newline token to decide spm or bpe
* Update convert.py
* Add special token type
* Update convert.py
* Update convert.py
* Update convert.py
* Fix typo in convert.py
* Fix when params.n_vocab < tokenizer vocab size
* update vocab class
* change function name
* Remove unused variables/functions, add types to class variables and methods, delete blank lines
* fix flake8 warnings
* code style cleanup
* make mypy happy
* change exception
---------
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2023-12-14 10:09:34 +02:00
Concedo
1ad8f0d80e
Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values
2023-12-14 16:00:44 +08:00
BarfingLemurs
0353a18401
readme : update supported model list ( #4457 )
2023-12-14 09:38:49 +02:00
Concedo
0e31f53422
Revert "lowvram var defaults"
This reverts commit 7a691522a6.
2023-12-14 15:14:11 +08:00