howlger
880e352277
py : open merges file as 'utf-8' (#4566)
Otherwise, on Windows converting bling-phi-2-v0 (<https://huggingface.co/llmware/bling-phi-2-v0>) via convert-hf-to-gguf.py will fail with the following error:
```
Traceback (most recent call last):
  File "C:\Users\User\git\gguf\convert-hf-to-gguf.py", line 1061, in <module>
    model_instance.set_vocab()
  File "C:\Users\User\git\gguf\convert-hf-to-gguf.py", line 52, in set_vocab
    self._set_vocab_gpt2()
  File "C:\Users\User\git\gguf\convert-hf-to-gguf.py", line 264, in _set_vocab_gpt2
    special_vocab = gguf.SpecialVocab(dir_model, load_merges=True)
  File "C:\Users\User\git\gguf\gguf\vocab.py", line 33, in __init__
    self._load(Path(path))
  File "C:\Users\User\git\gguf\gguf\vocab.py", line 81, in _load
    self._try_load_merges_txt(path)
  File "C:\Users\User\git\gguf\gguf\vocab.py", line 95, in _try_load_merges_txt
    for line in fp:
  File "C:\Users\User\miniconda3\envs\gguf\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1415: character maps to <undefined>
```
2023-12-21 19:07:34 +02:00
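A minimal sketch of the failure and the fix, assuming a simplified merges reader (`load_merges` is a hypothetical helper, not the actual `gguf.SpecialVocab` code). Without an explicit `encoding`, text-mode `open()` falls back to the locale's preferred encoding, which is cp1252 on many Windows setups; cp1252 has no mapping for byte 0x81, which occurs inside UTF-8 sequences such as `ā` (U+0101, bytes C4 81).

```python
import tempfile
from pathlib import Path


def load_merges(path: Path) -> list[str]:
    """Read BPE merge rules, one 'left right' pair per line (hypothetical helper)."""
    merges: list[str] = []
    # Pin the encoding: merges.txt is UTF-8 regardless of the OS locale.
    with open(path, "r", encoding="utf-8") as fp:
        for line in fp:
            line = line.strip()
            # Skip blank lines and the optional '#version' header.
            if not line or line.startswith("#"):
                continue
            parts = line.split()
            if len(parts) == 2:
                merges.append(f"{parts[0]} {parts[1]}")
    return merges


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "merges.txt"
        # 'ā' is U+0101, encoded in UTF-8 as the bytes C4 81.
        path.write_bytes("#version: 0.2\nā b\nĠ t\n".encode("utf-8"))
        print(load_merges(path))  # ['ā b', 'Ġ t']
        try:
            # What an unpinned open() effectively does under a cp1252 locale.
            path.read_bytes().decode("cp1252")
        except UnicodeDecodeError as exc:
            print("cp1252 fails:", exc.reason)  # character maps to <undefined>
```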
bobqianic
66f35a2f48
cuda : better error message for ggml_get_rows (#4561)
* Update ggml-cuda.cu
* Update ggml-cuda.cu
* Update ggml-cuda.cu
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-21 19:06:44 +02:00
slaren
1398823922
cuda : replace asserts in wrong architecture checks with __trap (#4556)
* cuda : replace asserts in wrong architecture checks with __trap
* make bad_arch noreturn, remove returns
2023-12-21 18:02:30 +01:00
Johannes Gäßler
d3223afdad
llama : disable per-tensor info prints on model load (#4562)
2023-12-21 18:34:17 +02:00
Concedo
2378a29bde
better error handling, try to avoid segfault in sillytavern
2023-12-21 22:58:48 +08:00
Concedo
c05d195583
Merge branch 'concedo' into concedo_experimental
2023-12-21 20:08:54 +08:00
Concedo
ff4c2b18d7
testing workflow for windows cuda builds
(cherry picked from commit e1f013bbf8)
2023-12-21 20:08:11 +08:00
Concedo
96c12cf395
Merge branch 'master' into concedo_experimental
2023-12-21 20:03:21 +08:00
Concedo
e1f013bbf8
testing workflow for windows cuda builds
2023-12-21 19:36:52 +08:00
LoganDark
1d7a1912ce
Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (#4554)
2023-12-21 10:59:27 +01:00
Eugene Palmoff
a787ebe7cf
Handle broken pipe error (#572)
2023-12-21 17:51:36 +08:00
Johannes Gäßler
799fc22689
CUDA: Faster Mixtral prompt processing (#4538)
* CUDA: make MoE tensors contiguous for batch size>1
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-20 15:41:22 +01:00
Eric Sommerlade
328b83de23
ggml : fixed check for _MSC_VER (#4535)
Co-authored-by: Eric Sommerlade <ersomme@microsoft.com>
2023-12-19 18:17:01 +02:00
Concedo
3f863eed72
add presence penalty
2023-12-19 23:18:56 +08:00
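A presence penalty is commonly defined (for example in the OpenAI API) as a flat amount subtracted from the logit of every token that has already appeared in the context, however many times it appeared. This Python sketch assumes that definition; the actual koboldcpp sampler is C++ and may differ, for instance in how much of the context it scans.

```python
def apply_presence_penalty(logits: list[float], context: list[int],
                           presence_penalty: float) -> list[float]:
    """Return a copy of logits with a flat penalty on previously seen tokens."""
    out = list(logits)
    if presence_penalty == 0.0:
        return out
    # Each distinct token is penalized once, regardless of its count
    # (a frequency penalty would instead scale with the count).
    for tok in set(context):
        if 0 <= tok < len(out):
            out[tok] -= presence_penalty
    return out


if __name__ == "__main__":
    print(apply_presence_penalty([1.0, 2.0, 3.0], [2, 2, 0], 0.5))  # [0.5, 2.0, 2.5]
```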
Concedo
da2db0302c
Added support for ssl cert and key
2023-12-19 22:23:19 +08:00
Concedo
49a5dfc604
Merge branch 'master' into concedo_experimental
# Conflicts:
# Makefile
# README.md
2023-12-19 16:07:48 +08:00
Concedo
1f77d2ad73
move multiprocessing import into function scope
2023-12-19 15:56:58 +08:00
ebolam
6948da5a0d
Fix for windows model unloading not releasing memory (#569)
* Add in model processes as a separate process so it can be killed when unloading to release memory on windows
* Fix from Henky
2023-12-19 15:55:41 +08:00
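The idea in #569 is that memory held by a model is only reliably returned to the OS when the process that owns it exits. A minimal sketch of that pattern, assuming the model can live in a child process: to stay self-contained it launches the child with the standard `subprocess` module and a placeholder workload, whereas the commit uses `multiprocessing`. `ModelHost` and its methods are hypothetical.

```python
import subprocess
import sys
from typing import Optional


class ModelHost:
    """Run a (placeholder) model worker in its own process so unloading can kill it."""

    def __init__(self) -> None:
        self.proc: Optional[subprocess.Popen] = None

    def load(self) -> None:
        # Placeholder workload standing in for loading weights and serving.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(600)"])

    def is_loaded(self) -> bool:
        return self.proc is not None and self.proc.poll() is None

    def unload(self, timeout: float = 5.0) -> None:
        if self.proc is None:
            return
        # Ending the process lets the OS reclaim all of its memory, including
        # anything a native library never freed.
        self.proc.terminate()
        try:
            self.proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.proc.kill()  # escalate if terminate() was ignored
            self.proc.wait()
        self.proc = None
```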
Concedo
4c274dc2fd
fix tools compilation
2023-12-19 15:53:22 +08:00
arlo-phoenix
a7aee47b98
ggml-cuda: Fix HIP build (#4528)
regression of #4490
Adds defines for two new datatypes
cublasComputeType_t, cudaDataType_t.
Currently using the deprecated hipblasDatatype_t since the newer ones are very recent.
2023-12-18 22:33:45 +01:00
Georgi Gerganov
0e18b2e7d0
llama.swiftui : add tinyllama 1.1B F16
2023-12-18 20:17:43 +02:00
Georgi Gerganov
6ff39b129d
llama.swiftui : add more models
2023-12-18 20:05:12 +02:00
Ebey Abraham
b9e74f9bca
llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)
* phi2 implementation
* fix breaking change
* phi-2 : various fixes
* phi-2 : use layer norm eps
* py : whitespaces
* llama : fix meta KV override bug
* convert : phi don't add BOS token
* convert : revert "added_tokens_decoder" change
* phi-2 : scale Q instead of KQ for better precision
* ggml : fix NeoX rope to rotate just first n_dims
* cuda : less diff in the rope_neox kernel
* ggml : add ggml_mul_mat_set_prec
ggml-ci
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision
* cuda : remove obsolete comment
---------
Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-18 19:27:47 +02:00
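"Fix NeoX rope to rotate just first n_dims" matters for models such as phi-2 whose rotary dimension is smaller than the head size. In NeoX-style RoPE, element i is paired with element i + n_dims/2 (not with its neighbour i+1); only the first n_dims elements are rotated and the rest pass through unchanged. This pure-Python sketch assumes the standard schedule theta_i = pos * base^(-2i/n_dims) with base 10000 and ignores ggml's extra frequency-scaling options; `rope_neox` is a hypothetical name, not the ggml API.

```python
import math


def rope_neox(x: list[float], pos: int, n_dims: int,
              base: float = 10000.0) -> list[float]:
    """Apply NeoX-style rotary embedding to the first n_dims values of x."""
    assert n_dims % 2 == 0 and n_dims <= len(x)
    out = list(x)  # values at index >= n_dims are copied unchanged
    half = n_dims // 2
    for i in range(half):
        theta = pos * base ** (-2.0 * i / n_dims)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[i], x[i + half]  # NeoX pairs i with i + n_dims/2
        out[i] = x0 * c - x1 * s
        out[i + half] = x0 * s + x1 * c
    return out
```

Rotating each pair preserves its norm, so the pass-through tail and the rotated head stay on the same scale.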
hankcs
3c04bf6da8
llama : fix try_override for bool_value which always return true (#4519)
2023-12-18 15:14:58 +02:00
Jared Van Bortel
2994f0c5a2
decode : fix logits_valid for legacy API (#4516)
2023-12-17 19:39:02 -05:00
Georgi Gerganov
b1306c4394
readme : update hot topics
2023-12-17 20:16:23 +02:00
Georgi Gerganov
800a489e4a
llama.swiftui : add bench functionality (#4483)
* llama.swiftui : add bench button
* llama.swiftui : initial bench functionality
* force to use n_gpu_layers on simulator
* add download buttons & expose llamaState.loadModel
* update project.pbxproj
* comment #Preview & fix editorconfig check
* gitignore : xcode stuff
* llama.swiftui : UX improvements
* llama.swiftui : avoid data copy via "downloadTask"
* llama.swiftui : remove model from project
* llama : remove "mostly" from model infos
* llama.swiftui : improve bench
---------
Co-authored-by: jhen <developer@jhen.me>
2023-12-17 19:38:41 +02:00
Jared Van Bortel
f7f468a97d
gguf-py : fail fast on nonsensical special token IDs (#4489)
2023-12-17 10:45:46 -05:00
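A minimal sketch of the fail-fast idea, assuming a map from special-token names to IDs and a known vocabulary size: an ID that is not an integer, is negative, or is not below the vocabulary size cannot name a real token, so it is rejected at load time rather than producing a broken GGUF file. `validate_special_ids` is hypothetical, and raising on every bad ID is this sketch's choice; the actual gguf-py change may treat some cases (such as out-of-range IDs) differently.

```python
def validate_special_ids(special: dict[str, int], n_vocab: int) -> dict[str, int]:
    """Return the map unchanged if every ID is usable, else raise ValueError."""
    for name, tid in special.items():
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(tid, int) or isinstance(tid, bool):
            raise ValueError(f"special token {name!r}: ID {tid!r} is not an integer")
        if not 0 <= tid < n_vocab:
            raise ValueError(
                f"special token {name!r}: ID {tid} outside vocabulary of size {n_vocab}")
    return special
```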
Matheus Gabriel Alves Silva
919c40660f
build : Check the ROCm installation location (#4485)
* build : Check the ROCm installation location
* more generic approach
* fixup! It was returning the path instead of the command output
* fixup! Trailing whitespace
2023-12-17 17:23:33 +02:00
slaren
45668633fd
finetune : keep allocs alive until all allocations are done (#4486)
2023-12-17 16:05:56 +01:00
olexiyb
0ffc92d2d2
server : disable llm logs if SERVER_VERBOSE is off (#3792)
2023-12-17 17:02:16 +02:00
AdithyanI
8edd2b40fd
server : fix grammar being ignored (#4494)
Fix bug in identifying the grammar.
2023-12-17 16:57:56 +02:00
Alexey Parfenov
eb16dae7e7
server : fix possible ambiguity in content type charset (#4501)
2023-12-17 16:56:09 +02:00
mzcu
62bd52b7bf
server : allow requests larger than 8K (#4500)
2023-12-17 16:54:37 +02:00
Bach Le
5daa5f54fd
Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506)
2023-12-17 11:57:33 +01:00
Concedo
ec05230703
updated lite, up ver
2023-12-17 14:38:39 +08:00
Concedo
e8cf7f6ed3
Merge remote-tracking branch 'origin/master' into concedo_experimental
2023-12-17 14:37:14 +08:00
slaren
c6c4fc081c
lora : add support for non-llama models (#3333)
* lora : add support for non-llama models
ggml-ci
* avoid leaking ggml_context on failure
cleanup
ggml-ci
* lora : allow 1d tensors
* lora : include embd and output layers in size calculation
* fix style
2023-12-16 18:58:46 +01:00
Concedo
76a3ba42eb
Merge branch 'master' into concedo_experimental
# Conflicts:
# ggml.c
# ggml.h
# requirements.txt
# tests/test-quantize-perf.cpp
2023-12-16 22:58:53 +08:00
Jared Van Bortel
8a5be3bd58
llama : sanity checks for access to logits (#4274)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 22:16:15 -05:00
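A sketch of the kind of check the title describes, assuming each token in a decoded batch carries a flag saying whether its logits were requested (as the `logits` array of `llama_batch` does in the C API). Reading a row that was never computed then fails loudly instead of returning stale data. `LogitsBuffer` and `get_ith` are hypothetical Python stand-ins, not llama.cpp's implementation.

```python
class LogitsBuffer:
    """Hypothetical sketch: remember which batch positions produced logits."""

    def __init__(self) -> None:
        self.rows: list[list[float]] = []
        self.valid: list[bool] = []

    def store(self, rows: list[list[float]], requested: list[bool]) -> None:
        # One entry per token in the batch; only requested rows are meaningful.
        assert len(rows) == len(requested)
        self.rows = rows
        self.valid = list(requested)

    def get_ith(self, i: int) -> list[float]:
        if not 0 <= i < len(self.valid):
            raise IndexError(f"batch position {i} out of range (batch size {len(self.valid)})")
        if not self.valid[i]:
            raise ValueError(f"logits were not requested for batch position {i}")
        return self.rows[i]
```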
ShadovvBeast
88ae8952b6
server : add optional API Key Authentication example (#4441)
* Add API key authentication for enhanced server-client security
* server : to snake_case
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 13:49:01 +02:00
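A minimal sketch of optional API-key checking, assuming clients send `Authorization: Bearer <key>` and that an empty configured key means authentication is off; the header scheme and the helper name `is_authorized` are assumptions, not taken from the #4441 example. `hmac.compare_digest` compares without short-circuiting on the first mismatching byte, so response timing does not reveal how much of a guessed key was right.

```python
import hmac


def is_authorized(headers: dict[str, str], api_key: str) -> bool:
    """Accept the request only if it carries 'Authorization: Bearer <api_key>'."""
    if not api_key:
        return True  # assumption: no key configured means auth is disabled
    # HTTP header names are case-insensitive.
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer":
        return False
    return hmac.compare_digest(token.strip().encode(), api_key.encode())
```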
slaren
ee4725a686
ggml : group mul_mat_id rows by matrix (cpu only) (#4480)
* ggml : group mul_mat_id rows by matrix (cpu only)
* remove mmid parameters from mm forward
* store row groups in wdata and calculate only once in GGML_TASK_INIT
ggml-ci
2023-12-15 12:45:50 +01:00
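`mul_mat_id` multiplies each input row by one of several matrices (the MoE experts), selected per row by an ID. Grouping the rows by ID first means each matrix is visited once for its whole group rather than once per row. A pure-Python sketch of the grouping idea only; the names are hypothetical, and the real CPU kernel operates on ggml tensors and keeps the groups in `wdata`.

```python
from collections import defaultdict


def group_rows_by_matrix(ids: list[int]) -> dict[int, list[int]]:
    """Map each matrix ID to the row indices that use it, preserving row order."""
    groups: dict[int, list[int]] = defaultdict(list)
    for row, mat in enumerate(ids):
        groups[mat].append(row)
    return dict(groups)


def mul_mat_id(mats: list[list[list[float]]], ids: list[int],
               rows: list[list[float]]) -> list[list[float]]:
    """out[r] = mats[ids[r]] @ rows[r], computed one matrix group at a time."""
    out: list[list[float]] = [[] for _ in rows]
    for mat_id, row_idx in group_rows_by_matrix(ids).items():
        w = mats[mat_id]  # fetched once per group rather than once per row
        for r in row_idx:
            out[r] = [sum(wij * xj for wij, xj in zip(wi, rows[r])) for wi in w]
    return out
```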
slaren
6744dbe924
ggml : use ggml_row_size where possible (#4472)
* ggml : use ggml_row_size where possible
ggml-ci
* ggml : move ggml_nbytes_split to ggml-cuda.cu
2023-12-14 20:05:21 +01:00
slaren
cafcd4f895
ggml : remove n_dims from ggml_tensor (#4469)
ggml-ci
2023-12-14 16:52:08 +01:00
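Without a stored `n_dims` field, the dimension count has to be derived from the shape. A Python sketch of that rule, following what ggml's `ggml_n_dims` computes and assuming ggml's convention of a fixed 4-entry `ne` array padded with 1s: the count is the index of the last dimension larger than 1, plus one, and never less than 1. This is a reimplementation for illustration, not the library code.

```python
GGML_MAX_DIMS = 4


def n_dims(ne: list[int]) -> int:
    """Number of dimensions implied by a shape padded with trailing 1s."""
    assert len(ne) == GGML_MAX_DIMS
    for i in range(GGML_MAX_DIMS - 1, 0, -1):
        if ne[i] > 1:
            return i + 1
    return 1  # scalars and plain vectors both report 1
```

One consequence of deriving the count: a tensor created as 3-D with `ne[2] == 1` reports 2 dimensions.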
wonjun Jang
c50e400163
py : add protobuf dependency (#4466)
2023-12-14 14:44:49 +02:00
LostRuins
20a68a7030
ggml : add ggml_row_size() (fixes llama out of space) (#4461)
* Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values
* do not cast to size_t, instead just use doubles
* ggml : add ggml_row_size(), deprecate ggml_type_sizef()
* ggml : fix row size compute to avoid overflows
* tests : fix sizey -> sizez
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14 14:13:33 +02:00
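`ggml_row_size()` replaces a floating-point bytes-per-element factor with integer arithmetic: a quantized row of `ne` values occupies `type_size * ne / blck_size` bytes, where `type_size` is the size of one block. A Python sketch of that rule, with block layouts filled in for a few types (Q4_0 and Q8_0 store 32 values per block plus a 2-byte fp16 scale); the table and function name are illustrative, not ggml's code.

```python
# (bytes per block, elements per block) for a few ggml types
TYPE_TRAITS = {
    "f32": (4, 1),
    "f16": (2, 1),
    "q4_0": (18, 32),  # fp16 scale + 32 x 4-bit values
    "q8_0": (34, 32),  # fp16 scale + 32 x 8-bit values
}


def row_size(type_name: str, ne: int) -> int:
    """Bytes needed for a row of ne elements of the given type, in exact integer math."""
    type_size, blck_size = TYPE_TRAITS[type_name]
    # Quantized rows must hold a whole number of blocks.
    assert ne % blck_size == 0, "row length must be a multiple of the block size"
    return type_size * ne // blck_size


if __name__ == "__main__":
    print(row_size("q4_0", 4096))  # 2304
    print(row_size("q8_0", 4096))  # 4352
```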
Concedo
7798587990
Workflow Build from experimental branch
2023-12-14 19:17:19 +08:00
Concedo
ae3d829d0c
manual workflow for generating builds instead
2023-12-14 19:00:58 +08:00
Concedo
aac7f0b944
Merge branch 'master' into concedo_experimental
# Conflicts:
# ggml.c
2023-12-14 17:24:42 +08:00
Concedo
f0de4953ae
fixed length exceeding max ctx
2023-12-14 16:58:41 +08:00
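The commit title does not say how the overflow is resolved, so this is only a sketch of one common approach, assuming the generation budget (`max_length`) must fit in the window alongside the prompt: cap `max_length` below `max_ctx`, then drop the oldest prompt tokens until prompt plus generation fits. The function name and the keep-the-most-recent-tokens policy are assumptions.

```python
def fit_to_context(prompt: list[int], max_length: int,
                   max_ctx: int) -> tuple[list[int], int]:
    """Trim the prompt (and, if needed, max_length) so prompt + generation <= max_ctx."""
    if max_ctx <= 0:
        raise ValueError("max_ctx must be positive")
    # Assumption: always leave room for at least one prompt token.
    max_length = max(0, min(max_length, max_ctx - 1))
    budget = max_ctx - max_length
    if len(prompt) > budget:
        prompt = prompt[-budget:]  # keep the most recent tokens
    return prompt, max_length
```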