2845 commits

Author SHA1 Message Date
Concedo
375003b458 always show reported arch 2023-12-22 11:15:07 +08:00
Concedo
2378a29bde better error handling, try to avoid segfault in sillytavern 2023-12-21 22:58:48 +08:00
Concedo
c05d195583 Merge branch 'concedo' into concedo_experimental 2023-12-21 20:08:54 +08:00
Concedo
ff4c2b18d7 testing workflow for windows cuda builds
(cherry picked from commit e1f013bbf8)
2023-12-21 20:08:11 +08:00
Concedo
96c12cf395 Merge branch 'master' into concedo_experimental 2023-12-21 20:03:21 +08:00
Concedo
e1f013bbf8 testing workflow for windows cuda builds 2023-12-21 19:36:52 +08:00
LoganDark
1d7a1912ce
Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (#4554) 2023-12-21 10:59:27 +01:00
Eugene Palmoff
a787ebe7cf
Handle broken pipe error (#572) 2023-12-21 17:51:36 +08:00
Johannes Gäßler
799fc22689
CUDA: Faster Mixtral prompt processing (#4538)
* CUDA: make MoE tensors contiguous for batch size>1

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2023-12-20 15:41:22 +01:00
Eric Sommerlade
328b83de23
ggml : fixed check for _MSC_VER (#4535)
Co-authored-by: Eric Sommerlade <ersomme@microsoft.com>
2023-12-19 18:17:01 +02:00
Concedo
3f863eed72 add presence penalty 2023-12-19 23:18:56 +08:00
Concedo
da2db0302c Added support for ssl cert and key 2023-12-19 22:23:19 +08:00
Concedo
49a5dfc604 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
2023-12-19 16:07:48 +08:00
Concedo
1f77d2ad73 move multiprocessing import into function scope 2023-12-19 15:56:58 +08:00
ebolam
6948da5a0d
Fix for windows model unloading not releasing memory (#569)
* Add in model processes as a separate process so it can be killed when unloading to release memory on windows

* Fix from Henky
2023-12-19 15:55:41 +08:00
Concedo
4c274dc2fd fix tools compilation 2023-12-19 15:53:22 +08:00
arlo-phoenix
a7aee47b98
ggml-cuda: Fix HIP build (#4528)
regression of #4490
Adds defines for two new datatypes
cublasComputeType_t, cudaDataType_t.

Currently using deprecated hipblasDatatype_t since newer ones very recent.
2023-12-18 22:33:45 +01:00
Georgi Gerganov
0e18b2e7d0
llama.swiftui : add tinyllama 1.1B F16 2023-12-18 20:17:43 +02:00
Georgi Gerganov
6ff39b129d
llama.swiftui : add more models 2023-12-18 20:05:12 +02:00
Ebey Abraham
b9e74f9bca
llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)
* phi2 implementation

* fix breaking change

* phi-2 : various fixes

* phi-2 : use layer norm eps

* py : whitespaces

* llama : fix meta KV override bug

* convert : phi don't add BOS token

* convert : revert "added_tokens_decoder" change

* phi-2 : scale Q instead of KQ for better precision

* ggml : fix NeoX rope to rotate just first n_dims

* cuda : less diff in the rope_neox kernel

* ggml : add ggml_mul_mat_set_prec

ggml-ci

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision

* cuda : remove obsolete comment

---------

Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-18 19:27:47 +02:00
hankcs
3c04bf6da8
llama : fix try_override for bool_value which always return true (#4519) 2023-12-18 15:14:58 +02:00
Jared Van Bortel
2994f0c5a2
decode : fix logits_valid for legacy API (#4516) 2023-12-17 19:39:02 -05:00
Georgi Gerganov
b1306c4394
readme : update hot topics 2023-12-17 20:16:23 +02:00
Georgi Gerganov
800a489e4a
llama.swiftui : add bench functionality (#4483)
* llama.swiftui : add bench button

* llama.swiftui : initial bench functionality

* force to use n_gpu_layers on simulator

* add download buttons & expose llamaState.loadModel

* update project.pbxproj

* comment #Preview & fix editorconfig check

* gitignore : xcode stuff

* llama.swiftui : UX improvements

* llama.swiftui : avoid data copy via "downloadTask"

* llama.swiftui : remove model from project

* llama : remove "mostly" from model infos

* llama.swiftui : improve bench

---------

Co-authored-by: jhen <developer@jhen.me>
2023-12-17 19:38:41 +02:00
Jared Van Bortel
f7f468a97d
gguf-py : fail fast on nonsensical special token IDs (#4489) 2023-12-17 10:45:46 -05:00
Matheus Gabriel Alves Silva
919c40660f
build : Check the ROCm installation location (#4485)
* build : Check the ROCm installation location

* more generic approach

* fixup! It was returning the path instead of the command output

* fixup! Trailing whitespace
2023-12-17 17:23:33 +02:00
slaren
45668633fd
finetune : keep allocs alive until all allocations are done (#4486) 2023-12-17 16:05:56 +01:00
olexiyb
0ffc92d2d2
server : disable llm logs if SERVER_VERBOSE is off (#3792) 2023-12-17 17:02:16 +02:00
AdithyanI
8edd2b40fd
server : fix grammar being ignored (#4494)
Fix bug in identifying the grammar.
2023-12-17 16:57:56 +02:00
Alexey Parfenov
eb16dae7e7
server : fix possible ambiguity in content type charset (#4501) 2023-12-17 16:56:09 +02:00
mzcu
62bd52b7bf
server : allow requests larger than 8K (#4500) 2023-12-17 16:54:37 +02:00
Bach Le
5daa5f54fd
Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506) 2023-12-17 11:57:33 +01:00
Concedo
ec05230703 updated lite, up ver 2023-12-17 14:38:39 +08:00
Concedo
e8cf7f6ed3 Merge remote-tracking branch 'origin/master' into concedo_experimental 2023-12-17 14:37:14 +08:00
slaren
c6c4fc081c
lora : add support for non-llama models (#3333)
* lora : add support for non-llama models

ggml-ci

* avoid leaking ggml_context on failure
cleanup

ggml-ci

* lora : allow 1d tensors

* lora : include embd and output layers in size calculation

* fix style
2023-12-16 18:58:46 +01:00
Concedo
76a3ba42eb Merge branch 'master' into concedo_experimental
# Conflicts:
#	ggml.c
#	ggml.h
#	requirements.txt
#	tests/test-quantize-perf.cpp
2023-12-16 22:58:53 +08:00
Jared Van Bortel
8a5be3bd58
llama : sanity checks for access to logits (#4274)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 22:16:15 -05:00
ShadovvBeast
88ae8952b6
server : add optional API Key Authentication example (#4441)
* Add API key authentication for enhanced server-client security

* server : to snake_case

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-15 13:49:01 +02:00
slaren
ee4725a686
ggml : group mul_mat_id rows by matrix (cpu only) (#4480)
* ggml : group mul_mat_id rows by matrix (cpu only)

* remove mmid parameters from mm forward

* store row groups in wdata and calculate only once in GGML_TASK_INIT

ggml-ci
2023-12-15 12:45:50 +01:00
slaren
6744dbe924
ggml : use ggml_row_size where possible (#4472)
* ggml : use ggml_row_size where possible

ggml-ci

* ggml : move ggml_nbytes_split to ggml-cuda.cu
2023-12-14 20:05:21 +01:00
slaren
cafcd4f895
ggml : remove n_dims from ggml_tensor (#4469)
ggml-ci
2023-12-14 16:52:08 +01:00
wonjun Jang
c50e400163
py : add protobuf dependency (#4466) 2023-12-14 14:44:49 +02:00
LostRuins
20a68a7030
ggml : add ggml_row_size() (fixes llama out of space) (#4461)
* Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values

* do not cast to size_t, instead just use doubles

* ggml : add ggml_row_size(), deprecate ggml_type_sizef()

* ggml : fix row size compute to avoid overflows

* tests : fix sizey -> sizez

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14 14:13:33 +02:00
Concedo
7798587990 Workflow Build from experimental branch 2023-12-14 19:17:19 +08:00
Concedo
ae3d829d0c manual workflow for generating builds instead 2023-12-14 19:00:58 +08:00
Concedo
aac7f0b944 Merge branch 'master' into concedo_experimental
# Conflicts:
#	ggml.c
2023-12-14 17:24:42 +08:00
Concedo
f0de4953ae fixed length exceeding max ctx 2023-12-14 16:58:41 +08:00
Concedo
04bd895311 Revert "Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values"
This reverts commit 34b3dac66d.
2023-12-14 16:46:29 +08:00
Concedo
53bbd1ee43 Merge branch 'pr_fix_buf_resize_type' into concedo_experimental 2023-12-14 16:45:18 +08:00
Concedo
05f7db4b29 do not cast to size_t, instead just use doubles 2023-12-14 16:43:34 +08:00