Concedo
fafe999ff9
update lite and colab (+1 squashed commit)
...
Squashed commits:
[06b6ca6d] updated lite and colab
2023-10-22 14:03:18 +08:00
Georgi Gerganov
22c69a2794
batched : add len CLI argument
2023-10-22 08:37:20 +03:00
Concedo
cff75061fe
fixed some old models failing due to tokenizer changes, updated lite (+1 squashed commit)
...
Squashed commits:
[9dee81ec] fixed some old models failing due to tokenizer changes, updated lite tooltip (+3 squashed commits)
Squashed commits:
[5ab95a79] fixes
[a561d5e2] fixed some old models failing due to tokenizer changes
[95e65daf] lite updates
2023-10-22 11:04:59 +08:00
Concedo
dd1d61ea6b
colab is fixed (+1 squashed commit)
...
Squashed commits:
[0b2a51f3] fix colab (+1 squashed commit)
Squashed commits:
[a6b832d0] fix colab (+1 squashed commit)
Squashed commits:
[8f88f210] updated colab (+1 squashed commit)
Squashed commits:
[75552e0d] try new colab
2023-10-21 10:08:32 +08:00
shibe2
465219b914
CLBlast: Add outer loops over src0 for broadcasting in mulmat
...
Reduce repeated dequantization of the same data.
2023-10-20 22:30:52 +04:00
Georgi Gerganov
d1031cf49c
sampling : refactor init to use llama_sampling_params ( #3696 )
...
* sampling : refactor init to use llama_sampling_params
* llama : combine repetition, frequency and presence penalties in 1 call
* examples : remove embd-input and gptneox-wip
* sampling : rename penalty params + reduce size of "prev" vector
* sampling : add llama_sampling_print helper
* sampling : hide prev behind API and apply #3661
ggml-ci
2023-10-20 21:07:23 +03:00
Concedo
6119a2b5b2
revert lite change
2023-10-20 22:13:56 +08:00
Concedo
6fa681b692
fixed a race condition with SSE streaming
2023-10-20 22:01:09 +08:00
Concedo
5f5d5f1d86
quick fix
2023-10-20 19:43:56 +08:00
Qin Yue Chen
8cf19d60dc
gguf : support big endian platform ( #3552 )
...
* check whether the platform is s390x; if yes, do not import immintrin.h
* support s390x big endian
* support --bigendian option for s390x
1. verified with baichuan7b-chat with float 16 on s390x
2. verified with baichuan7b-chat
3. verified with chinese-alpaca-2-13b-f16
* update format based on editor-config checker result
* Update convert-baichuan-hf-to-gguf.py
* 1. check in ggml.c whether the endianness matches
2. update GGUF version
3. change get_pack_prefix to property
4. update information log
* always use "GGUF" as the beginning of the GGUF file
* Compare "GGUF" with file header char by char
1. Set GGUF_MAGIC to "GGUF" string instead of int value
2. Compare "GGUF" char by char to ensure the correct byte order
3. Move bytes swap code from convert.py to gguf.py write_tensor_data
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-20 14:19:40 +03:00
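A minimal sketch of the byte-order-independent magic check this commit describes, under the assumption that the file header is simply compared character by character; gguf_magic_ok is an illustrative name, not the actual ggml.c symbol:

```c
#include <stdio.h>
#include <string.h>

/* Read the first four bytes and compare them to "GGUF" character by
 * character instead of interpreting them as a host-endian integer,
 * so the check behaves identically on little- and big-endian machines. */
static int gguf_magic_ok(FILE * f) {
    char magic[4];
    if (fread(magic, 1, sizeof(magic), f) != sizeof(magic)) {
        return 0; // short read: not a valid GGUF file
    }
    return memcmp(magic, "GGUF", 4) == 0;
}

int main(int argc, char ** argv) {
    if (argc < 2) return 1;
    FILE * f = fopen(argv[1], "rb");
    if (!f) return 1;
    int ok = gguf_magic_ok(f);
    fclose(f);
    printf("%s\n", ok ? "GGUF magic found" : "not a GGUF file");
    return ok ? 0 : 1;
}
```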
Concedo
012c53367d
minor lite fixes
2023-10-20 18:41:17 +08:00
Georgi Gerganov
a0edf73bda
server : fix uninitialized sampling context ( close #3685 )
2023-10-20 13:06:10 +03:00
Herman Semenov
f439e506e8
ggml : fix rope + llama minor optimizations ( #3560 )
...
* Minor fixes and fixed memleak
* Using const auto references in range-based loop C++17
2023-10-20 13:02:12 +03:00
Concedo
d3c7b7cc71
colab fix
2023-10-20 16:34:45 +08:00
Concedo
d5016fdc8f
updated lite (bug fix)
2023-10-20 16:03:06 +08:00
Concedo
ee93213218
updated lite
2023-10-20 15:44:52 +08:00
Concedo
cd3bb3ede2
update colab link
2023-10-20 13:49:34 +08:00
cebtenzzre
e78f3ef24a
convert : restore compat with old Falcon models ( #3680 )
2023-10-20 08:32:08 +03:00
Concedo
8947142c46
updated lite and colab
2023-10-20 11:35:44 +08:00
M. Yusuf Sarıgöz
f3b25e4043
multimodal : add BakLLaVA conversion support ( #3682 )
2023-10-19 19:40:41 +03:00
Concedo
8d31550d48
fix groupchat
2023-10-19 23:40:15 +08:00
Concedo
957e245285
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
2023-10-19 23:32:52 +08:00
kalomaze
ddce116ec9
Fix for Top K disabling ( #480 )
...
* Update gpttype_adapter.cpp
* use n_vocab instead of 32000 when top k is off
2023-10-19 23:20:44 +08:00
Concedo
8c6001de2a
updated lite
2023-10-19 23:18:14 +08:00
Concedo
fd770bb105
patch
2023-10-19 23:04:26 +08:00
Concedo
4382e51719
updated lite and default horde ctx amount
2023-10-19 22:49:59 +08:00
M. Yusuf Sarıgöz
60abea9798
llava : avoid segfault in case of non-existent mmproj file ( #3674 )
2023-10-19 16:59:11 +03:00
Georgi Gerganov
004797f6ac
readme : update hot topics
2023-10-18 21:44:43 +03:00
Georgi Gerganov
4e82b2ea3f
speculative : bug fixes
2023-10-18 18:49:40 +03:00
Georgi Gerganov
0e89203b51
speculative : add tree-based sampling example ( #3624 )
...
* sampling : one sequence per sampling context
ggml-ci
* speculative : add tree-based sampling support
ggml-ci
* speculative : reuse the n_parallel CLI param
* speculative : refactor sampling
* examples : fix build after sampling refactoring
ggml-ci
* batched : fix n_seq_id
* sampling : fix malloc
ggml-ci
* swift : fix build
ggml-ci
* swift : try to fix build
ggml-ci
* prompts : add assistant.txt
* common : add llama_batch_add() and llama_batch_clear() helpers
* speculative : minor refactor
ggml-ci
* minor : comments + rename
ggml-ci
* speculative : fix off-by-one for n_drafted
* speculative : fix the n_drafted fix + p constants
2023-10-18 16:21:57 +03:00
Jhen-Jie Hong
c67fe68e41
metal : implement q5_0 and q5_1 kernels ( #3648 )
...
* metal : implement dequantize_q5_0
* metal : block_q_n_dot_y for block_q5_0 (broken)
* metal : revert unnecessary change
* metal : implement dequantize_q5_1
* metal : block_q_n_dot_y for q5_1 (broken)
* metal : fix block_q_n_dot_y
* minor : spaces / formatting
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-18 15:21:48 +03:00
shibe2
1117d06607
opencl : fix element-wise multiplication ( #3656 )
2023-10-18 15:09:22 +03:00
Concedo
c1ca1de2ac
fixed support for old falcon models
2023-10-18 17:20:44 +08:00
Concedo
700951dbd4
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# README.md
2023-10-18 16:33:09 +08:00
Concedo
53b7cdf8a3
Merge branch 'concedo' into concedo_experimental
2023-10-18 13:51:13 +08:00
slaren
cb33f43a2a
fix embeddings when using CUDA ( #3657 )
2023-10-17 22:24:50 +02:00
Georgi Gerganov
e1675d133c
llama : avoid fprintf in favor of LLAMA_LOG ( #3538 )
2023-10-17 22:34:26 +03:00
BarfingLemurs
8402566a7c
readme : update hot-topics & models, detail windows release in usage ( #3615 )
...
* Update README.md
* Update README.md
* Update README.md
* move "Running on Windows" section below "Prepare data and run"
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 21:13:21 +03:00
LostRuins
6e34d31c44
Update README.md ( #479 )
2023-10-18 01:24:14 +08:00
shibe2
40e5ce054f
CLBlast: Fix temporary buffer size for f16 conversion (wsize)
...
Fix buffer overflow.
Reduce the size to fit just one 2D slice.
Assert sufficient size.
2023-10-17 21:02:30 +04:00
slaren
a5e8c1d8c7
train-text-from-scratch : fix assert failure in ggml-alloc ( #3618 )
2023-10-17 20:00:58 +03:00
Georgi Gerganov
e74c705e15
editorconfig : remove trailing spaces
2023-10-17 19:52:53 +03:00
coezbek
3ad1e3f1a1
server : documentation of JSON return value of /completion endpoint ( #3632 )
...
* Added documentation of JSON return value of /completion endpoint
* Update examples/server/README.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 19:51:02 +03:00
Georgi Gerganov
1142013da4
save-load-state : fix example + add ci test ( #3655 )
...
* save-load-state : fix example (close #3606 )
* ci : add test for save-load-state example
ggml-ci
2023-10-17 19:12:46 +03:00
ldwang
5fe268a4d9
readme : add Aquila2 links ( #3610 )
...
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-10-17 18:52:33 +03:00
staviq
1a159553f9
tokenizer : special token handling ( #3538 )
...
* Rewrite special token handling from #1931
* shorten param name, add st verification by type
* use offsets instead of copy by substr
* formatting, remove copying iterator on delete
* llama : normalize code-style
* swift fix
* print prefix/suffix if verbose; main: split prefix/input/suffix
* don't add a space when using special tokens
* minor : comment + spacing
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 18:11:01 +03:00
Concedo
6f8fe88f10
fix for lite (+5 squashed commits)
...
Squashed commits:
[f9ce9855] catch more exceptions
[8cdaf149] tweaked horde worker timeouts, updated lite
[619ebef4] fixed abort returning no response on failure
[a54a66a2] fixed time overflow
[9affdc3e] updated lite
2023-10-17 23:04:32 +08:00
Georgi Gerganov
281ef73c25
k-quants : fix quantization ranges ( #3646 )
2023-10-17 09:19:28 +03:00
Georgi Gerganov
940efa95fe
llava : fix tokenization to not add bos between image embeddings and user prompt ( #3645 )
...
* llava : fix tokenization to not add bos after system prompt
* set seed
---------
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
2023-10-16 23:58:00 +03:00
Concedo
ee0681f0d9
convert some asserts into non-terminating ones since they are overzealous
2023-10-15 16:12:20 +08:00