Commit graph

370 commits

Author SHA1 Message Date
Georgi Gerganov
ad2727d091
Merge branch 'master' into speculative-tree
ggml-ci
2023-10-18 10:50:58 +03:00
Georgi Gerganov
e1675d133c
llama : avoid fprintf in favor of LLAMA_LOG (#3538) 2023-10-17 22:34:26 +03:00
slaren
a5e8c1d8c7
train-text-from-scratch : fix assert failure in ggml-alloc (#3618) 2023-10-17 20:00:58 +03:00
Georgi Gerganov
e74c705e15
editorconfig : remove trailing spaces 2023-10-17 19:52:53 +03:00
coezbek
3ad1e3f1a1
server : documentation of JSON return value of /completion endpoint (#3632)
* Added documentation of JSON return value of /completion endpoint

* Update examples/server/README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 19:51:02 +03:00
Georgi Gerganov
bd9451ca2a
Merge branch 'master' into speculative-tree 2023-10-17 19:31:40 +03:00
Georgi Gerganov
1142013da4
save-load-state : fix example + add ci test (#3655)
* save-load-state : fix example (close #3606)

* ci : add test for save-load-state example

ggml-ci
2023-10-17 19:12:46 +03:00
staviq
1a159553f9
tokenizer : special token handling (#3538)
* Rewrite special token handling from #1931

* shorten param name, add st verification by type

* use offsets instead of copy by substr

* formatting, remove copying iterator on delete

* llama : normalize code-style

* swift fix

* print pfx/sfx if verb, main: split pfx input sfx

* dont add space when using special tokens

* minor : comment + spacing

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 18:11:01 +03:00
Georgi Gerganov
010c52ec59
Merge branch 'master' into speculative-tree 2023-10-17 17:24:11 +03:00
Georgi Gerganov
e6dd81f0bc
speculative : fix the n_drafted fix + p constants 2023-10-17 17:04:31 +03:00
Georgi Gerganov
f07cd35da4
speculative : fix off-by-one for n_drafted 2023-10-17 11:40:26 +03:00
Georgi Gerganov
940efa95fe
llava : fix tokenization to not add bos between image embeddings and user prompt (#3645)
* llava : fix tokenization to not add bos after system prompt

* set seed

---------

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
2023-10-16 23:58:00 +03:00
Georgi Gerganov
373d782d42
minor : comments + rename
ggml-ci
2023-10-16 18:17:31 +03:00
Georgi Gerganov
1c626e2fe1
speculative : minor refactor
ggml-ci
2023-10-16 12:47:37 +03:00
Georgi Gerganov
360a333145
common : add llama_batch_add() and llama_batch_clear() helpers 2023-10-16 12:41:33 +03:00
Georgi Gerganov
5b34bfa2e6
swift : try to fix build
ggml-ci
2023-10-16 00:39:57 +03:00
Georgi Gerganov
b8acb6c9b8
swift : fix build
ggml-ci
2023-10-16 00:26:16 +03:00
Georgi Gerganov
0d96efabb5
batched : fix n_seq_id 2023-10-16 00:03:52 +03:00
Georgi Gerganov
7e48e21b1f
examples : fix build after sampling refactoring
ggml-ci
2023-10-15 23:36:31 +03:00
Georgi Gerganov
4a7f43f28c
speculative : refactor sampling 2023-10-15 22:53:54 +03:00
Georgi Gerganov
32a67cbd16
speculative : reuse the n_parallel CLI param 2023-10-15 19:35:59 +03:00
Georgi Gerganov
4de5a2d473
speculative : add tree-based sampling support
ggml-ci
2023-10-14 17:54:02 +03:00
M. Yusuf Sarıgöz
11dc1091f6
Honor -ngl option for Cuda offloading in llava (#3621) 2023-10-14 04:52:44 -06:00
slaren
424b6381c4
ggml : add context enumeration functions (#3605)
finetune : fix assert failure in ggml-alloc
2023-10-13 12:23:10 +02:00
Georgi Gerganov
5261aee8d8
sampling : one sequence per sampling context
ggml-ci
2023-10-12 20:36:44 +03:00
M. Yusuf Sarıgöz
370359e5ba
examples: support LLaVA v1.5 (multimodal model) (#3436)
* WIP: start implementing LLaVA

* rm scratch buf for now, will revert after cleanup

* LLaVA image encoder is working. will combine with llama

* Add llava inference code, but it's buggy. debugging

* LLaVA is working e2e, needs to optimize memory allocation + cleanup

* Use ggml_allocr + rm unnecessary code

* fix: crlf -> lf

* fix: new line at EoF

* fix: trailing whitespace

* Add readme

* Update readme

* Some cleanup

* Are you happy editorconfig?

* rm unused batch image preprocessing

* rm unused import

* fix: rm designated initializers

* introduce pad-to-square mode for non-square images

* are you happy editorconfig?

* gitignore /llava

* Handle cases where image file does not exist

* add llava target to Makefile

* add support for 13b model variant

* Maybe seed is unlucky?

* Check if apples are compared to apples

* are you happy editorconfig?

* Use temperature = 0.1 by default

* command line: use gpt_params_parse()

* minor

* handle default n_predict

* fix typo

* llava : code formatting, rename files, fix compile warnings

* do not use Wno-cast-qual for MSVC

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-12 18:23:18 +03:00
Aarni Koskela
b016596d90
server : add completion mode (no chat) (#3582) 2023-10-12 09:51:53 +03:00
Georgi Gerganov
57dd55e2c7
server : fix kv cache management (#3588) 2023-10-12 09:29:04 +03:00
Georgi Gerganov
b8fe4b5cc9
main : fix session loading bug (#3400) 2023-10-11 23:55:41 +03:00
Michael Coppola
a8bdd65525
server : add parameter -tb N, --threads-batch N (#3584)
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
2023-10-11 22:42:22 +03:00
Kerfuffle
70c29da118
common : fix mirostat state when using multiple sequences (#3543)
* Fix mirostat state when using multiple sequences

* Fix mirostat by completely refactoring sampling!

* Try to fix zig build.

* Export function to fetch/create default sampler states

Code formatting cleanups and add some comments

Silence a warning about id not being used when logging is disabled

* Apply some renaming suggestions.

Fix comments that were out of sync with the pull.

* Use more consistant naming convention for sampling contexts
2023-10-11 22:35:46 +03:00
Georgi Gerganov
8c70a5ff25
batched : add bench tool (#3545)
* batched : add bench tool

* batched : minor fix table

* batched-bench : add readme + n_kv_max is now configurable

* batched-bench : init warm-up batch

* batched-bench : pass custom set of PP, TG and PL

* batched-bench : add mmq CLI arg
2023-10-11 21:25:33 +03:00
Zane Shannon
24ba3d829e
examples : add batched.swift + improve CI for swift (#3562) 2023-10-11 06:14:05 -05:00
vvhg1
11ea5c7d96
infill. : fix tokenization (#3508)
* infill tokens correction

* serverinfill tokens correction

* removing any leading whitespace from infill suffix and removing leeading space token from suffix when params.escape

* removing any leading whitespace from infill suffix and removing leeading space token from suffix when params.escape

* only rm when params.escape, rm space if possible which is added back or rm added space token

* only rm when params.escape, rm space if possible which is added back or rm added space token

* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"

This reverts commit 63ba0b621f.

* fix interactive prompt escaping and fix server infill leading space handling

* rm unnecessary bool check
2023-10-10 10:31:21 +03:00
Georgi Gerganov
fcca0a7004
refact : fix convert script + zero out KV cache to avoid nans (#3523)
* refact : fix convert script + zero out KV cache to avoid nans

* ggml : silu(-inf) should never happen

* metal : assert various kernel requirements
2023-10-09 14:32:17 +03:00
Ryder Wishart
8e6716a102
api_like_OAI.py : compat with Microsoft Guidance (#2746)
Check for None in addition to empty string check in all request params

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08 13:55:58 +03:00
arcrank
9c38d181d4
api_like_OAI.py : simplify function (#2796)
Simplify function
2023-10-08 13:52:57 +03:00
Mihai
cb13d73a72
server : docs fix default values and add n_probs (#3506) 2023-10-06 21:39:33 +03:00
pudepiedj
a8777ad84e
parallel : add option to load external prompt file (#3416)
* Enable external file and add datestamp

* Add name of external file at end

* Upload ToK2024

* Delete ToK2024.txt

* Experiments with jeopardy

* Move ParallelQuestions to /proimpts and rename

* Interim commit

* Interim commit

* Final revision

* Remove trailing whitespace

* remove cmake_all.sh

* Remove cmake_all.sh

* Changed .gitignore

* Improved reporting and new question files.

* Corrected typo

* More LLM questions

* Update LLM-questions.txt

* Yet more LLM-questions

* Remove jeopardy results file

* Reinstate original jeopardy.sh

* Update examples/parallel/parallel.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-06 16:16:38 +03:00
Jhen-Jie Hong
97af49fa39
server : reuse llama_sample_token common util (#3494)
* server : reuse llama_sample_token common function

* common : use n_probs for temperature sampling
2023-10-06 15:44:24 +03:00
Kenvix ⭐
45eba9369f
build : use std::make_tuple() for compatibility with older GCC versions (#3488) 2023-10-05 20:16:39 +03:00
Jhen-Jie Hong
e8b8d32e86
server : fix incorrect num_tokens_predicted (#3480) 2023-10-05 17:02:55 +03:00
Merrick Christensen
f72f8f22c9
finetune : readme fix typo (#3465)
Fix small typo
2023-10-04 09:33:13 +03:00
h-h-h-h
8186242b6d
main : consistent prefix/suffix coloring (#3425)
* Typo

* No `--in-prefix` coloring

The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
2023-10-03 21:16:15 +03:00
Georgi Gerganov
ac2219fef3
llama : fix session saving/loading (#3400)
* llama : fix session saving/loading

* llama : temp fix for clearing "future" tokens from the KV cache

* llama : fix handling of "future" tokens when loading sessions

* llama : fix comments for llama_kv_cache API
2023-10-03 21:04:01 +03:00
cebtenzzre
0fe321031a
gguf : general usability improvements (#3409) 2023-10-02 14:58:46 -04:00
xaedes
a03ce38455
finetune : fix #3404 (#3437)
the shapes for init model of gqa models was wrong
2023-10-02 16:15:45 +03:00
bandoti
095231dfd3
cmake : fix transient definitions in find pkg (#3411) 2023-10-02 12:51:49 +03:00
vvhg1
c97f01c362
infill : add new example + extend server API (#3296)
* vvhg-code-infill (#1)

* infill in separate example (#2)

* reverted changes to main and added infill example

* cleanup

* naming improvement

* make : add missing blank line

* fix missing semicolon

* brought infill up to current main code

* cleanup

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-10-02 10:42:02 +03:00
Georgi Gerganov
bc34dd4f5b
train : fix KQ_pos allocation (#3392)
* train : fix KQ_pos allocation

* make sure KQ_pos is not reallocated in finetune

---------

Co-authored-by: xaedes <xaedes@gmail.com>
2023-09-29 19:05:18 +03:00