Commit graph

1585 commits

Author SHA1 Message Date
mike dupont
297c26002c pivot to write data 2023-12-01 04:50:19 -05:00
mike dupont
6bd34720a6 dynet 2023-11-30 19:36:24 -05:00
mike dupont
66aecf596c starting to build the graph 2023-11-30 12:44:44 -05:00
mike dupont
9ed1a9fd16 update 2023-11-30 07:28:19 -05:00
mike dupont
cd5e1901d9 removed llama from common 2023-11-30 06:59:10 -05:00
mike dupont
b0024a6be2 linking 2023-11-29 16:21:40 -05:00
mike dupont
46d9bec698 adding in docs and notes
maybe junk but why not?
2023-11-29 15:50:55 -05:00
mike dupont
1807a6e280 now faster and smaller 2023-11-28 21:50:31 -05:00
mike dupont
d1d1cceda7 notebook 2023-11-27 18:53:10 -05:00
mike dupont
164ae84edf formatting with printf 2023-11-27 09:56:23 -05:00
mike dupont
3cd807d000 working better 2023-11-27 09:48:55 -05:00
mike dupont
7ac56bdc62 now crashing 2023-11-27 07:30:23 -05:00
mike dupont
b484674707 wip 2023-11-26 19:31:56 -05:00
mike dupont
f07f3ff61f now sampling lots of data 2023-11-26 16:23:28 -05:00
mike dupont
777871703d typeinfo
now printing out some type information (ugly) for each field, more work needed
2023-11-26 08:23:15 -05:00
mike dupont
ec2b03e504 now printing tensors 2023-11-25 20:06:00 -05:00
mike dupont
af698c6f27 now printing tokens 2023-11-25 13:02:51 -05:00
mike dupont
90568a6696 now server has it 2023-11-25 11:13:45 -05:00
mike dupont
e8e94f4f69 working 2023-11-25 09:25:19 -05:00
mike dupont
9fb2c73bc0 adding include for refl 2023-11-25 09:11:40 -05:00
mike dupont
bf019ef125 adding print statements to main.
This inserts the print probes at key points
2023-11-25 09:11:20 -05:00
mike dupont
f067d52bea Naming the unnamed ggml structures
Here we add names for the nested structures of ggml
2023-11-25 09:09:00 -05:00
mike dupont
3faef69427 still not working
ready to rebase

working
2023-11-25 07:38:09 -05:00
Georgi Gerganov
04814e718e
readme : update hot topics 2023-11-25 12:02:13 +02:00
Georgi Gerganov
af19d35734
server : OAI API compatibility (#4198)
* Add openai-compatible POST /v1/chat/completions API endpoint to server example

* fix code style

* Update server README.md

* Improve server README.md

* Fix server.cpp code style according to review

* server : some style changes

* server : indentation

* server : enable special tokens during tokenization by default

* server : minor code style

* server : change random string generator

* straightforward /v1/models endpoint

---------

Co-authored-by: kir-gadjello <111190790+kir-gadjello@users.noreply.github.com>
Co-authored-by: Tobi Lütke <tobi@Tobis-MacBook-Pro.local>
2023-11-25 11:29:06 +02:00
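A minimal sketch of exercising the OpenAI-compatible endpoint added in #4198 above, assuming the server example is running on localhost:8080 (its default port); the request and response bodies follow the OpenAI chat schema:

```python
# Sketch: POST a chat request to the OpenAI-compatible endpoint from #4198.
# Assumes a llama.cpp server is listening on localhost:8080 (the default).
import json
import urllib.request

payload = {
    "model": "local-model",  # accepted for OpenAI compatibility; the server answers with its loaded model
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one sentence."},
    ],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])
```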
slaren
e9c13ff781
llama : set metal log callback correctly (#4204) 2023-11-24 18:10:01 +01:00
slaren
8a052c131e
ggml-cuda : support stablelm rope (#4156)
* ggml-cuda : support stablelm rope

* remove unused freq_base kernel parameter

* add n_dims parameter to llm_build_k_shift, default to n_rot via overload

* llama : fix llm_build_k_shift args

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-24 18:04:31 +01:00
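For context on why llm_build_k_shift needed an n_dims parameter: stablelm-style models rotate only the first n_dims components of each head (partial RoPE), rather than all of them. A conceptual Python sketch of that rotation, not the CUDA kernel itself:

```python
# Conceptual sketch of partial RoPE: only the first n_dims components
# of a head vector are rotated; the rest pass through unchanged.
import numpy as np

def partial_rope(x: np.ndarray, pos: int, n_dims: int, freq_base: float = 10000.0) -> np.ndarray:
    """Rotate the first n_dims components of head vector x by position pos."""
    out = x.astype(np.float64).copy()
    for i in range(0, n_dims, 2):
        theta = pos * freq_base ** (-i / n_dims)  # per-pair rotation angle
        c, s = np.cos(theta), np.sin(theta)
        x0, x1 = out[i], out[i + 1]
        out[i] = x0 * c - x1 * s
        out[i + 1] = x0 * s + x1 * c
    return out

head = np.arange(8, dtype=np.float64)
print(partial_rope(head, pos=3, n_dims=4))  # only dims 0..3 are rotated
```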
Galunid
189d68446e
convert : fix tensors using grad in some models (#4173) 2023-11-24 15:02:49 +01:00
eastriver
2568a4bf54
main.swift : fix eos checking (#4197)
llama_token_eos(const struct llama_model *) was being passed a variable of type struct llama_context as its argument.
2023-11-24 11:25:10 +02:00
Aaryaman Vasishta
b35f3d0def
readme : use PATH for Windows ROCm (#4195)
* Update README.md to use PATH for Windows ROCm

* Update README.md

* Update README.md
2023-11-24 09:52:39 +02:00
Haohui Mai
55978ce09b
Fix incorrect format strings and uninitialized variables. (#4133)
* Fix incorrect format strings and uninitialized variables.

* Address comments

* Add the missing include statement
2023-11-23 22:56:53 +01:00
Georgi Gerganov
6b0a7420d0
llama : KV cache view API + better KV cache management (#4170)
* llama : keep track of used KV cells + better KV cache management

* llama : zero KV cache used upon clear

ggml-ci

* llama : allow exporting a view of the KV cache (#4180)

* Allow exporting a view of the KV cache

* Allow dumping the sequences per cell in common

* Track max contiguous cells value and position as well

* Fix max contiguous empty cells index calculation

Make dump functions deal with lengths or sequences counts > 10 better

* Fix off by one error in dump_kv_cache_view

* Add doc comments for KV cache view functions

Eliminate cell sequence struct; use llama_seq_id directly

Minor cleanups

* common : add -dkvc arg for enabling kv cache dumps

---------

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-11-23 19:07:56 +02:00
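One quantity the new KV cache view tracks is the largest contiguous run of empty cells and its starting position (the off-by-one fix above concerns dumping it). A toy sketch of that calculation, modeling occupancy as per-cell sequence counts rather than the real llama_kv_cache_view struct:

```python
# Toy model of one statistic the KV cache view exposes: the longest
# contiguous run of empty cells and where it starts. A cell with a
# sequence count of 0 is empty.
def max_contiguous_empty(cells: list[int]) -> tuple[int, int]:
    best_len, best_idx = 0, -1
    run_len, run_idx = 0, -1
    for i, seq_count in enumerate(cells):
        if seq_count == 0:
            if run_len == 0:
                run_idx = i  # a new empty run starts here
            run_len += 1
            if run_len > best_len:
                best_len, best_idx = run_len, run_idx
        else:
            run_len = 0  # occupied cell breaks the run
    return best_len, best_idx

print(max_contiguous_empty([1, 0, 0, 0, 2, 0]))  # -> (3, 1)
```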
Georgi Gerganov
d103d935c0
readme : update hot topics 2023-11-23 13:51:22 +02:00
Daniel Bevenius
9d5949f04b
examples : fix typo in parallel example doc comment (#4181)
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2023-11-23 13:34:20 +02:00
Georgi Gerganov
ff8238f71d
docs : add llama-star arch idea 2023-11-23 11:35:04 +02:00
Galunid
8e672efe63
stablelm : simplify + speedup generation (#4153) 2023-11-21 16:22:30 +01:00
Galunid
0b871f1a04
finetune - update readme to mention llama support only (#4148) 2023-11-20 19:30:00 +01:00
Aaryaman Vasishta
dfc7cd48b1
readme : update ROCm Windows instructions (#4122)
* Update README.md

* Update README.md

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

---------

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-20 17:02:46 +02:00
Seb C
881800d1f0
main : Add ChatML functionality to main example (#4046)
Co-authored-by: Sebastian Cramond <sebby37@users.noreply.github.com>
2023-11-20 14:56:59 +01:00
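The ChatML layout that the new main-example option targets wraps each turn in <|im_start|>/<|im_end|> markers and leaves the assistant turn open for generation; a small sketch of the format (the actual flag wiring lives in examples/main):

```python
# Sketch of the ChatML prompt layout used by the new main-example option.
def chatml_prompt(turns: list[tuple[str, str]]) -> str:
    """turns is a list of (role, content) pairs, e.g. ("user", "hi")."""
    prompt = ""
    for role, content in turns:
        prompt += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"  # leave the assistant turn open
    return prompt

print(chatml_prompt([("system", "You are helpful."), ("user", "Hello!")]))
```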
Galunid
f23c0359a3
ci : add flake8 to github actions (python linting) (#4129)
Disabled rules:

* E203 Whitespace before ':' - disabled because we often use 'C' Style where values are aligned

* E211 Whitespace before '(' - disabled because we often use 'C' Style where values are aligned

* E221 Multiple spaces before operator - disabled because we often use 'C' Style where values are aligned

* E225 Missing whitespace around operator - disabled because it's broken so often it seems like a standard

* E231 Missing whitespace after ',', ';', or ':' - disabled because we often use 'C' Style where values are aligned

* E241 Multiple spaces after ',' - disabled because we often use 'C' Style where values are aligned

* E251 Unexpected spaces around keyword / parameter equals - disabled because it's broken so often it seems like a standard

* E261 At least two spaces before inline comment - disabled because it's broken so often it seems like a standard

* E266 Too many leading '#' for block comment - sometimes used as "section" separator

* E501 Line too long - disabled because it's broken so often it seems like a standard

* E701 Multiple statements on one line (colon) - broken only in convert.py when defining abstract methods (we can use # noqa instead)

* E704 Multiple statements on one line - broken only in convert.py when defining abstract methods (we can use # noqa instead)
2023-11-20 11:35:47 +01:00
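The 'C'-style alignment those exceptions protect looks like the following; without them flake8 would flag lines of this shape:

```python
# Aligned assignments of the kind the disabled rules would otherwise flag.
n_vocab   = 32000            # E221: multiple spaces before '='
n_ctx     = 4096             # E221
params    = dict(a=1,  b=2)  # E241: multiple spaces after ','
```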
Branden Butler
40a34fe8d0
speculative : fix prompt tokenization in speculative example (#4025)
* Support special tokens and not adding BOS to prompt in speculative

* Adapt to new should_add_bos function

* Ensure tgt and dft have same add_bos setting
2023-11-20 11:50:04 +02:00
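The invariant #4025 enforces is that the target and draft models tokenize the prompt with the same add_bos setting; otherwise the two token streams disagree from position zero. A hypothetical sketch, where tokenize() stands in for llama_tokenize:

```python
# Hypothetical stand-in for llama_tokenize, to illustrate the invariant.
def tokenize(text: str, add_bos: bool, bos_id: int = 1) -> list[int]:
    ids = [ord(c) for c in text]          # stand-in for real tokenization
    return ([bos_id] + ids) if add_bos else ids

add_bos = True                            # decided once, from the target model
tgt = tokenize("hello", add_bos)
dft = tokenize("hello", add_bos)          # draft must reuse the same setting
assert tgt == dft                         # streams agree from position zero
```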
Georgi Gerganov
dae06c06e5
Revert "finetune : add --n-gpu-layers flag info to --help (#4128)"
This reverts commit 05e8301e45.
2023-11-19 19:16:07 +02:00
Clark Saben
05e8301e45
finetune : add --n-gpu-layers flag info to --help (#4128) 2023-11-19 18:56:38 +02:00
SoftwareRenderer
936c79b227
server : relay error messages (#4131) 2023-11-19 18:54:10 +02:00
kchro3
262005ad9d
common : comma should be semicolon (#4137) 2023-11-19 18:52:57 +02:00
Georgi Gerganov
35985acffa
gitignore : tokenize 2023-11-19 18:50:49 +02:00
slaren
e937066420
gguf-py : export chat templates (#4125)
* gguf-py : export chat templates

* llama.cpp : escape new lines in gguf kv info prints

* gguf-py : bump version

* gguf-py : check chat_template type

* gguf-py : initialize chat_template
2023-11-19 11:10:52 +01:00
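What the export above picks up is the chat_template field of a Hugging Face tokenizer_config.json, stored in GGUF metadata under the tokenizer.chat_template key; a rough sketch of the source side, with an illustrative path:

```python
# Sketch: where the exported chat template comes from. The path is
# illustrative; gguf-py reads this field during conversion and stores
# it under the tokenizer.chat_template metadata key.
import json

with open("tokenizer_config.json") as f:
    cfg = json.load(f)

template = cfg.get("chat_template")
if isinstance(template, str):  # the PR added a type check before writing
    print("would write tokenizer.chat_template =", template[:60], "...")
```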
Kerfuffle
28a2e6e7d4
tokenize example: Respect normal add BOS token behavior (#4126)
Allow building with Makefile
2023-11-18 14:48:17 -07:00
Galunid
0b5c3b0457
scripts : Remove missed baichuan convert script (#4127) 2023-11-18 21:08:33 +01:00
Kerfuffle
2923f17f6f
Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124)
* ggml-cuda.cu: Clean up warnings when compiling with clang

* ggml-cuda.cu: Move static items into anonymous namespace

* ggml-cuda.cu: Fix use of namespace start macro

* Revert "ggml-cuda.cu: Fix use of namespace start macro"

This reverts commit 26c1149026.

* Revert "ggml-cuda.cu: Move static items into anonymous namespace"

This reverts commit e29757e0f7.
2023-11-18 08:11:18 -07:00