Commit graph

2338 commits

Author SHA1 Message Date
h-h-h-h
8186242b6d
main : consistent prefix/suffix coloring (#3425)
* Typo

* No `--in-prefix` coloring

The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
2023-10-03 21:16:15 +03:00
Georgi Gerganov
ac2219fef3
llama : fix session saving/loading (#3400)
* llama : fix session saving/loading

* llama : temp fix for clearing "future" tokens from the KV cache

* llama : fix handling of "future" tokens when loading sessions

* llama : fix comments for llama_kv_cache API
2023-10-03 21:04:01 +03:00
Alex Klinkhamer
48be797ffb
llama : expose model's rope_freq_scale in the API (#3418)
so it can be scaled further before creating a context.
2023-10-03 20:09:28 +03:00
Jiahao Li
f56e1baec3
metal : alibi for arbitrary number of heads (#3426) 2023-10-03 19:55:21 +03:00
Eve
017efe899d
cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)
* fix LLAMA_NATIVE

* syntax

* alternate implementation

* my eyes must be getting bad...

* set cmake LLAMA_NATIVE=ON by default

* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc

* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile

* remove -DLLAMA_MPI=ON

---------

Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
2023-10-03 19:53:15 +03:00
Concedo
ea726fcffa cleanup threaded horde submit 2023-10-04 00:34:26 +08:00
Concedo
c249f7dbc5 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.dockerignore
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	tests/CMakeLists.txt
2023-10-03 23:51:30 +08:00
Concedo
0cc740115d updated lite, improve horde worker (+1 squashed commits)
Squashed commits:

[a7c25999] improve horde worker
2023-10-03 23:44:27 +08:00
Concedo
ae8ccdc1be Remove old tkinter gui (+1 squashed commits)
Squashed commits:

[0933c1da] Remove old tkinter gui
2023-10-03 22:05:44 +08:00
Concedo
d10470a1e3 Breaking Change: Remove deprecated commands 2023-10-03 17:16:09 +08:00
goerch
ff5a3f0c09
Work on the BPE tokenizer (#3252)
* Work on the BPE tokenizer

Tokenizer tests work for Falcon-7B

* Try to fix build problem

* Fix debug assertion failure

* Fix MSVC Unicode BOM problem

* Cleanup and an improvement

* Fix compiler warning

* Cleanup

* Test doesn't work over the full range of Unicodes

* Update .gitignore and Makefile

* Another Makefile rule

* Testing Aquila

* Moving byte decoding back to `token_to_piece` ...

... because everyone is using it.

* Guarding some unusable code paths

* Streamlining code and adding some more assertions

Important change: I'm classifying added tokens as control tokens now for BPE.

* Adding a comment

* Adding another assertion

* Fixed vocabulary guarding assertions

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fix PR for recent change

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fixes for more compiler warnings

* Remove unused code

* Fix initialization of static maps

* Add scores and token types back, adapt gptneox

* Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update unicode.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update unicode.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Ported Starcoder and added some assertions

* Fix coding style

* Apply @jploski 's fix for missing tokens

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-03 09:16:26 +02:00
cebtenzzre
1c84003c08
convert : fix vocab size when not defined in hparams (#3421) 2023-10-02 18:07:24 -04:00
cebtenzzre
e78f0b0d05
cmake : increase minimum version for add_link_options (#3444) 2023-10-02 22:38:43 +03:00
shibe2
665018c749
CLBlast: Add broadcast support for matrix multiplication (#3402)
Broadcast src0 into src1 across dimensions 2 and 3 when needed.
This is required for models that use GQA.
2023-10-02 21:26:15 +02:00
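The broadcast described in that commit (repeating src0 across dimensions 2 and 3 so a GQA model's smaller number of KV heads lines up with its query heads) can be sketched in plain Python. The helper and shapes here are illustrative only, not the CLBlast code:

```python
# Sketch of broadcasting KV heads across query heads for GQA.
# A GQA model has n_head query heads but only n_head_kv KV heads;
# each KV head is shared by n_head // n_head_kv consecutive query heads.

def broadcast_kv_heads(kv, n_head):
    """Repeat each KV head so there is one entry per query head.

    kv is a list with one item (standing in for a tensor) per KV head.
    """
    n_head_kv = len(kv)
    assert n_head % n_head_kv == 0, "query heads must be a multiple of KV heads"
    repeat = n_head // n_head_kv
    return [kv[i // repeat] for i in range(n_head)]

# 2 KV heads serving 8 query heads -> each KV head repeated 4 times
heads = broadcast_kv_heads(["k0", "k1"], n_head=8)
```

This is why the broadcast is "required for models that use GQA": without it, the matrix multiplication sees mismatched head counts between src0 and src1.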
cebtenzzre
29a404a951
gguf : add BERT, MPT, and GPT-J arch info (#3408) 2023-10-02 15:20:28 -04:00
cebtenzzre
0fe321031a
gguf : general usability improvements (#3409) 2023-10-02 14:58:46 -04:00
cebtenzzre
9476b01226
cmake : make CUDA flags more similar to the Makefile (#3420)
* cmake : fix misuse of cxx_flags

* cmake : make CUDA flags more similar to the Makefile

* cmake : fix MSVC build
2023-10-02 16:16:50 +03:00
xaedes
a03ce38455
finetune : fix #3404 (#3437)
the shapes of the init model for GQA models were wrong
2023-10-02 16:15:45 +03:00
Concedo
5d3e142145 use_default_badwordsids defaults to false if the parameter is missing 2023-10-02 19:41:07 +08:00
Adrian
a847676984
metal : set log callback before initializing (#3427) 2023-10-02 13:49:59 +03:00
Concedo
1bc01cbcd4 update images (+3 squashed commit)
Squashed commit:

[4d5f17ad] update readme

[cd000215] resize image

[eca91721] try add more media
2023-10-02 17:53:59 +08:00
bandoti
095231dfd3
cmake : fix transient definitions in find pkg (#3411) 2023-10-02 12:51:49 +03:00
Concedo
5b4cef5a60 archived old unused file 2023-10-02 16:57:20 +08:00
Kevin Ji
ea55295a74
docker : ignore Git files (#3314) 2023-10-02 11:53:53 +03:00
vvhg1
c97f01c362
infill : add new example + extend server API (#3296)
* vvhg-code-infill (#1)

* infill in separate example (#2)

* reverted changes to main and added infill example

* cleanup

* naming improvement

* make : add missing blank line

* fix missing semicolon

* brought infill up to current main code

* cleanup

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-10-02 10:42:02 +03:00
Concedo
23b9d3af49 force oai endpoints to return json 2023-10-02 12:45:14 +08:00
Concedo
0c47e79537 updated the API routing path and fixed a bug with threads 2023-10-02 11:05:19 +08:00
Concedo
dffc6bee74 deprecate some launcher arguments. 2023-10-01 22:30:48 +08:00
Concedo
b49a5bc546 formatting of text 2023-10-01 18:38:32 +08:00
Concedo
bc841ec302 flag to retain grammar, fix makefile (+2 squashed commit)
Squashed commit:

[d5cd3f28] flag to retain grammar, fix makefile

[b3352963] updated lite to v73
2023-10-01 14:39:56 +08:00
Concedo
7ab01ee3c6 Merge branch 'master' into concedo_experimental 2023-10-01 10:22:05 +08:00
Concedo
2fc00fac8c fixed makefile 2023-10-01 10:17:23 +08:00
Concedo
4b45d880ba updated lite 2023-10-01 01:10:30 +08:00
slaren
f5ef5cfb18
ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)
* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16

* rename CC_TURING to CC_VOLTA

* disable fp16 mat mul completely with multi GPU
2023-09-30 18:12:57 +02:00
Concedo
191de1e8a3 allow launching with kcpps files 2023-09-30 19:35:03 +08:00
Concedo
202e28a76a do not offload rope for old cublas (+1 squashed commits)
Squashed commits:

[ca72a66f] fix allocr (+1 squashed commits)

Squashed commits:

[22a0e30e] updated lite
2023-09-30 18:18:36 +08:00
Concedo
5e6450161a functional merge 2023-09-30 12:31:57 +08:00
Concedo
b84e210f0d merge new rope param nonsense 2023-09-30 11:33:30 +08:00
slaren
40e07a60f9
llama.cpp : add documentation about rope_freq_base and scale values (#3401)
* llama.cpp : add documentation about rope_freq_base and scale values

* add notice to hot topics
2023-09-29 18:42:32 +02:00
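As background for the rope_freq_base and rope_freq_scale values documented in that commit: RoPE derives a per-dimension-pair rotation angle from freq_base, and linear scaling multiplies the position by freq_scale (e.g. 0.5 to stretch the usable context to twice its trained length). A minimal sketch of the relationship, not the ggml implementation:

```python
import math

def rope_angles(pos, n_dims, freq_base=10000.0, freq_scale=1.0):
    """Rotation angle for each pair of dimensions at a given position.

    Linear RoPE scaling simply stretches positions: pos * freq_scale.
    """
    scaled_pos = pos * freq_scale
    return [scaled_pos * freq_base ** (-2.0 * i / n_dims)
            for i in range(n_dims // 2)]

# freq_scale = 0.5 halves every angle, so position 2048 "looks like" 1024
a = rope_angles(2048, n_dims=8, freq_scale=0.5)
b = rope_angles(1024, n_dims=8, freq_scale=1.0)
```

Here `a` and `b` are identical, which is the whole trick: scaled positions stay inside the angle range the model saw during training.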
Georgi Gerganov
bc34dd4f5b
train : fix KQ_pos allocation (#3392)
* train : fix KQ_pos allocation

* make sure KQ_pos is not reallocated in finetune

---------

Co-authored-by: xaedes <xaedes@gmail.com>
2023-09-29 19:05:18 +03:00
Cebtenzzre
2777a84be4
llama : quantize up to 31% faster on Linux and Windows with mmap (#3206)
* llama : enable mmap in quantize on Linux -> 31% faster

* also enable mmap on Windows

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-29 16:48:45 +03:00
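The speedup in that commit comes from memory-mapping the input model instead of reading it into a buffer, so pages are faulted in on demand and nothing is copied twice. A toy stdlib illustration of the pattern (the file contents here are made up; this is not the llama.cpp quantize code):

```python
import mmap
import os
import tempfile

# Write a small stand-in "model file", then access it through a memory map.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"GGUF" + bytes(range(16)))

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # The OS maps the file into the address space; slicing reads
        # pages lazily instead of copying the whole file up front.
        magic = mm[:4]
        first_byte = mm[4]

os.remove(path)
```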
BarfingLemurs
0a4a4a0982
readme : update hot topics + model links (#3399) 2023-09-29 15:50:35 +03:00
Andrew Duffy
569550df20
readme : add link to grammars app (#3388)
* Add link to grammars app per @ggerganov's suggestion

Adding a sentence in the Grammars section of README to point to grammar app, per https://github.com/ggerganov/llama.cpp/discussions/2494#discussioncomment-7138211

* Update README.md
2023-09-29 14:15:57 +03:00
Jhen-Jie Hong
c71bf2c45c
swift : fix build on xcode 15 (#3387) 2023-09-29 08:25:13 +03:00
Concedo
033e3bf844 prepare to merge parallel 2023-09-29 10:30:45 +08:00
Cebtenzzre
bc39553c90
build : enable more non-default compiler warnings (#3200) 2023-09-28 17:41:44 -04:00
Hua Jiang
0ccfc62a96
ggml_tensor: update the structure comments. (#3283)
* ggml_tensor: update the structure comments.

* remove semicolon

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml.h

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-09-28 23:06:18 +03:00
Qu Zongfu
7f1a0fe709
ggml : release the requested thread pool resource (#3292)
* Release the requested thread pool resource

* Release the requested thread pool resource 2

---------

Co-authored-by: Zongfu ZF3 Qu <quzf3@Lenovo.com>
2023-09-28 22:51:52 +03:00
slaren
16bc66d947
llama.cpp : split llama_context_params into model and context params (#3301)
* llama.cpp : split llama_context_params into model and context params

ggml-ci

* fix metal build

* fix freq_base/scale default to model value

* llama-bench : keep the same model between tests when possible

* move n_threads to llama_context_params, add n_threads_batch

* fix mpi build

* remove kv_size(), cuda scratch fixes

* remove low-vram option

* add n_threads_batch to system info, refactor to get_system_info()

* add documentation about --threads-batch to the READMEs

* llama-bench fix

* main : fix rope freq/scale warning

* llama.cpp : add llama_get_model
common : add llama_tokenize from model

* remove duplicated ctx/model functions

ggml-ci

* cuda : print total VRAM used
2023-09-28 22:42:38 +03:00
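The split described in that commit separates load-time settings (fixed once the model is in memory) from per-context settings (which can differ between contexts sharing one model); the commit notes n_threads moved to the context params, alongside the new n_threads_batch. A schematic of the division as Python dataclasses, with illustrative defaults rather than the actual llama.h structs:

```python
from dataclasses import dataclass

@dataclass
class ModelParams:
    # Fixed at load time; shared by every context built on the model.
    n_gpu_layers: int = 0
    use_mmap: bool = True

@dataclass
class ContextParams:
    # Chosen per context; per the commit, n_threads lives here,
    # alongside the new n_threads_batch.
    n_ctx: int = 512
    n_threads: int = 4
    n_threads_batch: int = 4

# One loaded model can serve several contexts with different settings.
mp = ModelParams(n_gpu_layers=32)
small = ContextParams(n_ctx=512)
large = ContextParams(n_ctx=4096, n_threads_batch=8)
```

The design choice this illustrates: anything that would force a model reload stays in the model params, so llama-bench can "keep the same model between tests" while varying only context params.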
Eve
0512d66670
ci : multithreaded builds (#3311)
* mac and linux threads

* windows

* Update build.yml

* Update build.yml

* Update build.yml

* automatically get thread count

* windows syntax

* try to fix freebsd

* Update build.yml

* Update build.yml

* Update build.yml
2023-09-28 22:31:04 +03:00