Commit graph

92 commits

Author SHA1 Message Date
Johnman
bb5e8ec79a Never exit the main loop in interactive mode.
If the end of stream token mark is found, when in interactive mode, ask
for user input instead of exiting the main loop.

In case of running out of token budget, reset it and ask for user input.

With these changes, embd can end up empty and cause a crash in the next
iteration of the loop, so we check for its size as well.
2023-03-19 18:01:28 +01:00
Ronsor
d7def1a752
Warn user if a context size greater than 2048 tokens is specified (#274)
LLaMA doesn't support more than 2048 token context sizes, and going above that produces terrible results.
2023-03-18 20:10:47 -04:00
Pavol Rusnak
6f61c18ec9 Fix typo in readme 2023-03-18 23:18:04 +01:00
Pavol Rusnak
1e5a6d088d Add note about Python 3.11 to readme 2023-03-18 22:25:35 +01:00
Pavol Rusnak
554b541521 Add memory/disk requirements to readme 2023-03-18 22:25:35 +01:00
Alex Nguyen
d3f202d57b
Remove unused code since n_vocab is model.hparams.n_vocab (#262) 2023-03-18 13:51:49 +00:00
Justin Suess
e03e359730
fixed warning with std::ignore about unused function result (#151)
fixed warning with std::ignore about unused function result
2023-03-18 11:44:09 +00:00
Gary Linscott
a81d0c2a17
Fix n^2 loop in tokenization (#254)
This causes long prompts to parse very slowly.
2023-03-18 11:17:19 +00:00
anzz1
b2de7f18df
CI Improvements (#230)
* CI Improvements

Manual build feature, autoreleases for Windows

* better CI naming convention

use branch name in releases and tags
2023-03-18 09:27:12 +02:00
Niklas Korz
a292747893
Nix flake (#40)
* Nix flake

* Nix: only add Accelerate framework on macOS

* Nix: development shel, direnv and compatibility

* Nix: use python packages supplied by withPackages

* Nix: remove channel compatibility

* Nix: fix ARM neon dotproduct on macOS

---------

Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-03-17 23:03:48 +01:00
thement
c9f670a177
Implement non-greedy tokenizer that tries to maximize token lengths (#242)
* Implement non-greedy tokenizer that tries to maximize token lengths

* Insert single space in front of the prompt

- this is to match original llama tokenizer behavior

---------

Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
2023-03-17 21:05:58 +01:00
Georgi Gerganov
4f54609110
Default to 4 threads (#243) 2023-03-17 21:46:46 +02:00
Georgi Gerganov
e81b9c81c1
Update Contributing section 2023-03-17 20:30:04 +02:00
Stephan Walter
367946c668
Don't tell users to use a bad number of threads (#243)
The readme tells people to use the command line option "-t 8", causing 8
threads to be started. On systems with fewer than 8 cores, this causes a
significant slowdown. Remove the option from the example command lines
and use /proc/cpuinfo on Linux to determine a sensible default.
2023-03-17 19:47:35 +02:00
mmyjona
6b0df5ccf3
add ptread link to fix cmake build under linux (#114)
* add ptread link to fix cmake build under linux

* add cmake to linux and macos platform

* separate make and cmake workflow

---------

Co-authored-by: Sebastián A <sebastian.aedo29@gmail.com>
2023-03-17 13:38:24 -03:00
Bernat Vadell
2af23d3043
🚀 Dockerize llamacpp (#132)
* feat: dockerize llamacpp

* feat: split build & runtime stages

* split dockerfile into main & tools

* add quantize into tool docker image

* Update .devops/tools.sh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* add docker action pipeline

* change CI to publish at github docker registry

* fix name runs-on macOS-latest is macos-latest (lowercase)

* include docker versioned images

* fix github action docker

* fix docker.yml

* feat: include all-in-one command tool & update readme.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-17 10:47:06 +01:00
Matvey Soloviev
904d2a8d6a
Q4_1 quantization (#193)
* Add AVX2 version of ggml_vec_dot_q4_1

* Small optimisations to q4_1 dot product (@Const-me)

* Rearrange Q4_1 quantization to work for multipart models. (Fix #152)

* Fix ggml_vec_mad_q4_1 too

* Fix non-vectorised q4_1 vec mul
2023-03-17 06:48:39 +02:00
Georgi Gerganov
721311070e
Update README.md 2023-03-16 15:00:09 +02:00
Georgi Gerganov
ac15de7895
Expand "Contributing" section 2023-03-16 08:55:13 +02:00
Georgi Gerganov
273abc47ff
Update hot topics - RMSnorm 2023-03-16 07:12:12 +02:00
Nebula
9b4a15b17d
Fix RMS norm in GGML (#191) 2023-03-15 19:29:25 -04:00
hoangmit
6eac39ba95
Add RMS norm and use it (#187)
* add ggml_rms_norm

* update op num
2023-03-16 00:41:38 +02:00
moritzbrantner
27944c4206
fixed typo (#178) 2023-03-15 22:35:25 +02:00
Rickey Bowers Jr
2d15d6c9a9
add SIGINT support for _WIN32 environments (#120)
* add SIGINT support for _WIN32 environments

* perhaps more consistent
2023-03-15 21:56:24 +02:00
Justin Suess
2d64715ad4
added ctx_size parameter (#148)
* added ctx_size parameter

* added it in more places

* Apply suggestions from code review

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-15 21:42:40 +02:00
Justin Suess
16b2c61a22
fixed color reset on exit (#149)
* fixed color reset on exit

* added sigint handler for ansi_color_reset

* Update main.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-15 21:39:38 +02:00
Musab Gultekin
977295c700
Fix potential licensing issue (#126)
* Update README.md

* Update README.md

remove facebook
2023-03-15 21:39:06 +02:00
Ronsor
956dfda8ad
Use tokenizer.vocab_size() instead of hardcoding 32000 in convert-pth-to-ggml.py (#142)
There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.
2023-03-15 21:37:50 +02:00
hoangmit
113e685d18
inline -> static inline for "bytesFromNibbles" (#161)
Without "static" prefix, it fails to compile in clang
2023-03-15 21:05:14 +02:00
Ronsor
47857e564c
Don't use vdotq_s32 if it's not available (#139)
* Don't use vdotq_s32 if it's not available

`dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available.

Reintroduces the code removed in 84d9015 if `__ARM_FEATURE_DOTPROD` isn't defined.

* Update ggml.c

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-14 21:34:37 +02:00
Radoslav Gerganov
60f819a2b1
Add section to README on how to run the project on Android (#130) 2023-03-14 15:30:08 +02:00
Georgi Gerganov
97ab2b2578
Add Misc section + update hot topics + minor fixes 2023-03-14 09:43:52 +02:00
Sebastián A
2f700a2738
Add windows to the CI (#98) 2023-03-13 22:29:10 +02:00
Georgi Gerganov
c09a9cfb06
CMake build in Release by default (#75) 2023-03-13 21:22:15 +02:00
Georgi Gerganov
7ec903d3c1
Update contribution section, hot topics, limitations, etc. 2023-03-13 19:21:51 +02:00
Georgi Gerganov
4497ad819c
Print system information 2023-03-13 19:15:08 +02:00
Sebastián A
ed6849cc07
Initial support for CMake (#75) 2023-03-13 19:12:33 +02:00
Thomas Klausner
41be0a3b3d
Add NetBSD support. (#90) 2023-03-13 18:40:54 +02:00
Pavol Rusnak
671d5cac15
Use fprintf for diagnostic output (#48)
keep printf only for printing model output

one can now use ./main ... 2>dev/null to suppress any diagnostic output
2023-03-13 18:39:56 +02:00
Georgi Gerganov
84d9015c4a
Use vdotq_s32 to improve performance (#67)
* 10% performance boost on ARM

* Back to original change
2023-03-13 18:36:44 +02:00
uint256_t
63fd76fbb0
Reduce model loading time (#43)
* Use buffering

* Use vector

* Minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-13 18:33:43 +02:00
Val Kharitonov
2a20f48efa
Fix UTF-8 handling (including colors) (#79) 2023-03-13 18:24:18 +02:00
Pavol Rusnak
d1f224712d
Add quantize script for batch quantization (#92)
* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-13 18:15:20 +02:00
Georgi Gerganov
1808ee0500
Add initial contribution guidelines 2023-03-13 09:42:26 +02:00
Matvey Soloviev
a169bb889c Gate signal support on being on a unixoid system. (#74) 2023-03-13 04:08:01 +01:00
Matvey Soloviev
460c482540 Fix token count accounting 2023-03-13 01:04:41 +01:00
Georgi Gerganov
c80e2a8f2a
Revert "10% performance boost on ARM"
This reverts commit 113a9e83eb.

There are some reports for illegal instruction.
Moved this stuff to vdotq_s32 branch until resolve
2023-03-13 01:28:08 +02:00
Georgi Gerganov
54a0e66ea0
Check for vdotq_s32 availability 2023-03-13 01:21:03 +02:00
Georgi Gerganov
543c57e991
Ammend to previous commit - forgot to update non-QRDMX branch 2023-03-13 01:05:24 +02:00
Georgi Gerganov
113a9e83eb
10% performance boost on ARM 2023-03-13 00:56:10 +02:00