If the end of stream token mark is found, when in interactive mode, ask
for user input instead of exiting the main loop.
In case of running out of token budget, reset it and ask for user input.
With these changes, embd can end up empty and cause a crash in the next
iteration of the loop, so we check for its size as well.
* Nix flake
* Nix: only add Accelerate framework on macOS
* Nix: development shel, direnv and compatibility
* Nix: use python packages supplied by withPackages
* Nix: remove channel compatibility
* Nix: fix ARM neon dotproduct on macOS
---------
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
* Implement non-greedy tokenizer that tries to maximize token lengths
* Insert single space in front of the prompt
- this is to match original llama tokenizer behavior
---------
Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
The readme tells people to use the command line option "-t 8", causing 8
threads to be started. On systems with fewer than 8 cores, this causes a
significant slowdown. Remove the option from the example command lines
and use /proc/cpuinfo on Linux to determine a sensible default.
* add ptread link to fix cmake build under linux
* add cmake to linux and macos platform
* separate make and cmake workflow
---------
Co-authored-by: Sebastián A <sebastian.aedo29@gmail.com>
* Add AVX2 version of ggml_vec_dot_q4_1
* Small optimisations to q4_1 dot product (@Const-me)
* Rearrange Q4_1 quantization to work for multipart models. (Fix#152)
* Fix ggml_vec_mad_q4_1 too
* Fix non-vectorised q4_1 vec mul
* added ctx_size parameter
* added it in more places
* Apply suggestions from code review
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fixed color reset on exit
* added sigint handler for ansi_color_reset
* Update main.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.
* Don't use vdotq_s32 if it's not available
`dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available.
Reintroduces the code removed in 84d9015 if `__ARM_FEATURE_DOTPROD` isn't defined.
* Update ggml.c
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Add quantize script for batch quantization
* Indentation
* README for new quantize.sh
* Fix script name
* Fix file list on Mac OS
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>