tjohnman
368d0c8a9e
Respect the maximum number of tokens in interactive mode. ( #298 )
...
Co-authored-by: Johnman <johnman@github>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-19 20:31:17 +02:00
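A minimal sketch of the behavior this commit restores, with hypothetical names (`sample_token`, `generate_turn`): interactive sessions should still honor the `-n`/`n_predict` token budget instead of generating indefinitely between user turns.

```cpp
#include <cstdio>

int sample_token() { return 42; }  // stand-in for the real sampler

// Even in interactive mode, stop once the n_predict budget is spent.
void generate_turn(int n_predict) {
    for (int remaining = n_predict; remaining > 0; --remaining) {
        printf("%d ", sample_token());
    }
    printf("\n");  // budget exhausted: hand control back to the user
}
```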
slaren
50fae10d03
Add --ignore-eos parameter ( #181 )
...
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-19 20:22:48 +02:00
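One plausible implementation of such a flag (a sketch, not necessarily the PR's exact code): force the EOS logit to negative infinity before sampling, so the end-of-stream token can never be chosen and generation continues.

```cpp
#include <limits>
#include <vector>

// Make EOS unselectable by any sampler that draws from these logits.
void suppress_eos(std::vector<float> & logits, int eos_token_id) {
    logits[eos_token_id] = -std::numeric_limits<float>::infinity();
}
```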
Qingyou Meng
084e2f0ec0
interactive mode: print '\n' in sigint_handler; this flushes stdout and ensures the color reset. ( #283 )
2023-03-19 20:10:00 +02:00
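The idea, sketched below assuming a line-buffered terminal: printing a newline from the handler flushes stdout, so the ANSI color reset actually reaches the terminal before the process stops.

```cpp
#include <csignal>
#include <cstdio>
#include <cstdlib>

// On Ctrl+C, emit an ANSI color reset followed by '\n'; the newline
// flushes line-buffered stdout so the reset takes effect.
void sigint_handler(int /*signo*/) {
    printf("\033[0m\n");
    exit(130);  // conventional exit status for SIGINT
}

int main() {
    signal(SIGINT, sigint_handler);
    printf("\033[31m");             // simulate colored generation output
    for (;;) { (void) getchar(); }  // wait; Ctrl+C triggers the handler
}
```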
Erik Scholz
0b366e7357
Command line switch to use F16 for memory_k and memory_v (refactor of #154 ) ( #294 )
...
* Use F16 for memory_k and memory_v
* add command line switch to use f16 instead of f32 for memory k+v
---------
Co-authored-by: Ty Everett <ty@tyweb.us>
2023-03-19 19:57:00 +02:00
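Back-of-the-envelope arithmetic for why this switch matters, assuming LLaMA-7B-like hyperparameters (n_layer = 32, n_embd = 4096) and n_ctx = 512: the KV cache holds one K and one V vector per layer per context position, so halving the element size halves the cache.

```cpp
#include <cstdio>

int main() {
    const long n_layer = 32, n_embd = 4096, n_ctx = 512;
    const long n_elem  = 2 * n_layer * n_ctx * n_embd;          // K and V
    printf("f32 KV cache: %ld MiB\n", n_elem * 4 / (1 << 20));  // 512 MiB
    printf("f16 KV cache: %ld MiB\n", n_elem * 2 / (1 << 20));  // 256 MiB
}
```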
Georgi Gerganov
160bfb217d
Update hot topics to mention Alpaca support
2023-03-19 19:51:55 +02:00
Georgi Gerganov
c494ed5b94
Fix off-by-one bug ( #115 )
2023-03-19 19:46:32 +02:00
Georgi Gerganov
c1c7026b47
Fix python stuff ( #109 )
2023-03-19 19:33:18 +02:00
Concedo
474f760411
updated binaries
2023-03-20 01:19:15 +08:00
Concedo
a097703ec4
Merge branch 'master' into concedo
2023-03-20 01:18:42 +08:00
Concedo
29054a2bee
explicit buffer allocation from python
2023-03-20 01:18:34 +08:00
qunash
467b149761
Refactoring convert-pth-to-ggml.py: more concise and readable ( #109 )
...
* Refactor get_n_parts function to simplify code and improve readability
* Use f-strings instead of concatenation
* Refactoring: more concise and readable
* modularize
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-19 19:17:39 +02:00
Georgi Gerganov
70f01cb863
Drop trailing new line from file prompts ( #80 )
2023-03-19 19:05:04 +02:00
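A sketch of the intent (the helper name `read_prompt_file` is illustrative): a trailing '\n' read from a prompt file becomes part of the prompt and skews generation, so it is stripped after reading.

```cpp
#include <fstream>
#include <sstream>
#include <string>

std::string read_prompt_file(const std::string & path) {
    std::ifstream f(path);
    std::ostringstream ss;
    ss << f.rdbuf();
    std::string prompt = ss.str();
    if (!prompt.empty() && prompt.back() == '\n') {
        prompt.pop_back();  // drop the trailing newline most editors append
    }
    return prompt;
}
```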
Concedo
356c1b87ba
bugfixes and support for persistent states
2023-03-20 00:59:45 +08:00
Georgi Gerganov
a4e63b73df
Add instruction for using Alpaca ( #240 )
2023-03-19 18:49:50 +02:00
Georgi Gerganov
9e1707218a
Add "--instruct" argument for usage with Alpaca ( #240 )
...
Also start adding prompts in "./prompts"
2023-03-19 18:37:02 +02:00
Georgi Gerganov
22213a17b5
Change RMSNorm eps to 1e-6 ( #173 )
...
I think this is what is used in the Python code
2023-03-19 17:30:00 +02:00
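For reference, ε is the stabilizer under the square root in the usual RMSNorm formulation (gain g shown for completeness):

```latex
\mathrm{RMSNorm}(x)_i = \frac{x_i}{\sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^2 + \varepsilon}} \cdot g_i,
\qquad \varepsilon = 10^{-6}
```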
Concedo
f952b7c613
Removed junk, fixed some bugs, and added support for a dynamic number of sharded files
...
Merge remote-tracking branch 'origin/master' into concedo
# Conflicts:
# README.md
2023-03-19 11:13:00 +08:00
Ronsor
d7def1a752
Warn user if a context size greater than 2048 tokens is specified ( #274 )
...
LLaMA doesn't support context sizes above 2048 tokens, and going beyond that produces terrible results.
2023-03-18 20:10:47 -04:00
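A sketch of the guard (exact wording illustrative):

```cpp
#include <cstdio>

void warn_if_ctx_too_large(int n_ctx) {
    if (n_ctx > 2048) {
        fprintf(stderr,
                "warning: context size %d exceeds the 2048 tokens LLaMA was "
                "trained with; expect poor results\n", n_ctx);
    }
}
```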
Pavol Rusnak
6f61c18ec9
Fix typo in readme
2023-03-18 23:18:04 +01:00
Pavol Rusnak
1e5a6d088d
Add note about Python 3.11 to readme
2023-03-18 22:25:35 +01:00
Pavol Rusnak
554b541521
Add memory/disk requirements to readme
2023-03-18 22:25:35 +01:00
LostRuins
c21c89edca
Update README.md
2023-03-19 00:50:03 +08:00
LostRuins
42f307ef6a
Update README.md
2023-03-19 00:21:59 +08:00
LostRuins
2b188521a1
Merge branch 'ggerganov:master' into concedo
2023-03-19 00:20:09 +08:00
Concedo
5a6f3b01bd
update readme
2023-03-19 00:19:34 +08:00
Concedo
0dc3ab930c
Updated binaries
2023-03-19 00:09:00 +08:00
Concedo
e3d85aa08b
Merge branch 'master' into concedo
2023-03-19 00:07:32 +08:00
Concedo
2c8f870f53
Created Python bindings for llama.cpp and emulated a simple Kobold HTTP API endpoint
2023-03-19 00:07:11 +08:00
Alex Nguyen
d3f202d57b
Remove unused code since n_vocab is model.hparams.n_vocab ( #262 )
2023-03-18 13:51:49 +00:00
Justin Suess
e03e359730
fixed warning with std::ignore about unused function result ( #151 )
...
fixed warning with std::ignore about unused function result
2023-03-18 11:44:09 +00:00
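Illustrative use of the pattern (the file and call here are hypothetical, not the commit's diff): assigning a return value to `std::ignore` documents that the result is discarded on purpose and silences the warning.

```cpp
#include <cstdio>
#include <tuple>  // std::ignore

int main() {
    char buf[16];
    FILE * f = fopen("model.bin", "rb");
    if (!f) return 1;
    std::ignore = fread(buf, 1, sizeof(buf), f);  // deliberately unchecked
    fclose(f);
}
```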
Gary Linscott
a81d0c2a17
Fix n^2 loop in tokenization ( #254 )
...
This causes long prompts to parse very slowly.
2023-03-18 11:17:19 +00:00
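A generic illustration of this class of bug (not the actual diff): "consuming" a prefix with `substr` copies the remainder on every iteration, turning a linear scan quadratic, while indexing by offset stays linear.

```cpp
#include <string>
#include <vector>

std::vector<char> scan_quadratic(std::string text) {
    std::vector<char> out;
    while (!text.empty()) {
        out.push_back(text[0]);
        text = text.substr(1);  // O(n) copy each time -> O(n^2) total
    }
    return out;
}

std::vector<char> scan_linear(const std::string & text) {
    std::vector<char> out;
    for (size_t i = 0; i < text.size(); ++i) {
        out.push_back(text[i]);  // O(1) per character
    }
    return out;
}
```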
anzz1
b2de7f18df
CI Improvements ( #230 )
...
* CI Improvements
Manual build feature, autoreleases for Windows
* better CI naming convention
use branch name in releases and tags
2023-03-18 09:27:12 +02:00
Concedo
a19b5a4adc
Merge remote-tracking branch 'origin/master' into concedo
2023-03-18 10:52:54 +08:00
Niklas Korz
a292747893
Nix flake ( #40 )
...
* Nix flake
* Nix: only add Accelerate framework on macOS
* Nix: development shell, direnv and compatibility
* Nix: use python packages supplied by withPackages
* Nix: remove channel compatibility
* Nix: fix ARM neon dotproduct on macOS
---------
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-03-17 23:03:48 +01:00
thement
c9f670a177
Implement non-greedy tokenizer that tries to maximize token lengths ( #242 )
...
* Implement non-greedy tokenizer that tries to maximize token lengths
* Insert single space in front of the prompt
- this is to match original llama tokenizer behavior
---------
Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
2023-03-17 21:05:58 +01:00
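A hedged sketch of one non-greedy scheme (the vocab layout and scoring here are illustrative, not the PR's data structures): dynamic programming over positions, keeping the longest vocabulary token that ends each tokenizable prefix. The leading-space detail from the commit is omitted.

```cpp
#include <map>
#include <string>
#include <vector>

std::vector<int> tokenize_longest(const std::string & text,
                                  const std::map<std::string, int> & vocab) {
    const size_t n = text.size();
    std::vector<size_t> best_len(n + 1, 0);  // length of last token in best split
    std::vector<int>    best_tok(n + 1, -1);
    for (size_t i = 1; i <= n; ++i) {
        for (size_t l = 1; l <= i; ++l) {    // longer l overwrites shorter
            if (best_tok[i - l] == -1 && i != l) continue;  // prefix unsplittable
            auto it = vocab.find(text.substr(i - l, l));
            if (it != vocab.end()) {
                best_len[i] = l;
                best_tok[i] = it->second;
            }
        }
    }
    if (best_tok[n] == -1) return {};        // not tokenizable with this vocab
    std::vector<int> out;
    for (size_t i = n; i > 0; i -= best_len[i]) {
        out.insert(out.begin(), best_tok[i]);
    }
    return out;
}
```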
Georgi Gerganov
4f54609110
Default to 4 threads ( #243 )
2023-03-17 21:46:46 +02:00
Georgi Gerganov
e81b9c81c1
Update Contributing section
2023-03-17 20:30:04 +02:00
Stephan Walter
367946c668
Don't tell users to use a bad number of threads ( #243 )
...
The readme tells people to use the command line option "-t 8", causing 8
threads to be started. On systems with fewer than 8 cores, this causes a
significant slowdown. Remove the option from the example command lines
and use /proc/cpuinfo on Linux to determine a sensible default.
2023-03-17 19:47:35 +02:00
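A portable sketch of picking a default (the commit text mentions /proc/cpuinfo on Linux; `std::thread::hardware_concurrency()` used here counts logical threads, which may overstate physical cores):

```cpp
#include <algorithm>
#include <thread>

int default_thread_count() {
    const unsigned hw = std::thread::hardware_concurrency();  // 0 if unknown
    return hw == 0 ? 4 : (int) std::min(hw, 8u);  // never exceed the machine
}
```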
mmyjona
6b0df5ccf3
add pthread link to fix cmake build under linux ( #114 )
...
* add pthread link to fix cmake build under linux
* add cmake to linux and macos platform
* separate make and cmake workflow
---------
Co-authored-by: Sebastián A <sebastian.aedo29@gmail.com>
2023-03-17 13:38:24 -03:00
Bernat Vadell
2af23d3043
🚀 Dockerize llamacpp ( #132 )
...
* feat: dockerize llamacpp
* feat: split build & runtime stages
* split dockerfile into main & tools
* add quantize into tool docker image
* Update .devops/tools.sh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add docker action pipeline
* change CI to publish at github docker registry
* fix runs-on name: macOS-latest should be macos-latest (lowercase)
* include docker versioned images
* fix github action docker
* fix docker.yml
* feat: include all-in-one command tool & update readme.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-17 10:47:06 +01:00
Matvey Soloviev
904d2a8d6a
Q4_1 quantization ( #193 )
...
* Add AVX2 version of ggml_vec_dot_q4_1
* Small optimisations to q4_1 dot product (@Const-me)
* Rearrange Q4_1 quantization to work for multipart models. (Fix #152 )
* Fix ggml_vec_mad_q4_1 too
* Fix non-vectorised q4_1 vec mul
2023-03-17 06:48:39 +02:00
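The Q4_1 scheme in outline (a sketch; real ggml packs two 4-bit values per byte): each block of 32 floats stores a scale d and an offset m, and each value is reduced to a 4-bit level q so that x ≈ d·q + m.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

void quantize_q4_1_block(const float x[32], float & d, float & m, uint8_t q[32]) {
    const float lo = *std::min_element(x, x + 32);
    const float hi = *std::max_element(x, x + 32);
    m = lo;
    d = (hi - lo) / 15.0f;  // 16 representable levels
    for (int i = 0; i < 32; ++i) {
        const float v = d > 0.0f ? (x[i] - m) / d : 0.0f;
        q[i] = (uint8_t) std::min(15, (int) std::round(v));  // dequant: d*q[i] + m
    }
}
```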
Concedo
3d4854455c
ban eos token
2023-03-17 11:02:11 +08:00
oKatanaaa
27990d54ed
minor change (+1 squashed commit)
...
Squashed commits:
[7252a2b] refactor: make weights load faster
2023-03-17 11:02:11 +08:00
Ty Everett
197020deee
Use F16 for memory_k and memory_v
2023-03-17 11:02:10 +08:00
hx507
7b8858415e
Scale buf_size linearly with n_ctx
...
This appears to solve https://github.com/ggerganov/llama.cpp/issues/153, where the error "ggml_new_tensor_impl: not enough space in the context's memory pool" is thrown in interactive mode.
At least the out-of-memory error comes from the `ctx0` used here, although I am not familiar enough with the code base to tell whether this is indeed the cause.
2023-03-17 05:11:49 +08:00
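The shape of the fix, sketched with invented constants (both values below are hypothetical): eval-graph intermediates grow with the number of context tokens, so a fixed scratch pool overflows at larger n_ctx, while a pool with a per-token linear term does not.

```cpp
#include <cstddef>

size_t eval_buf_size(int n_ctx) {
    const size_t base_bytes    = 64u * 1024 * 1024;  // hypothetical fixed part
    const size_t bytes_per_tok = 160u * 1024;        // hypothetical per-token part
    return base_bytes + (size_t) n_ctx * bytes_per_tok;
}
```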
Georgi Gerganov
721311070e
Update README.md
2023-03-16 15:00:09 +02:00
Georgi Gerganov
ac15de7895
Expand "Contributing" section
2023-03-16 08:55:13 +02:00
Georgi Gerganov
273abc47ff
Update hot topics - RMSnorm
2023-03-16 07:12:12 +02:00
Nebula
9b4a15b17d
Fix RMS norm in GGML ( #191 )
2023-03-15 19:29:25 -04:00
hoangmit
6eac39ba95
Add RMS norm and use it ( #187 )
...
* add ggml_rms_norm
* update op num
2023-03-16 00:41:38 +02:00
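A naive reference for what an RMS-norm op computes, matching the formula noted under the eps commit above (a sketch that ignores SIMD and ggml's graph plumbing): unlike LayerNorm there is no mean subtraction and no bias, only scaling by the root mean square.

```cpp
#include <cmath>
#include <cstddef>

void rms_norm(const float * x, float * y, size_t n, float eps) {
    float sumsq = 0.0f;
    for (size_t i = 0; i < n; ++i) sumsq += x[i] * x[i];
    const float scale = 1.0f / std::sqrt(sumsq / (float) n + eps);
    for (size_t i = 0; i < n; ++i) y[i] = x[i] * scale;
}
```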