tjohnman
368d0c8a9e
Respect the maximum number of tokens in interactive mode. ( #298 )
...
Co-authored-by: Johnman <johnman@github>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-19 20:31:17 +02:00
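A minimal sketch of the behavior this commit restores, with hypothetical names (`sample_token`, `generate_turn`): interactive sessions should still honor the `-n`/`n_predict` token budget instead of generating indefinitely between user turns.

```cpp
#include <cstdio>

int sample_token() { return 42; }  // stand-in for the real sampler

// Even in interactive mode, stop once the n_predict budget is spent.
void generate_turn(int n_predict) {
    for (int remaining = n_predict; remaining > 0; --remaining) {
        printf("%d ", sample_token());
    }
    printf("\n");  // budget exhausted: hand control back to the user
}
```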
slaren
50fae10d03
Add --ignore-eos parameter ( #181 )
...
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-19 20:22:48 +02:00
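One plausible implementation of such a flag (a sketch, not necessarily the PR's exact code): force the EOS logit to negative infinity before sampling, so the end-of-stream token can never be chosen and generation continues.

```cpp
#include <limits>
#include <vector>

// Make EOS unselectable by any sampler that draws from these logits.
void suppress_eos(std::vector<float> & logits, int eos_token_id) {
    logits[eos_token_id] = -std::numeric_limits<float>::infinity();
}
```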
Qingyou Meng
084e2f0ec0
interactive mode: print '\n' in sigint_handler; this flushes stdout and ensures the color reset. ( #283 )
2023-03-19 20:10:00 +02:00
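The idea, sketched below assuming a line-buffered terminal: printing a newline from the handler flushes stdout, so the ANSI color reset actually reaches the terminal before the process stops.

```cpp
#include <csignal>
#include <cstdio>
#include <cstdlib>

// On Ctrl+C, emit an ANSI color reset followed by '\n'; the newline
// flushes line-buffered stdout so the reset takes effect.
void sigint_handler(int /*signo*/) {
    printf("\033[0m\n");
    exit(130);  // conventional exit status for SIGINT
}

int main() {
    signal(SIGINT, sigint_handler);
    printf("\033[31m");             // simulate colored generation output
    for (;;) { (void) getchar(); }  // wait; Ctrl+C triggers the handler
}
```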
Erik Scholz
0b366e7357
Command line switch to use F16 for memory_k and memory_v (refactor of #154 ) ( #294 )
...
* Use F16 for memory_k and memory_v
* add command line switch to use f16 instead of f32 for memory k+v
---------
Co-authored-by: Ty Everett <ty@tyweb.us>
2023-03-19 19:57:00 +02:00
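Back-of-the-envelope arithmetic for why this switch matters, assuming LLaMA-7B-like hyperparameters (n_layer = 32, n_embd = 4096) and n_ctx = 512: the KV cache holds one K and one V vector per layer per context position, so halving the element size halves the cache.

```cpp
#include <cstdio>

int main() {
    const long n_layer = 32, n_embd = 4096, n_ctx = 512;
    const long n_elem  = 2 * n_layer * n_ctx * n_embd;          // K and V
    printf("f32 KV cache: %ld MiB\n", n_elem * 4 / (1 << 20));  // 512 MiB
    printf("f16 KV cache: %ld MiB\n", n_elem * 2 / (1 << 20));  // 256 MiB
}
```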
Georgi Gerganov
160bfb217d
Update hot topics to mention Alpaca support
2023-03-19 19:51:55 +02:00
Georgi Gerganov
c494ed5b94
Fix off-by-one bug ( #115 )
2023-03-19 19:46:32 +02:00
Georgi Gerganov
c1c7026b47
Fix python stuff ( #109 )
2023-03-19 19:33:18 +02:00
Concedo
474f760411
updated binaries
2023-03-20 01:19:15 +08:00
Concedo
a097703ec4
Merge branch 'master' into concedo
2023-03-20 01:18:42 +08:00
Concedo
29054a2bee
explicit buffer allocation from python
2023-03-20 01:18:34 +08:00
qunash
467b149761
Refactoring convert-pth-to-ggml.py: more concise and readable ( #109 )
...
* Refactor get_n_parts function to simplify code and improve readability
* Use f-strings instead of concatenation
* Refactoring: more concise and readable
* modularize
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-19 19:17:39 +02:00
Georgi Gerganov
70f01cb863
Drop trailing new line from file prompts ( #80 )
2023-03-19 19:05:04 +02:00
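A sketch of the intent (the helper name `read_prompt_file` is illustrative): a trailing '\n' read from a prompt file becomes part of the prompt and skews generation, so it is stripped after reading.

```cpp
#include <fstream>
#include <sstream>
#include <string>

std::string read_prompt_file(const std::string & path) {
    std::ifstream f(path);
    std::ostringstream ss;
    ss << f.rdbuf();
    std::string prompt = ss.str();
    if (!prompt.empty() && prompt.back() == '\n') {
        prompt.pop_back();  // drop the trailing newline most editors append
    }
    return prompt;
}
```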
Concedo
356c1b87ba
bugfixes and support for persistent states
2023-03-20 00:59:45 +08:00
Georgi Gerganov
a4e63b73df
Add instruction for using Alpaca ( #240 )
2023-03-19 18:49:50 +02:00
Georgi Gerganov
9e1707218a
Add "--instruct" argument for usage with Alpaca ( #240 )
...
Also start adding prompts in "./prompts"
2023-03-19 18:37:02 +02:00
Georgi Gerganov
22213a17b5
Change RMSNorm eps to 1e-6 ( #173 )
...
I think this is what is used in the Python code
2023-03-19 17:30:00 +02:00
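For reference, ε is the stabilizer under the square root in the usual RMSNorm formulation (gain g shown for completeness):

```latex
\mathrm{RMSNorm}(x)_i = \frac{x_i}{\sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^2 + \varepsilon}} \cdot g_i,
\qquad \varepsilon = 10^{-6}
```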
Concedo
f952b7c613
Removed junk, fixed some bugs, and added support for a dynamic number of sharded files
...
Merge remote-tracking branch 'origin/master' into concedo
# Conflicts:
# README.md
2023-03-19 11:13:00 +08:00
Ronsor
d7def1a752
Warn user if a context size greater than 2048 tokens is specified ( #274 )
...
LLaMA doesn't support context sizes above 2048 tokens, and going beyond that produces terrible results.
2023-03-18 20:10:47 -04:00
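A sketch of the guard (exact wording illustrative):

```cpp
#include <cstdio>

void warn_if_ctx_too_large(int n_ctx) {
    if (n_ctx > 2048) {
        fprintf(stderr,
                "warning: context size %d exceeds the 2048 tokens LLaMA was "
                "trained with; expect poor results\n", n_ctx);
    }
}
```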
Pavol Rusnak
6f61c18ec9
Fix typo in readme
2023-03-18 23:18:04 +01:00
Pavol Rusnak
1e5a6d088d
Add note about Python 3.11 to readme
2023-03-18 22:25:35 +01:00
Pavol Rusnak
554b541521
Add memory/disk requirements to readme
2023-03-18 22:25:35 +01:00
LostRuins
c21c89edca
Update README.md
2023-03-19 00:50:03 +08:00
LostRuins
42f307ef6a
Update README.md
2023-03-19 00:21:59 +08:00
LostRuins
2b188521a1
Merge branch 'ggerganov:master' into concedo
2023-03-19 00:20:09 +08:00
Concedo
5a6f3b01bd
update readme
2023-03-19 00:19:34 +08:00
Concedo
0dc3ab930c
Updated binaries
2023-03-19 00:09:00 +08:00
Concedo
e3d85aa08b
Merge branch 'master' into concedo
2023-03-19 00:07:32 +08:00
Concedo
2c8f870f53
Created Python bindings for llama.cpp and emulated a simple Kobold HTTP API endpoint
2023-03-19 00:07:11 +08:00
Alex Nguyen
d3f202d57b
Remove unused code since n_vocab is model.hparams.n_vocab ( #262 )
2023-03-18 13:51:49 +00:00
Justin Suess
e03e359730
fixed warning with std::ignore about unused function result ( #151 )
...
fixed warning with std::ignore about unused function result
2023-03-18 11:44:09 +00:00
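Illustrative use of the pattern (the file and call here are hypothetical, not the commit's diff): assigning a return value to `std::ignore` documents that the result is discarded on purpose and silences the warning.

```cpp
#include <cstdio>
#include <tuple>  // std::ignore

int main() {
    char buf[16];
    FILE * f = fopen("model.bin", "rb");
    if (!f) return 1;
    std::ignore = fread(buf, 1, sizeof(buf), f);  // deliberately unchecked
    fclose(f);
}
```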
Gary Linscott
a81d0c2a17
Fix n^2 loop in tokenization ( #254 )
...
This causes long prompts to parse very slowly.
2023-03-18 11:17:19 +00:00
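A generic illustration of this class of bug (not the actual diff): "consuming" a prefix with `substr` copies the remainder on every iteration, turning a linear scan quadratic, while indexing by offset stays linear.

```cpp
#include <string>
#include <vector>

std::vector<char> scan_quadratic(std::string text) {
    std::vector<char> out;
    while (!text.empty()) {
        out.push_back(text[0]);
        text = text.substr(1);  // O(n) copy each time -> O(n^2) total
    }
    return out;
}

std::vector<char> scan_linear(const std::string & text) {
    std::vector<char> out;
    for (size_t i = 0; i < text.size(); ++i) {
        out.push_back(text[i]);  // O(1) per character
    }
    return out;
}
```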
anzz1
b2de7f18df
CI Improvements ( #230 )
...
* CI Improvements
Manual build feature, autoreleases for Windows
* better CI naming convention
use branch name in releases and tags
2023-03-18 09:27:12 +02:00
Concedo
a19b5a4adc
Merge remote-tracking branch 'origin/master' into concedo
2023-03-18 10:52:54 +08:00
Niklas Korz
a292747893
Nix flake ( #40 )
...
* Nix flake
* Nix: only add Accelerate framework on macOS
* Nix: development shell, direnv and compatibility
* Nix: use python packages supplied by withPackages
* Nix: remove channel compatibility
* Nix: fix ARM neon dotproduct on macOS
---------
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-03-17 23:03:48 +01:00
thement
c9f670a177
Implement non-greedy tokenizer that tries to maximize token lengths ( #242 )
...
* Implement non-greedy tokenizer that tries to maximize token lengths
* Insert single space in front of the prompt
- this is to match original llama tokenizer behavior
---------
Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
2023-03-17 21:05:58 +01:00
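A hedged sketch of one non-greedy scheme (the vocab layout and scoring here are illustrative, not the PR's data structures): dynamic programming over positions, keeping the longest vocabulary token that ends each tokenizable prefix. The leading-space detail from the commit is omitted.

```cpp
#include <map>
#include <string>
#include <vector>

std::vector<int> tokenize_longest(const std::string & text,
                                  const std::map<std::string, int> & vocab) {
    const size_t n = text.size();
    std::vector<size_t> best_len(n + 1, 0);  // length of last token in best split
    std::vector<int>    best_tok(n + 1, -1);
    for (size_t i = 1; i <= n; ++i) {
        for (size_t l = 1; l <= i; ++l) {    // longer l overwrites shorter
            if (best_tok[i - l] == -1 && i != l) continue;  // prefix unsplittable
            auto it = vocab.find(text.substr(i - l, l));
            if (it != vocab.end()) {
                best_len[i] = l;
                best_tok[i] = it->second;
            }
        }
    }
    if (best_tok[n] == -1) return {};        // not tokenizable with this vocab
    std::vector<int> out;
    for (size_t i = n; i > 0; i -= best_len[i]) {
        out.insert(out.begin(), best_tok[i]);
    }
    return out;
}
```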
Georgi Gerganov
4f54609110
Default to 4 threads ( #243 )
2023-03-17 21:46:46 +02:00
Georgi Gerganov
e81b9c81c1
Update Contributing section
2023-03-17 20:30:04 +02:00
Stephan Walter
367946c668
Don't tell users to use a bad number of threads ( #243 )
...
The readme tells people to use the command line option "-t 8", causing 8
threads to be started. On systems with fewer than 8 cores, this causes a
significant slowdown. Remove the option from the example command lines
and use /proc/cpuinfo on Linux to determine a sensible default.
2023-03-17 19:47:35 +02:00
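A portable sketch of picking a default (the commit text mentions /proc/cpuinfo on Linux; `std::thread::hardware_concurrency()` used here counts logical threads, which may overstate physical cores):

```cpp
#include <algorithm>
#include <thread>

int default_thread_count() {
    const unsigned hw = std::thread::hardware_concurrency();  // 0 if unknown
    return hw == 0 ? 4 : (int) std::min(hw, 8u);  // never exceed the machine
}
```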
mmyjona
6b0df5ccf3
add pthread link to fix cmake build under linux ( #114 )
...
* add pthread link to fix cmake build under linux
* add cmake to linux and macos platform
* separate make and cmake workflow
---------
Co-authored-by: Sebastián A <sebastian.aedo29@gmail.com>
2023-03-17 13:38:24 -03:00
Bernat Vadell
2af23d3043
🚀 Dockerize llamacpp ( #132 )
...
* feat: dockerize llamacpp
* feat: split build & runtime stages
* split dockerfile into main & tools
* add quantize into tool docker image
* Update .devops/tools.sh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add docker action pipeline
* change CI to publish at github docker registry
* fix runs-on name: macOS-latest should be macos-latest (lowercase)
* include docker versioned images
* fix github action docker
* fix docker.yml
* feat: include all-in-one command tool & update readme.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-17 10:47:06 +01:00
Matvey Soloviev
904d2a8d6a
Q4_1 quantization ( #193 )
...
* Add AVX2 version of ggml_vec_dot_q4_1
* Small optimisations to q4_1 dot product (@Const-me)
* Rearrange Q4_1 quantization to work for multipart models. (Fix #152 )
* Fix ggml_vec_mad_q4_1 too
* Fix non-vectorised q4_1 vec mul
2023-03-17 06:48:39 +02:00
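The Q4_1 scheme in outline (a sketch; real ggml packs two 4-bit values per byte): each block of 32 floats stores a scale d and an offset m, and each value is reduced to a 4-bit level q so that x ≈ d·q + m.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

void quantize_q4_1_block(const float x[32], float & d, float & m, uint8_t q[32]) {
    const float lo = *std::min_element(x, x + 32);
    const float hi = *std::max_element(x, x + 32);
    m = lo;
    d = (hi - lo) / 15.0f;  // 16 representable levels
    for (int i = 0; i < 32; ++i) {
        const float v = d > 0.0f ? (x[i] - m) / d : 0.0f;
        q[i] = (uint8_t) std::min(15, (int) std::round(v));  // dequant: d*q[i] + m
    }
}
```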
Concedo
3d4854455c
ban eos token
2023-03-17 11:02:11 +08:00
oKatanaaa
27990d54ed
minor change (+1 squashed commit)
...
Squashed commits:
[7252a2b] refactor: make weights load faster
2023-03-17 11:02:11 +08:00
Ty Everett
197020deee
Use F16 for memory_k and memory_v
2023-03-17 11:02:10 +08:00
hx507
7b8858415e
Scale buf_size linearly with n_ctx
...
This appears to solve https://github.com/ggerganov/llama.cpp/issues/153, where the error "ggml_new_tensor_impl: not enough space in the context's memory pool" is thrown in interactive mode.
At least the out-of-memory error comes from the `ctx0` used here, although I am not familiar enough with the code base to tell whether this is indeed the cause.
2023-03-17 05:11:49 +08:00
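The shape of the fix, sketched with invented constants (both values below are hypothetical): eval-graph intermediates grow with the number of context tokens, so a fixed scratch pool overflows at larger n_ctx, while a pool with a per-token linear term does not.

```cpp
#include <cstddef>

size_t eval_buf_size(int n_ctx) {
    const size_t base_bytes    = 64u * 1024 * 1024;  // hypothetical fixed part
    const size_t bytes_per_tok = 160u * 1024;        // hypothetical per-token part
    return base_bytes + (size_t) n_ctx * bytes_per_tok;
}
```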
Georgi Gerganov
721311070e
Update README.md
2023-03-16 15:00:09 +02:00
Georgi Gerganov
ac15de7895
Expand "Contributing" section
2023-03-16 08:55:13 +02:00
Georgi Gerganov
273abc47ff
Update hot topics - RMSnorm
2023-03-16 07:12:12 +02:00
Nebula
9b4a15b17d
Fix RMS norm in GGML ( #191 )
2023-03-15 19:29:25 -04:00
hoangmit
6eac39ba95
Add RMS norm and use it ( #187 )
...
* add ggml_rms_norm
* update op num
2023-03-16 00:41:38 +02:00
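A naive reference for what an RMS-norm op computes, matching the formula noted under the eps commit above (a sketch that ignores SIMD and ggml's graph plumbing): unlike LayerNorm there is no mean subtraction and no bias, only scaling by the root mean square.

```cpp
#include <cmath>
#include <cstddef>

void rms_norm(const float * x, float * y, size_t n, float eps) {
    float sumsq = 0.0f;
    for (size_t i = 0; i < n; ++i) sumsq += x[i] * x[i];
    const float scale = 1.0f / std::sqrt(sumsq / (float) n + eps);
    for (size_t i = 0; i < n; ++i) y[i] = x[i] * scale;
}
```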