llama.cpp

Author	SHA1	Message	Date
Concedo	f952b7c613	Removed junk, fixed some bugs and support dynamic number of sharded files Merge remote-tracking branch 'origin/master' into concedo # Conflicts: # README.md	2023-03-19 11:13:00 +08:00
Ronsor	d7def1a752	Warn user if a context size greater than 2048 tokens is specified (#274) LLaMA doesn't support more than 2048 token context sizes, and going above that produces terrible results.	2023-03-18 20:10:47 -04:00
Pavol Rusnak	6f61c18ec9	Fix typo in readme	2023-03-18 23:18:04 +01:00
Pavol Rusnak	1e5a6d088d	Add note about Python 3.11 to readme	2023-03-18 22:25:35 +01:00
Pavol Rusnak	554b541521	Add memory/disk requirements to readme	2023-03-18 22:25:35 +01:00
LostRuins	c21c89edca	Update README.md	2023-03-19 00:50:03 +08:00
LostRuins	42f307ef6a	Update README.md	2023-03-19 00:21:59 +08:00
LostRuins	2b188521a1	Merge branch 'ggerganov:master' into concedo	2023-03-19 00:20:09 +08:00
Concedo	5a6f3b01bd	update readme	2023-03-19 00:19:34 +08:00
Concedo	0dc3ab930c	Updated binaries	2023-03-19 00:09:00 +08:00
Concedo	e3d85aa08b	Merge branch 'master' into concedo	2023-03-19 00:07:32 +08:00
Concedo	2c8f870f53	Created a python bindings for llama.cpp and emulated a simple Kobold HTTP API Endpoint	2023-03-19 00:07:11 +08:00
Alex Nguyen	d3f202d57b	Remove unused code since n_vocab is model.hparams.n_vocab (#262)	2023-03-18 13:51:49 +00:00
Justin Suess	e03e359730	fixed warning with std::ignore about unused function result (#151) fixed warning with std::ignore about unused function result	2023-03-18 11:44:09 +00:00
Gary Linscott	a81d0c2a17	Fix n^2 loop in tokenization (#254) This causes long prompts to parse very slowly.	2023-03-18 11:17:19 +00:00
anzz1	b2de7f18df	CI Improvements (#230) * CI Improvements Manual build feature, autoreleases for Windows * better CI naming convention use branch name in releases and tags	2023-03-18 09:27:12 +02:00
Concedo	a19b5a4adc	Merge remote-tracking branch 'origin/master' into concedo	2023-03-18 10:52:54 +08:00
Niklas Korz	a292747893	Nix flake (#40) * Nix flake * Nix: only add Accelerate framework on macOS * Nix: development shel, direnv and compatibility * Nix: use python packages supplied by withPackages * Nix: remove channel compatibility * Nix: fix ARM neon dotproduct on macOS --------- Co-authored-by: Pavol Rusnak <pavol@rusnak.io>	2023-03-17 23:03:48 +01:00
thement	c9f670a177	Implement non-greedy tokenizer that tries to maximize token lengths (#242) * Implement non-greedy tokenizer that tries to maximize token lengths * Insert single space in front of the prompt - this is to match original llama tokenizer behavior --------- Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>	2023-03-17 21:05:58 +01:00
Georgi Gerganov	4f54609110	Default to 4 threads (#243)	2023-03-17 21:46:46 +02:00
Georgi Gerganov	e81b9c81c1	Update Contributing section	2023-03-17 20:30:04 +02:00
Stephan Walter	367946c668	Don't tell users to use a bad number of threads (#243) The readme tells people to use the command line option "-t 8", causing 8 threads to be started. On systems with fewer than 8 cores, this causes a significant slowdown. Remove the option from the example command lines and use /proc/cpuinfo on Linux to determine a sensible default.	2023-03-17 19:47:35 +02:00
mmyjona	6b0df5ccf3	add ptread link to fix cmake build under linux (#114) * add ptread link to fix cmake build under linux * add cmake to linux and macos platform * separate make and cmake workflow --------- Co-authored-by: Sebastián A <sebastian.aedo29@gmail.com>	2023-03-17 13:38:24 -03:00
Bernat Vadell	2af23d3043	🚀 Dockerize llamacpp (#132) * feat: dockerize llamacpp * feat: split build & runtime stages * split dockerfile into main & tools * add quantize into tool docker image * Update .devops/tools.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add docker action pipeline * change CI to publish at github docker registry * fix name runs-on macOS-latest is macos-latest (lowercase) * include docker versioned images * fix github action docker * fix docker.yml * feat: include all-in-one command tool & update readme.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-17 10:47:06 +01:00
Matvey Soloviev	904d2a8d6a	Q4_1 quantization (#193) * Add AVX2 version of ggml_vec_dot_q4_1 * Small optimisations to q4_1 dot product (@Const-me) * Rearrange Q4_1 quantization to work for multipart models. (Fix #152) * Fix ggml_vec_mad_q4_1 too * Fix non-vectorised q4_1 vec mul	2023-03-17 06:48:39 +02:00
Concedo	3d4854455c	ban eos token	2023-03-17 11:02:11 +08:00
oKatanaaa	27990d54ed	minor change (+1 squashed commits) Squashed commits: [`7252a2b`] refactor: make weights load faster	2023-03-17 11:02:11 +08:00
Ty Everett	197020deee	Use F16 for memory_k and memory_v	2023-03-17 11:02:10 +08:00
hx507	7b8858415e	Scale buf_size linearly with n_ctx This appear to solve https://github.com/ggerganov/llama.cpp/issues/153 where error of "ggml_new_tensor_impl: not enough space in the context's memory pool" is thrown in interactive mode. At least the out of memory error come from `ctx0` used here. Although I am not familiar with the code base enough to tell if this is indeed the cause.	2023-03-17 05:11:49 +08:00
Georgi Gerganov	721311070e	Update README.md	2023-03-16 15:00:09 +02:00
Georgi Gerganov	ac15de7895	Expand "Contributing" section	2023-03-16 08:55:13 +02:00
Georgi Gerganov	273abc47ff	Update hot topics - RMSnorm	2023-03-16 07:12:12 +02:00
Nebula	9b4a15b17d	Fix RMS norm in GGML (#191)	2023-03-15 19:29:25 -04:00
hoangmit	6eac39ba95	Add RMS norm and use it (#187) * add ggml_rms_norm * update op num	2023-03-16 00:41:38 +02:00
moritzbrantner	27944c4206	fixed typo (#178)	2023-03-15 22:35:25 +02:00
Rickey Bowers Jr	2d15d6c9a9	add SIGINT support for _WIN32 environments (#120) * add SIGINT support for _WIN32 environments * perhaps more consistent	2023-03-15 21:56:24 +02:00
Justin Suess	2d64715ad4	added ctx_size parameter (#148) * added ctx_size parameter * added it in more places * Apply suggestions from code review --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-15 21:42:40 +02:00
Justin Suess	16b2c61a22	fixed color reset on exit (#149) * fixed color reset on exit * added sigint handler for ansi_color_reset * Update main.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-15 21:39:38 +02:00
Musab Gultekin	977295c700	Fix potential licensing issue (#126) * Update README.md * Update README.md remove facebook	2023-03-15 21:39:06 +02:00
Ronsor	956dfda8ad	Use `tokenizer.vocab_size()` instead of hardcoding 32000 in convert-pth-to-ggml.py (#142) There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.	2023-03-15 21:37:50 +02:00
hoangmit	113e685d18	inline -> static inline for "bytesFromNibbles" (#161) Without "static" prefix, it fails to compile in clang	2023-03-15 21:05:14 +02:00
Ronsor	47857e564c	Don't use vdotq_s32 if it's not available (#139) * Don't use vdotq_s32 if it's not available `dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available. Reintroduces the code removed in `84d9015` if `__ARM_FEATURE_DOTPROD` isn't defined. * Update ggml.c --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-14 21:34:37 +02:00
Radoslav Gerganov	60f819a2b1	Add section to README on how to run the project on Android (#130)	2023-03-14 15:30:08 +02:00
Georgi Gerganov	97ab2b2578	Add Misc section + update hot topics + minor fixes	2023-03-14 09:43:52 +02:00
Sebastián A	2f700a2738	Add windows to the CI (#98)	2023-03-13 22:29:10 +02:00
Georgi Gerganov	c09a9cfb06	CMake build in Release by default (#75)	2023-03-13 21:22:15 +02:00
Georgi Gerganov	7ec903d3c1	Update contribution section, hot topics, limitations, etc.	2023-03-13 19:21:51 +02:00
Georgi Gerganov	4497ad819c	Print system information	2023-03-13 19:15:08 +02:00
Sebastián A	ed6849cc07	Initial support for CMake (#75)	2023-03-13 19:12:33 +02:00
Thomas Klausner	41be0a3b3d	Add NetBSD support. (#90)	2023-03-13 18:40:54 +02:00

104 commits