llama.cpp

Author	SHA1	Message	Date
mike dupont	7025832f4d	now working v1	2023-12-10 17:54:51 -05:00
mike dupont	ad7fc5450e	remove bad cast	2023-12-10 17:04:51 -05:00
mike dupont	9592bc5676	fixing first bug	2023-12-10 15:31:08 -05:00
mike dupont	fa49f64e28	cmake not working yet	2023-12-10 15:07:58 -05:00
mike dupont	d239bc94d2	makefile now building and exec crashing	2023-12-10 15:07:39 -05:00
mike dupont	d739470198	now linking and crashing	2023-12-10 15:07:04 -05:00
mike dupont	da5bbd73a8	linker error	2023-12-09 17:46:25 -05:00
mike dupont	1f3a501e9b	working	2023-12-09 11:32:20 -05:00
mike dupont	1f522319c5	wip	2023-12-09 11:29:16 -05:00
mike dupont	da1d8459be	dont forget the code	2023-12-09 09:47:55 -05:00
mike dupont	34cf9d6fb6	not executing	2023-12-09 09:34:22 -05:00
mike dupont	11937662ef	for metacall add the cmake	2023-12-09 08:58:10 -05:00
mike dupont	9ec7eb1c3b	nodejs	2023-12-08 13:44:39 -05:00
mike dupont	593985dad3	update	2023-12-08 13:04:24 -05:00
mike dupont	09a48ec2ae	linking, loading, segfaulting	2023-12-08 13:01:59 -05:00
mike dupont	ac69c93ca9	linking	2023-12-07 18:40:52 -05:00
mike dupont	d6244ff813	adding missing files	2023-12-06 10:05:12 -05:00
mike dupont	7eb27b3443	now it is letting the llm control the output	2023-12-06 10:03:45 -05:00
mike dupont	7972929a3b	now getting response from python	2023-12-06 09:37:04 -05:00
mike dupont	1c861466dc	working calling python	2023-12-06 07:26:30 -05:00
mike dupont	2f3ea04010	starting boost	2023-12-06 07:20:19 -05:00
mike dupont	5ea96cc710	rebased	2023-12-05 11:06:00 -05:00
mike dupont	2b6ff2ec54	rebased and trimmed down now compiling again, now it might even run adding debug notes update diff improvement update update working simplify running moving to using refl-cpp for llama as well now working compiling and running debugging adding the print module with type information the first type names are being printed adding binding generator bindings refl now working, not on pointers but on the types update now has a model adding new header for llama internal demonstrate crash remove crash now starting to refactor the code now the debug print is working	2023-12-05 09:07:36 -05:00
MaggotHATE	52c8bc3cf3	sampling : custom samplers order (#4285) * Samplers sequence order w parameter * Cleaned commented code * Fixed formatting * Rewrote with unordered_map * Revert and rewrite, too many problems and safeguards would be needed * Fixed code style * Code style fixes according to review * More readable samplers input string, fixed help * Style fix in sampler_queue * Formatting fixes * Fixing whitespaces	2023-12-05 12:05:51 +02:00
kchro3	e4b76bbe31	swift : revert compiler checks for swift package (#4332)	2023-12-05 09:29:46 +02:00
Daniel Bevenius	23b5e12eb5	simple : update error message for KV cache check (#4324) This commit updates the error message that is printed when the KV cache is not big enough to hold all the prompt and generated tokens. Specifically it removes the reference to n_parallel and replaces it with n_len. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2023-12-04 18:04:21 +02:00
Miwa / Ensan	d208995c6d	swift : fix concatenation method to avoid invalid UTF8 stringfication (#4325)	2023-12-04 18:03:49 +02:00
Miwa / Ensan	5c9f90cba1	swift : fix prompt tokenization logic (#4321)	2023-12-04 15:43:45 +02:00
Ikko Eltociear Ashimine	4fa44e84ad	grammar-parser : fix typo (#4318) preceeding -> preceding	2023-12-04 09:57:35 +02:00
Georgi Gerganov	fbbc42827b	ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() (#4308) * ggml : fix soft max out-of-bounds access ggml-ci * ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() ggml-ci	2023-12-03 15:56:35 +02:00
Georgi Gerganov	adf3de4f69	ggml : fix soft max out-of-bounds access (#4307) ggml-ci	2023-12-03 15:56:22 +02:00
Ed Lee	33e171d1e9	server : fix OpenAI API `stop` field to be optional (#4299) (cherry picked from commit Mozilla-Ocho/llamafile@e8c92bcb84)	2023-12-03 11:10:43 +02:00
Rickard Edén	6949b50df5	py : add grammar to oai like api (#4294)	2023-12-03 11:03:25 +02:00
Georgi Gerganov	d7b800b8bc	llama : pad KV cache size (#4280) * llama : pad KV cache size to 32 * metal : try to improve batched decoding	2023-12-03 10:58:16 +02:00
Georgi Gerganov	5a7d3125e7	llama : avoid using "optional" keyword (#4283)	2023-12-01 20:39:12 +02:00
Georgi Gerganov	d5a1cbde60	llama : support optional tensors (#4283)	2023-12-01 20:35:47 +02:00
Miwa / Ensan	b220222a64	swift : fix token_to_piece implementation (#4278) * Fix token_to_piece implementation in Swift * Fix errors	2023-12-01 20:19:45 +02:00
Jared Van Bortel	511f52c334	build : enable libstdc++ assertions for debug builds (#4275)	2023-12-01 20:18:35 +02:00
CausalLM	03562f3a86	llama : support attention bias on LLaMA architecture (#4283) * Support attention_bias on LLaMA architecture QKVO bias, should fix InternLM (https://github.com/ggerganov/llama.cpp/issues/3133) and works for LLaMAfied Qwen models (https://github.com/ggerganov/llama.cpp/pull/3743#issuecomment-1825923608). * check existence of qkvo bias while loading llama models Tested on LLaMA2, CUDA and CPU. * Update llama.cpp	2023-12-01 20:17:06 +02:00
Shijie	37c746d687	llama : add Qwen support (#4281) * enable qwen to llama.cpp * llama : do not GPU split bias tensors --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-12-01 20:16:31 +02:00
Georgi Gerganov	880f57973b	llama : fix integer overflow during quantization (#4284) happens with multi-threaded quantization of Qwen-72B ggml-ci	2023-12-01 18:42:11 +02:00
Daniel Bevenius	8d6d9f033b	py : add requirements file for convert-hf-to-gguf.py (#4277) This commit adds a requirements file for the convert-hf-to-gguf.py script, and also add the torch and transformers packages to it. The motivation for this is that currently running convert-hf-to-gguf.py will produce the following error: ```console $ python3 -m venv venv $ source venv/bin/activate (venv) $ pip install -r requirements.txt Collecting numpy==1.24.4 Collecting sentencepiece==0.1.98 Collecting gguf>=0.1.0 Installing collected packages: sentencepiece, numpy, gguf Successfully installed gguf-0.5.1 numpy-1.24.4 sentencepiece-0.1.98 (venv) $ python convert-hf-to-gguf.py --help Traceback (most recent call last): File "llama.cpp/convert-hf-to-gguf.py", line 16, in <module> import torch ModuleNotFoundError: No module named 'torch' ``` With this commit, and using requirements-hf-to-gguf.txt instead of requirements.txt, the script can be run and shows the help output. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2023-12-01 11:41:56 +02:00
Georgi Gerganov	ef47ec18da	ggml : add ggml_soft_max_ext (#4256) * metal : implement soft_max_ext * cuda : implement soft_max_ext * ggml : implement soft_max_ext (CPU) * batched-bench : print threads ggml-ci * metal : simplify soft_max encoding ggml-ci * cuda : use 512 threads for soft_max instead of 32 * ggml : update soft max cpu * cuda : do warp-based block reduce * cuda : increase max block size to 1024 * cuda : fix warp reduction initialization of shared mem * metal : warp-based reduction for soft max kernel * metal : warp-based reduce for rms_norm * metal : simplify soft max kernel ggml-ci * alloc : fix build with debug	2023-12-01 10:51:24 +02:00
Ziad Ben Hadj-Alouane	1d144112c0	server : add --log-disable to disable logging to file (#4260) * * add --log-disable to disable logging to file in the server example * * typo fix	2023-12-01 00:25:49 +02:00
Ziad Ben Hadj-Alouane	f43f09366d	server : add single-client multi-prompt support (#4232) * * add multiprompt support * * cleanup * * more cleanup * * remove atomicity of id_gen, and change lock_guard to unique_lock on completion requests * * remove all references to mutex_multitasks * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * * change to set --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2023-12-01 00:25:04 +02:00
WillCorticesAI	d2809a3ba2	make : fix Apple clang determination bug (#4272) Co-authored-by: Will Findley <findley@gmail.com>	2023-12-01 00:23:44 +02:00
Jared Van Bortel	15f5d96037	build : fix build info generation and cleanup Makefile (#3920) * cmake : fix joining of REAL_GIT_DIR * fix includes with help from include-what-you-use * make : remove unneeded deps and add test-rope target * fix C includes in C++ source files * Revert "fix includes with help from include-what-you-use" This reverts commit `635e9fadfd`.	2023-12-01 00:23:08 +02:00
John	33c9892af5	llava : ShareGPT4V compatibility (vision encoder only loading) (#4172) * ShareGPT4 compatibility (vision encoder only loading) Load only a CLIP vision encoder (as supplied by ShareGPT finetunes) Corrects the argument parsing for --img_mean and --img_std (which were previously not parsed but attempted to access) Defines defaults for img_mean and img_std which are equal to the llava 1.5 CLIP encoder, so you do not have to provide them * Update convert-image-encoder-to-gguf.py	2023-11-30 23:11:14 +01:00
Andrew Godfrey	8efa0f6ebe	main : pass LOG_TEE callback to llama.cpp log (#4033) * main : Call llama_log_set to use LOG_TEE * tabs to spaces	2023-11-30 23:56:19 +02:00
vodkaslime	524907aa76	readme : fix (#4135) * fix: readme * chore: resolve comments * chore: resolve comments	2023-11-30 23:49:21 +02:00

1635 commits