Commit graph

2355 commits

Author SHA1 Message Date
Jhen-Jie Hong
e8b8d32e86
server : fix incorrect num_tokens_predicted (#3480) 2023-10-05 17:02:55 +03:00
Jhen-Jie Hong
8f3a642ec1
swift : disable ACCELERATE_NEW_LAPACK (#3481) 2023-10-05 17:00:07 +03:00
Jhen-Jie Hong
0745384449
ci : add swift build via xcodebuild (#3482) 2023-10-05 16:56:21 +03:00
Concedo
a0c1ba7747 Merge branch 'concedo_experimental' of https://github.com/LostRuins/llamacpp-for-kobold into concedo_experimental
# Conflicts:
#	koboldcpp.py
2023-10-05 21:20:21 +08:00
Concedo
b4b5c35074 add documentation for koboldcpp 2023-10-05 21:17:36 +08:00
teddybear082
f9f4cdf3c0
Implement basic chat/completions openai endpoint (#461)
* Implement basic chat/completions openai endpoint

-Basic support for the OpenAI chat/completions endpoint documented at: https://platform.openai.com/docs/api-reference/chat/create

-Tested with OpenAI's example code for chat/completions, both with and without the stream=True parameter, found here: https://cookbook.openai.com/examples/how_to_stream_completions.

-Tested with Mantella, the Skyrim mod that turns all the NPCs into AI-chattable characters, which uses OpenAI's acreate / async completions method: https://github.com/art-from-the-machine/Mantella/blob/main/src/output_manager.py

-Tested default koboldcpp API behavior with the streaming and non-streaming generate endpoints and with the GUI running; all seem fine.

-Still TODO / to evaluate before merging:

(1) implement the rest of the OpenAI chat/completions parameters to the extent possible, mapping them to koboldcpp parameters

(2) determine whether there is a way to use kobold's prompt formats for certain models when translating the OpenAI messages format into a prompt string (not sure if this is possible or where these are in the code)

(3) have chat/completions responses include the actual local model the user is using instead of just koboldcpp (not sure if this is possible)

Note: I am a python noob, so there may be a more elegant way of doing this; at minimum, hopefully I have done some of the grunt work for you to implement on your own.

* Fix typographical error on deleted streaming argument

-Mistakenly left code relating to the streaming argument from the main branch in experimental.

* add additional openai chat completions parameters

-support the stop parameter, mapped to the koboldai stop_sequence parameter

-make the default max_length / max_tokens parameter consistent with the default 80-token length in the generate function

-add support for providing the name of the local model in OpenAI responses

* Revert "add additional openai chat completions parameters"

This reverts commit 443a6f7ff6346f41c78b0a6ff59c063999542327.

* add additional openai chat completions parameters

-support the stop parameter, mapped to the koboldai stop_sequence parameter

-make the default max_length / max_tokens parameter consistent with the default 80-token length in the generate function

-add support for providing the name of the local model in OpenAI responses

* add \n after formatting prompts from openaiformat

to conform with the Alpaca standard used as the default in lite.koboldai.net

* tidy up and simplify code, do not set globals for streaming

* oai endpoints must start with v1

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-10-05 20:13:10 +08:00
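
For reference, a minimal sketch of calling the endpoint this commit adds. Assumptions not taken from the commit: koboldcpp listening on its default port 5001, and libcurl as the HTTP client; the JSON fields follow the OpenAI chat/completions spec linked above.

    #include <curl/curl.h>
    #include <stdio.h>

    int main(void) {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl) return 1;

        const char *body =
            "{\"model\": \"koboldcpp\","
            " \"max_tokens\": 80,"   /* consistent with the 80-token default above */
            " \"stop\": [\"###\"],"  /* mapped to koboldai's stop_sequence */
            " \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}";

        struct curl_slist *headers =
            curl_slist_append(NULL, "Content-Type: application/json");

        /* per the final commit above, OAI endpoints are rooted at /v1 */
        curl_easy_setopt(curl, CURLOPT_URL,
                         "http://localhost:5001/v1/chat/completions");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

        /* libcurl's default write callback prints the JSON response to stdout */
        CURLcode res = curl_easy_perform(curl);
        if (res != CURLE_OK)
            fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));

        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return res == CURLE_OK ? 0 : 1;
    }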
Concedo
5beb773320 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	tests/test-grad0.cpp
#	tests/test-opt.cpp
#	tests/test-quantize-perf.cpp
2023-10-05 11:44:35 +08:00
Concedo
ce065d39d0 allow drag and drop kcpps file and openwith 2023-10-05 11:38:37 +08:00
Kerfuffle
019ba1dcd0
convert : fix Baichuan2 models by using vocab size in config.json (#3299)
Use local GGUF package when possible in Baichuan converter
2023-10-04 17:20:28 +03:00
Georgi Gerganov
beabc8cfb0
readme : add project status link 2023-10-04 16:50:44 +03:00
Georgi Gerganov
0d152b37fe
ggml : fix build after #3329 2023-10-04 16:25:41 +03:00
ds5t5
f8c90cdbaa
llm : add Refact model (#3329)
* add refact model

* resolve comments

* rebase to the latest

* solve alibi cpu error

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-04 16:23:39 +03:00
Georgi Gerganov
f93af02488
sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)
* sync : ggml (conv 1d + 2d updates)

ggml-ci

* ggml : fix UB in q5_0 and q5_1 quantize code

ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior

ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior

ggml-ci

* tests : fix UB in test-quantize-perf
2023-10-04 15:29:58 +03:00
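
For context, a minimal sketch of the class of fix involved (not the exact ggml patch): shifting a signed 1 into bit 31 of a 32-bit int is undefined behavior, while shifting an unsigned operand is well-defined.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* UB: 1 is a signed int, so 1 << 31 shifts into the sign bit
           (the error UBSan reports above) */
        /* uint32_t bad = 1 << 31; */

        /* well-defined: make the shifted operand unsigned */
        uint32_t good = 1u << 31;
        printf("0x%08x\n", good);   /* prints 0x80000000 */
        return 0;
    }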
Merrick Christensen
f72f8f22c9
finetune : readme fix typo (#3465)
Fix small typo
2023-10-04 09:33:13 +03:00
Concedo
47f7ebb632 adjust horde worker and debugmode 2023-10-04 14:00:07 +08:00
Concedo
c7660ab6e6 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	flake.nix
2023-10-04 12:54:55 +08:00
Tameem
79f34abddb
ggml : add RISC-V Vector support for K-quants and improve the existing intrinsics (#3453)
* Added RVV intrinsics support for Q8 quantize-row and also improved the existing dot product functions for RISC-V.

The RVV intrinsics are added for the following quantize-row functions:
   quantize_row_q8_0
   quantize_row_q8_1

The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1:
   ggml_vec_dot_q4_0_q8_0
   ggml_vec_dot_q4_1_q8_1
   ggml_vec_dot_q5_0_q8_0
   ggml_vec_dot_q5_1_q8_1

Vector initialization via a temporary array in Q5 is also replaced by the vid intrinsic.

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>

* Added RVV intrinsics support for k_quants

This adds RISC-V Vector intrinsics support for the following k_quants functions, for both QK_K = 256 and QK_K = 64:
   ggml_vec_dot_q2_K_q8_K
   ggml_vec_dot_q3_K_q8_K
   ggml_vec_dot_q4_K_q8_K
   ggml_vec_dot_q5_K_q8_K
   ggml_vec_dot_q6_K_q8_K

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>

---------

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
2023-10-03 21:38:19 +03:00
h-h-h-h
8186242b6d
main : consistent prefix/suffix coloring (#3425)
* Typo

* No `--in-prefix` coloring

The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
2023-10-03 21:16:15 +03:00
Georgi Gerganov
ac2219fef3
llama : fix session saving/loading (#3400)
* llama : fix session saving/loading

* llama : temp fix for clearing "future" tokens from the KV cache

* llama : fix handling of "future" tokens when loading sessions

* llama : fix comments for llama_kv_cache API
2023-10-03 21:04:01 +03:00
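
For orientation, a sketch of the session round-trip these fixes concern, using the public llama.h session API of this period (error handling trimmed; the file name is a placeholder):

    #include "llama.h"

    /* persist the KV cache plus the tokens it was built from, then restore
       it so only tokens past n_loaded need to be re-evaluated */
    void session_roundtrip(struct llama_context *ctx, llama_token *tokens,
                           size_t n_tokens, size_t capacity) {
        llama_save_session_file(ctx, "session.bin", tokens, n_tokens);

        size_t n_loaded = 0;
        if (llama_load_session_file(ctx, "session.bin", tokens, capacity, &n_loaded)) {
            /* tokens[0..n_loaded) are already in the KV cache; any "future"
               tokens past the resume point must be cleared, which is what
               the commits above fix */
        }
    }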
Alex Klinkhamer
48be797ffb
llama : expose model's rope_freq_scale in the API (#3418)
so it can be scaled further before creating a context.
2023-10-03 20:09:28 +03:00
Jiahao Li
f56e1baec3
metal : alibi for arbitrary number of heads (#3426) 2023-10-03 19:55:21 +03:00
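
For context, a sketch of the standard ALiBi slope scheme for head counts that are not powers of two (illustrative, not the Metal kernel itself): slopes for the nearest lower power of two are interleaved with slopes drawn from the next power of two.

    #include <math.h>

    /* ALiBi slope for head h out of n_head total */
    float alibi_slope(int h, int n_head) {
        const int n_floor = 1 << (int) floorf(log2f((float) n_head));
        const float m0 = powf(2.0f, -8.0f / n_floor);   /* base for the first n_floor heads */
        const float m1 = powf(2.0f, -4.0f / n_floor);   /* base for the remaining heads */
        return h < n_floor ? powf(m0, (float) (h + 1))
                           : powf(m1, (float) (2 * (h - n_floor) + 1));
    }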
Eve
017efe899d
cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)
* fix LLAMA_NATIVE

* syntax

* alternate implementation

* my eyes must be getting bad...

* set cmake LLAMA_NATIVE=ON by default

* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc

* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile

* remove -DLLAMA_MPI=ON

---------

Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
2023-10-03 19:53:15 +03:00
Concedo
ea726fcffa cleanup threaded horde submit 2023-10-04 00:34:26 +08:00
Concedo
c249f7dbc5 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.dockerignore
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	tests/CMakeLists.txt
2023-10-03 23:51:30 +08:00
Concedo
0cc740115d updated lite, improved horde worker (+1 squashed commit)
Squashed commits:

[a7c25999] improve horde worker
2023-10-03 23:44:27 +08:00
Concedo
ae8ccdc1be Remove old tkinter gui (+1 squashed commit)
Squashed commits:

[0933c1da] Remove old tkinter gui
2023-10-03 22:05:44 +08:00
Concedo
d10470a1e3 Breaking Change: Remove deprecated commands 2023-10-03 17:16:09 +08:00
goerch
ff5a3f0c09
Work on the BPE tokenizer (#3252)
* Work on the BPE tokenizer

Tokenizer tests work for Falcon-7B

* Try to fix build problem

* Fix debug assertion failure

* Fix MSVC Unicode BOM problem

* Cleanup and an improvement

* Fix compiler warning

* Cleanup

* Test doesn't work over the full Unicode range

* Update .gitignore and Makefile

* Another Makefile rule

* Testing Aquila

* Moving byte decoding back to `token_to_piece` ...

... because everyone is using it.

* Guarding some unusable code paths

* Streamlining code and adding some more assertions

Important change: I'm classifying added tokens as control tokens now for BPE.

* Adding a comment

* Adding another assertion

* Fixed vocabulary guarding assertions

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fix PR for recent change

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fixes for more compiler warnings

* Remove unused code

* Fix initialization of static maps

* Add scores and token types back, adapt gptneox

* Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update unicode.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update unicode.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Ported Starcoder and added some assertions

* Fix coding style

* Apply @jploski 's fix for missing tokens

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-03 09:16:26 +02:00
cebtenzzre
1c84003c08
convert : fix vocab size when not defined in hparams (#3421) 2023-10-02 18:07:24 -04:00
cebtenzzre
e78f0b0d05
cmake : increase minimum version for add_link_options (#3444) 2023-10-02 22:38:43 +03:00
shibe2
665018c749
CLBlast: Add broadcast support for matrix multiplication (#3402)
Broadcast src0 into src1 across dimensions 2 and 3 when needed.
This is required for models that use GQA.
2023-10-02 21:26:15 +02:00
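
A minimal sketch of that broadcast mapping (illustrative, not the CLBlast kernel itself): several src1 slices along dimensions 2 and 3 map back onto one src0 slice by integer division with the broadcast ratio.

    #include <stdint.h>

    /* with GQA, ne12 (src1's extent in dim 2) is a whole multiple of ne02
       (src0's), so r2 consecutive src1 slices share a single src0 slice */
    int64_t src0_slice_for(int64_t i12, int64_t ne02, int64_t ne12) {
        const int64_t r2 = ne12 / ne02;
        return i12 / r2;
    }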
cebtenzzre
29a404a951
gguf : add BERT, MPT, and GPT-J arch info (#3408) 2023-10-02 15:20:28 -04:00
cebtenzzre
0fe321031a
gguf : general usability improvements (#3409) 2023-10-02 14:58:46 -04:00
cebtenzzre
9476b01226
cmake : make CUDA flags more similar to the Makefile (#3420)
* cmake : fix misuse of cxx_flags

* cmake : make CUDA flags more similar to the Makefile

* cmake : fix MSVC build
2023-10-02 16:16:50 +03:00
xaedes
a03ce38455
finetune : fix #3404 (#3437)
the shapes for the init model of GQA models were wrong
2023-10-02 16:15:45 +03:00
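
For context, a sketch of the GQA shape rule the fix restores (names follow llama.cpp convention; illustrative only): the K/V projection width is sized for n_head_kv heads rather than n_head.

    #include <stdint.h>

    /* with grouped-query attention the K/V weights span n_embd_gqa columns,
       which is smaller than n_embd whenever n_head_kv < n_head */
    int64_t n_embd_gqa(int64_t n_embd, int64_t n_head, int64_t n_head_kv) {
        const int64_t head_dim = n_embd / n_head;
        return head_dim * n_head_kv;
    }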
Concedo
5d3e142145 use_default_badwordsids defaults to false if the parameter is missing 2023-10-02 19:41:07 +08:00
Adrian
a847676984
metal : set log callback before initializing (#3427) 2023-10-02 13:49:59 +03:00
Concedo
1bc01cbcd4 update images (+3 squashed commits)
Squashed commits:

[4d5f17ad] update readme

[cd000215] resize image

[eca91721] try add more media
2023-10-02 17:53:59 +08:00
bandoti
095231dfd3
cmake : fix transient definitions in find pkg (#3411) 2023-10-02 12:51:49 +03:00
Concedo
5b4cef5a60 archived old unused file 2023-10-02 16:57:20 +08:00
Kevin Ji
ea55295a74
docker : ignore Git files (#3314) 2023-10-02 11:53:53 +03:00
vvhg1
c97f01c362
infill : add new example + extend server API (#3296)
* vvhg-code-infill (#1)

* infill in separate example (#2)

* reverted changes to main and added infill example

* cleanup

* naming improvement

* make : add missing blank line

* fix missing semicolon

* brought infill up to current main code

* cleanup

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-10-02 10:42:02 +03:00
Concedo
23b9d3af49 force oai endpoints to return json 2023-10-02 12:45:14 +08:00
Concedo
0c47e79537 updated the API routing path and fixed a bug with threads 2023-10-02 11:05:19 +08:00
Concedo
dffc6bee74 deprecate some launcher arguments. 2023-10-01 22:30:48 +08:00
Concedo
b49a5bc546 formatting of text 2023-10-01 18:38:32 +08:00
Concedo
bc841ec302 flag to retain grammar, fix makefile (+2 squashed commits)
Squashed commit:

[d5cd3f28] flag to retain grammar, fix makefile

[b3352963] updated lite to v73
2023-10-01 14:39:56 +08:00
Concedo
7ab01ee3c6 Merge branch 'master' into concedo_experimental 2023-10-01 10:22:05 +08:00
Concedo
2fc00fac8c fixed makefile 2023-10-01 10:17:23 +08:00
Concedo
4b45d880ba updated lite 2023-10-01 01:10:30 +08:00