* kv cache slot search improvements
* Use n_ctx in kv find slot for consistency
* Ensure kv cache head points to a valid slot in llama_decode internal
* Add some comments to prevent dumb people (like me) from getting confused.
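For context, a minimal sketch of the bounded slot search these commits describe; the struct layout and names here (kv_cell, kv_cache, pos, head) are illustrative, not the actual llama.cpp definitions, but they show how clamping the scan to n_ctx keeps head pointing at a valid slot:

    #include <stdbool.h>
    #include <stdint.h>

    struct kv_cell  { int32_t pos; };      /* pos < 0 marks the cell free */

    struct kv_cache {
        struct kv_cell *cells;
        uint32_t        n_ctx;             /* total number of cells */
        uint32_t        head;              /* next candidate slot   */
    };

    /* find n_tokens contiguous free cells; on success head is left
       pointing at a valid slot, and the scan is bounded by n_ctx */
    static bool kv_find_slot(struct kv_cache *kv, uint32_t n_tokens) {
        if (n_tokens > kv->n_ctx) return false;
        uint32_t tested = 0;
        while (tested < kv->n_ctx) {
            if (kv->head + n_tokens > kv->n_ctx) {   /* wrap around */
                tested  += kv->n_ctx - kv->head;
                kv->head = 0;
                continue;
            }
            uint32_t run = 0;
            while (run < n_tokens && kv->cells[kv->head + run].pos < 0) {
                run++;
            }
            if (run == n_tokens) return true;        /* found a slot */
            kv->head += run + 1;                     /* skip occupied cell */
            tested   += run + 1;
        }
        return false;                                /* cache is full */
    }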
Popen() needs to be used as a context manager ('with'), have .wait()
called, or be destroyed explicitly; otherwise a zombie child sticks
around until the object is GC'd.
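The underlying POSIX mechanism, sketched in C: an exited child stays in the process table as a zombie until the parent reaps it, which is what Popen.wait() (and the context-manager exit) do under the hood.

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();
        if (pid == 0) {
            _exit(0);              /* child exits immediately...             */
        }
        sleep(1);                  /* ...and is now a zombie: exited, but
                                      still occupying a process-table entry  */
        int status = 0;
        waitpid(pid, &status, 0);  /* reap it; the zombie entry is released  */
        printf("child %d reaped\n", (int) pid);
        return 0;
    }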
Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.
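A hedged sketch of the upload pattern, using ggml's ne*/nb* dims-and-strides convention: each row of a (possibly non-contiguous) 3D tensor is written at its own source offset, one event is collected per enqueued write, and every event is waited on and released before the call returns. The upload_3d helper and its signature are illustrative, not the actual code.

    #include <CL/cl.h>
    #include <stdlib.h>

    /* upload a 3D tensor that is contiguous only within a row */
    static cl_int upload_3d(cl_command_queue queue, cl_mem dst, const char *src,
                            size_t ne0, size_t ne1, size_t ne2,   /* elements       */
                            size_t nb0, size_t nb1, size_t nb2) { /* strides, bytes */
        size_t    row_size = ne0 * nb0;
        size_t    n_rows   = ne1 * ne2;
        cl_event *evs      = malloc(n_rows * sizeof(cl_event));
        cl_int    err      = CL_SUCCESS;
        size_t    i        = 0;

        if (evs == NULL) return CL_OUT_OF_HOST_MEMORY;

        while (i < n_rows) {
            size_t i1 = i % ne1, i2 = i / ne1;
            size_t src_off = i1*nb1 + i2*nb2;   /* host offset uses the strides */
            size_t dst_off = i * row_size;      /* device side is packed        */
            err = clEnqueueWriteBuffer(queue, dst, CL_FALSE, dst_off, row_size,
                                       src + src_off, 0, NULL, &evs[i]);
            if (err != CL_SUCCESS) break;       /* no event was created         */
            i++;
        }
        if (err == CL_SUCCESS && i > 0) {
            err = clWaitForEvents((cl_uint) i, evs);   /* all writes landed */
        }
        for (size_t j = 0; j < i; j++) {
            clReleaseEvent(evs[j]);             /* one release per enqueued write */
        }
        free(evs);
        return err;
    }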
* Implement basic chat/completions openai endpoint
-Basic support for the OpenAI chat/completions endpoint documented at: https://platform.openai.com/docs/api-reference/chat/create
-Tested with OpenAI's example code for chat/completions, both with and without the stream=True parameter, found here: https://cookbook.openai.com/examples/how_to_stream_completions.
-Tested with Mantella, the Skyrim mod that turns all the NPCs into AI-chattable characters, which uses OpenAI's acreate / async completions method: https://github.com/art-from-the-machine/Mantella/blob/main/src/output_manager.py
-Tested default koboldcpp API behavior with the streaming and non-streaming generate endpoints and with the GUI running; everything seems fine.
-Still TODO / evaluate before merging:
(1) implement the rest of the OpenAI chat/completions parameters to the extent possible, mapping them to koboldcpp parameters
(2) determine if there is a way to use kobold's prompt formats for certain models when translating the OpenAI messages format into a prompt string. (Not sure if this is possible or where these formats live in the code)
(3) have chat/completions responses include the actual local model the user is using instead of just koboldcpp (not sure if this is possible)
Note: I am a Python noob, so if there is a more elegant way of doing this, hopefully I have at least done some of the grunt work for you to build on.
* Fix typographical error on deleted streaming argument
-Mistakenly left code relating to the streaming argument from the main branch in experimental.
* add additional openai chat completions parameters
-support stop parameter, mapped to the koboldai stop_sequence parameter
-make the default max_length / max_tokens parameter consistent with the default 80-token length in the generate function
-add support for providing the name of the local model in OpenAI responses
* Revert "add additional openai chat completions parameters"
This reverts commit 443a6f7ff6346f41c78b0a6ff59c063999542327.
* add additional openai chat completions parameters
-support stop parameter, mapped to the koboldai stop_sequence parameter
-make the default max_length / max_tokens parameter consistent with the default 80-token length in the generate function
-add support for providing the name of the local model in OpenAI responses
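A sketch of the resulting name mapping as a lookup table; the stop -> stop_sequence and max_tokens -> max_length pairs come from the commit text above, while the remaining pairs (and the table itself) are illustrative:

    /* hypothetical mapping table between OpenAI and KoboldAI parameter
       names; only the first two pairs are stated in the commit text */
    struct param_map { const char *openai; const char *kobold; };

    static const struct param_map oai_to_kobold[] = {
        { "max_tokens",  "max_length"    },  /* default 80, matching generate */
        { "stop",        "stop_sequence" },
        { "temperature", "temperature"   },  /* illustrative: same name       */
        { "top_p",       "top_p"         },
    };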
* add \n after formatting prompts from openaiformat
to conform with the Alpaca standard used by default in lite.koboldai.net
* tidy up and simplify code, do not set globals for streaming
* oai endpoints must start with v1
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
* sync : ggml (conv 1d + 2d updates)
ggml-ci
* ggml : fix UB in q5_0 and q5_1 quantize code
ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml-ci
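For reference, the failure mode in isolation: left-shifting the signed int literal 1 by 31 is undefined behavior in C, so masks like the q5 high-bit pack must be built in unsigned arithmetic. A minimal illustration (not the exact ggml.c source):

    #include <stdint.h>

    /* UB: the literal 1 is a signed int, so 1 << 31 overflows it */
    uint32_t bit31_ub(void) { return 1  << 31; }

    /* OK: unsigned shift, well defined for every j in [0, 31] */
    uint32_t bit31_ok(void) { return 1u << 31; }

    /* q5-style packing of one bit per element into a 32-bit mask */
    uint32_t pack_high_bits(const uint8_t *b, int n) {
        uint32_t qh = 0;
        for (int j = 0; j < n && j < 32; j++) {
            qh |= (uint32_t)(b[j] & 1) << j;   /* cast before shifting */
        }
        return qh;
    }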
* tests : fix UB in test-quantize-perf
* Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product functions for RISC-V.
RVV intrinsics are added for the following quantize-row functions:
quantize_row_q8_0
quantize_row_q8_1
The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1:
ggml_vec_dot_q4_0_q8_0
ggml_vec_dot_q4_1_q8_1
ggml_vec_dot_q5_0_q8_0
ggml_vec_dot_q5_1_q8_1
Vector initialization via a temporary array in Q5 is also replaced by the vid intrinsic.
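A hedged sketch of the quantize_row_q8_0 pattern described above (vector amax reduction, scale, then f32 -> i16 -> i8 narrowing converts). Intrinsic spellings follow the current __riscv_-prefixed intrinsics spec and may differ from the ones used at the time; this is not the committed code:

    #include <riscv_vector.h>
    #include <stdint.h>

    #define QK8_0 32

    /* per block: find amax by vector reduction, then quantize to int8 */
    void quantize_row_q8_0_rvv(const float *x, int8_t *qs, float *d, int nb) {
        size_t vl = __riscv_vsetvl_e32m4(QK8_0);
        for (int i = 0; i < nb; i++) {
            vfloat32m4_t vx   = __riscv_vle32_v_f32m4(x + i*QK8_0, vl);
            vfloat32m4_t vabs = __riscv_vfabs_v_f32m4(vx, vl);
            vfloat32m1_t z    = __riscv_vfmv_v_f_f32m1(0.0f, vl);
            vfloat32m1_t vmax = __riscv_vfredmax_vs_f32m4_f32m1(vabs, z, vl);
            float amax        = __riscv_vfmv_f_s_f32m1_f32(vmax);

            float dd = amax / 127.0f;              /* per-block scale */
            float id = dd != 0.0f ? 1.0f/dd : 0.0f;
            d[i] = dd;

            vfloat32m4_t vs  = __riscv_vfmul_vf_f32m4(vx, id, vl);
            vint16m2_t   v16 = __riscv_vfncvt_x_f_w_i16m2(vs, vl);  /* round+narrow */
            vint8m1_t    v8  = __riscv_vncvt_x_x_w_i8m1(v16, vl);   /* narrow again */
            __riscv_vse8_v_i8m1(qs + i*QK8_0, v8, vl);
        }
    }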
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
* Added RVV intrinsics support for k_quants
This adds RISC-V Vector intrinsics support for the following k_quants functions, for both QK_K = 256 and QK_K = 64:
ggml_vec_dot_q2_K_q8_K
ggml_vec_dot_q3_K_q8_K
ggml_vec_dot_q4_K_q8_K
ggml_vec_dot_q5_K_q8_K
ggml_vec_dot_q6_K_q8_K
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
---------
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>