Commit graph

1364 commits

pudepiedj
84b43bb718 Merge branch 'load-parallel-prompt-file' of https://github.com/pudepiedj/llama.cpp into load-parallel-prompt-file 2023-10-06 09:54:38 +01:00
pudepiedj
8b7d88afff Reinstate original jeopardy.sh 2023-10-06 09:54:32 +01:00
pudepiedj
1c4c8cd801 Merge branch 'ggerganov:master' into load-parallel-prompt-file 2023-10-06 09:51:26 +01:00
cebtenzzre
48edda30ee convert : update Falcon script for new HF config (#3448)
Also adds Falcon-180B support.
Closes #3049

Co-authored-by: jb <jonathan.t.barnard@gmail.com>
2023-10-05 15:00:34 -04:00
Kenvix ⭐
45eba9369f build : use std::make_tuple() for compatibility with older GCC versions (#3488) 2023-10-05 20:16:39 +03:00
staviq
acec9eaaa9 common : process escape sequences in reverse prompts (#3461) 2023-10-05 19:17:29 +03:00
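Processing escapes means converting the literal two-character sequences a shell passes through (e.g. `\n`) into real control characters before the reverse prompt is matched against generated text. A minimal sketch of such an unescape pass (hypothetical helper covering only a few common escapes):

```cpp
#include <string>

// Replace literal "\n", "\t" and "\\" with the characters they name.
static std::string unescape(const std::string & s) {
    std::string out;
    out.reserve(s.size());
    for (size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '\\' && i + 1 < s.size()) {
            switch (s[++i]) {
                case 'n':  out += '\n'; break;
                case 't':  out += '\t'; break;
                case '\\': out += '\\'; break;
                default:   out += '\\'; out += s[i]; break; // unknown escape: keep as-is
            }
        } else {
            out += s[i];
        }
    }
    return out;
}
```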
pudepiedj
db44b469d3 Merge branch 'ggerganov:master' into load-parallel-prompt-file 2023-10-05 15:43:26 +01:00
pudepiedj
325fcb75ad Remove jeopardy results file 2023-10-05 15:41:02 +01:00
shibe2
e2583cbc29 CLBlast: Fix handling of on-device tensor data
Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.
2023-10-05 18:25:23 +04:00
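Non-contiguous tensors cannot be uploaded as one block: each row has its own offset in host memory, and each enqueued write yields an OpenCL event that later commands must wait on. A rough sketch of the row-by-row pattern using the standard OpenCL API (simplified shapes and hypothetical names, not the actual ggml-opencl code):

```c
#include <CL/cl.h>
#include <stdint.h>

// Upload one 2-D slice of a possibly non-contiguous tensor row by row.
// data/nb1 describe host memory; dst_off is the tensor's offset inside dst.
static cl_int upload_rows(cl_command_queue queue, cl_mem dst, size_t dst_off,
                          const char *data, size_t row_bytes, size_t nb1,
                          int64_t nrows, cl_event *evt_out) {
    cl_int err = CL_SUCCESS;
    for (int64_t r = 0; r < nrows && err == CL_SUCCESS; ++r) {
        // only the final enqueue returns the event the caller waits on
        cl_event *evt = (r == nrows - 1) ? evt_out : NULL;
        err = clEnqueueWriteBuffer(queue, dst, CL_FALSE,
                                   dst_off + (size_t)r * row_bytes, // device offset
                                   row_bytes,
                                   data + (size_t)r * nb1,          // host row stride
                                   0, NULL, evt);
    }
    return err;
}
```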
Jhen-Jie Hong
e8b8d32e86 server : fix incorrect num_tokens_predicted (#3480) 2023-10-05 17:02:55 +03:00
Jhen-Jie Hong
8f3a642ec1 swift : disable ACCELERATE_NEW_LAPACK (#3481) 2023-10-05 17:00:07 +03:00
Jhen-Jie Hong
0745384449 ci : add swift build via xcodebuild (#3482) 2023-10-05 16:56:21 +03:00
pudepiedj
e9aa6e9a08 Yet more LLM-questions 2023-10-05 11:17:28 +01:00
pudepiedj
8394762237 Merge branch 'load-parallel-prompt-file' of https://github.com/pudepiedj/llama.cpp into load-parallel-prompt-file 2023-10-04 15:54:38 +01:00
pudepiedj
b505cfb3bc Update LLM-questions.txt 2023-10-04 15:54:32 +01:00
pudepiedj
f630096c35 Merge branch 'ggerganov:master' into load-parallel-prompt-file 2023-10-04 15:51:41 +01:00
Kerfuffle
019ba1dcd0 convert : fix Baichuan2 models by using vocab size in config.json (#3299)
Use local GGUF package when possible in Baichuan converter
2023-10-04 17:20:28 +03:00
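The underlying issue: Baichuan2 checkpoints declare the authoritative vocabulary size in `config.json`, which can disagree with what the tokenizer yields, so the converter should trim or pad to the declared size. A hedged Python illustration with stand-in data (not the converter's actual code):

```python
import json

# Hypothetical config fragment standing in for Baichuan2's config.json.
hparams = json.loads('{"vocab_size": 125696}')
vocab_size = hparams["vocab_size"]  # the size the model was trained with

# Suppose the tokenizer yielded a different number of pieces:
tokens = [f"<tok_{i}>".encode() for i in range(100)]  # stand-in token list

tokens = tokens[:vocab_size]                          # trim any excess
while len(tokens) < vocab_size:                       # pad up to the declared size
    tokens.append(f"<unused_{len(tokens)}>".encode())

assert len(tokens) == vocab_size
```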
Georgi Gerganov
beabc8cfb0 readme : add project status link 2023-10-04 16:50:44 +03:00
Georgi Gerganov
0d152b37fe ggml : fix build after #3329 2023-10-04 16:25:41 +03:00
ds5t5
f8c90cdbaa llm : add Refact model (#3329)
* add refact model

* resolve comments

* rebase to the latest

* solve alibi cpu error

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-04 16:23:39 +03:00
Georgi Gerganov
f93af02488 sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)
* sync : ggml (conv 1d + 2d updates)

ggml-ci

* ggml : fix UB in q5_0 and q5_1 quantize code

ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior

ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior

ggml-ci

* tests : fix UB in test-quantize-perf
2023-10-04 15:29:58 +03:00
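The undefined behavior in question is the classic `1 << 31`: the constant `1` is a signed `int`, and shifting it into the sign bit is UB, exactly as the UBSan reports above show. The usual remedy, presumably what this commit applies, is shifting an unsigned constant:

```c
#include <stdio.h>

int main(void) {
    // Undefined behavior: 1 is a signed int, and shifting it left by 31
    // overflows into the sign bit (what UBSan flagged in ggml.c):
    // int bad = 1 << 31;

    // Well-defined: shift an unsigned constant instead.
    unsigned int mask = 1u << 31;
    printf("0x%08x\n", mask); // prints 0x80000000
    return 0;
}
```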
pudepiedj
000c4681e4 More LLM questions 2023-10-04 12:38:50 +01:00
pudepiedj
a02e042eb9 Corrected typo 2023-10-04 11:01:57 +01:00
pudepiedj
f75fe38770 Improved reporting and new question files. 2023-10-04 10:56:30 +01:00
pudepiedj
b805ec2899 Merge branch 'load-parallel-prompt-file' of https://github.com/pudepiedj/llama.cpp into load-parallel-prompt-file 2023-10-04 08:33:17 +01:00
pudepiedj
2f0181bd29 Changed .gitignore 2023-10-04 08:32:54 +01:00
pudepiedj
bbfec95e3c Merge branch 'ggerganov:master' into load-parallel-prompt-file 2023-10-04 08:28:27 +01:00
pudepiedj
53663759b1 Remove cmake_all.sh 2023-10-04 08:27:10 +01:00
Merrick Christensen
f72f8f22c9 finetune : readme fix typo (#3465)
Fix small typo
2023-10-04 09:33:13 +03:00
pudepiedj
18b342dbbb remove cmake_all.sh 2023-10-03 20:50:02 +01:00
pudepiedj
028681835b Remove trailing whitespace 2023-10-03 20:40:30 +01:00
Tameem
79f34abddb ggml : add RISC-V Vector support for K-Quants and improve the existing intrinsics (#3453)
* Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product functions for RISC-V.

The RVV intrinsics are added for the following quantize row functions:
   quantize_row_q8_0
   quantize_row_q8_1

The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1:
   ggml_vec_dot_q4_0_q8_0
   ggml_vec_dot_q4_1_q8_1
   ggml_vec_dot_q5_0_q8_0
   ggml_vec_dot_q5_1_q8_1

Vector initialization by a temporary array in Q5 is also replaced by the vid intrinsic.

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>

* Added RVV intrinsics support for k_quants

This adds RISC-V Vector intrinsics support for the following K_quants functions, for both QKK = 256 and QKK = 64:
   ggml_vec_dot_q2_K_q8_K
   ggml_vec_dot_q3_K_q8_K
   ggml_vec_dot_q4_K_q8_K
   ggml_vec_dot_q5_K_q8_K
   ggml_vec_dot_q6_K_q8_K

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>

---------

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
2023-10-03 21:38:19 +03:00
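For context, LMUL controls how many architectural vector registers one operand occupies; LMUL = 1/2 halves each operand's register footprint, freeing registers for accumulators at the cost of fewer elements per iteration. A minimal strip-mined loop at LMUL = 1/2, assuming the RVV intrinsics v1.0 naming (`__riscv_*` prefix, `mf2` types) rather than the commit's actual code:

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// c[i] = a[i] + b[i] with int8 elements at LMUL = 1/2 (vint8mf2_t).
void add_i8_mf2(const int8_t *a, const int8_t *b, int8_t *c, size_t n) {
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e8mf2(n);            // elements this pass
        vint8mf2_t va = __riscv_vle8_v_i8mf2(a, vl);    // unit-stride loads
        vint8mf2_t vb = __riscv_vle8_v_i8mf2(b, vl);
        __riscv_vse8_v_i8mf2(c, __riscv_vadd_vv_i8mf2(va, vb, vl), vl);
        a += vl; b += vl; c += vl; n -= vl;
    }
}
```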
h-h-h-h
8186242b6d main : consistent prefix/suffix coloring (#3425)
* Typo

* No `--in-prefix` coloring

The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
2023-10-03 21:16:15 +03:00
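Terminal coloring in `main` is done with ANSI escape sequences, and consistency here means prefix and suffix both take the uncolored path. A toy sketch with hardcoded codes, assuming an ANSI-capable terminal (not the actual console code):

```c
#include <stdio.h>

#define ANSI_YELLOW "\x1b[33m"
#define ANSI_RESET  "\x1b[0m"

int main(void) {
    // prefix and suffix text stay uncolored on every path...
    fputs("### Instruction:\n", stdout);                 // e.g. an --in-prefix
    // ...while the text between them keeps its highlight
    fputs(ANSI_YELLOW "highlighted text" ANSI_RESET, stdout);
    fputs("\n### Response:\n", stdout);                  // e.g. an --in-suffix
    return 0;
}
```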
Georgi Gerganov
ac2219fef3 llama : fix session saving/loading (#3400)
* llama : fix session saving/loading

* llama : temp fix for clearing "future" tokens from the KV cache

* llama : fix handling of "future" tokens when loading sessions

* llama : fix comments for llama_kv_cache API
2023-10-03 21:04:01 +03:00
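The session API persists the KV cache together with the token sequence that produced it; the bug class fixed here is restored cache cells at positions beyond the resume point ("future" tokens). A hedged usage sketch with a hypothetical helper, assuming the `llama_load_session_file` signature from `llama.h` of this period:

```cpp
#include <vector>
#include "llama.h" // assumption: llama.cpp's public header of this period

// Restore a session and trim the token list to what was actually loaded.
static std::vector<llama_token> restore(llama_context * ctx, const char * path, size_t n_ctx) {
    std::vector<llama_token> tokens(n_ctx);
    size_t n_token_count = 0;
    if (!llama_load_session_file(ctx, path, tokens.data(), tokens.size(), &n_token_count)) {
        return {};
    }
    tokens.resize(n_token_count);
    // The bug this commit addresses: the restored KV cache may hold cells at
    // positions >= n_token_count ("future" tokens); they must be cleared
    // before decoding resumes, or they corrupt subsequent attention.
    return tokens;
}
```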
pudepiedj
bf8c4dfde8 Merge branch 'Update-load-parallel-prompt-file' of https://github.com/pudepiedj/llama.cpp into Update-load-parallel-prompt-file 2023-10-03 18:13:24 +01:00
pudepiedj
fc1ba35b09 Merge remote-tracking branch 'origin/load-parallel-prompt-file' into Update-load-parallel-prompt-file with requested changes 2023-10-03 18:12:21 +01:00
Alex Klinkhamer
48be797ffb llama : expose model's rope_freq_scale in the API (#3418)
so it can be scaled further before creating a context.
2023-10-03 20:09:28 +03:00
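Exposing the training-time RoPE frequency scale lets callers compose extra scaling on top of it before the context is created. A hedged sketch, assuming the `llama_rope_freq_scale_train` accessor added here and the `rope_freq_scale` field in `llama_context_params`:

```cpp
#include "llama.h" // assumption: llama.cpp's public header of this period

llama_context * make_ctx_with_extra_scaling(llama_model * model) {
    llama_context_params params = llama_context_default_params();
    // compose additional linear scaling on top of the model's trained scale
    params.rope_freq_scale = llama_rope_freq_scale_train(model) * 0.5f;
    return llama_new_context_with_model(model, params);
}
```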
Jiahao Li
f56e1baec3 metal : alibi for arbitrary number of heads (#3426) 2023-10-03 19:55:21 +03:00
Eve
017efe899d cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)
* fix LLAMA_NATIVE

* syntax

* alternate implementation

* my eyes must be getting bad...

* set cmake LLAMA_NATIVE=ON by default

* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc

* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile

* remove -DLLAMA_MPI=ON

---------

Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
2023-10-03 19:53:15 +03:00
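Making the flag "actually use" the processor's instructions amounts to probing whether the compiler accepts `-march=native` and adding it only then, instead of hand-picking AVX flags. The standard CMake pattern, as a sketch of the idea rather than the PR's exact logic:

```cmake
include(CheckCXXCompilerFlag)

option(LLAMA_NATIVE "llama: optimize for the host CPU" ON)

if (LLAMA_NATIVE)
    check_cxx_compiler_flag("-march=native" COMPILER_SUPPORTS_MARCH_NATIVE)
    if (COMPILER_SUPPORTS_MARCH_NATIVE)
        add_compile_options(-march=native)
    endif()
endif()
```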
pudepiedj
af2fbb82e1 Merge branch 'ggerganov:master' into Update-load-parallel-prompt-file 2023-10-03 16:03:05 +01:00
pudepiedj
ce10861214 Merge branch 'ggerganov:master' into load-parallel-prompt-file 2023-10-03 16:02:44 +01:00
pudepiedj
b343833720 Final revision 2023-10-03 14:45:31 +01:00
pudepiedj
2e3dad3a9c Interim commit 2023-10-03 10:10:00 +01:00
pudepiedj
51196a44dc Interim commit 2023-10-03 09:46:53 +01:00
goerch
ff5a3f0c09 Work on the BPE tokenizer (#3252)
* Work on the BPE tokenizer

Tokenizer tests work for Falcon-7B

* Try to fix build problem

* Fix debug assertion failure

* Fix MSVC Unicode BOM problem

* Cleanup and an improvement

* Fix compiler warning

* Cleanup

* Test doesn't work over the full range of Unicodes

* Update .gitignore and Makefile

* Another Makefile rule

* Testing Aquila

* Moving byte decoding back to `token_to_piece` ...

... because everyone is using it.

* Guarding some unusable code paths

* Streamlining code and adding some more assertions

Important change: I'm classifying added tokens as control tokens now for BPE.

* Adding a comment

* Adding another assertion

* Fixed vocabulary guarding assertions

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fix PR for recent change

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fixes for more compiler warnings

* Remove unused code

* Fix initialization of static maps

* Add scores and token types back, adapt gptneox

* Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update unicode.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update unicode.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Ported Starcoder and added some assertions

* Fix coding style

* Apply @jploski 's fix for missing tokens

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-03 09:16:26 +02:00
cebtenzzre
1c84003c08 convert : fix vocab size when not defined in hparams (#3421) 2023-10-02 18:07:24 -04:00
pudepiedj
e293ebd68e Merge branch 'ggerganov:master' into load-parallel-prompt-file 2023-10-02 21:14:15 +01:00
cebtenzzre
e78f0b0d05 cmake : increase minimum version for add_link_options (#3444) 2023-10-02 22:38:43 +03:00
shibe2
665018c749 CLBlast: Add broadcast support for matrix multiplication (#3402)
Broadcast src0 into src1 across dimensions 2 and 3 when needed.
This is required for models that use GQA.
2023-10-02 21:26:15 +02:00
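With grouped-query attention (GQA) there are fewer KV heads than query heads, so the smaller tensor's head index must be derived by dividing out the broadcast ratio, as ggml does on other backends. A distilled version of that index mapping (plain C, hypothetical helper, not the CLBlast kernel):

```c
#include <stdint.h>

// Map src1's dim-2/3 indices onto a smaller src0 (broadcast), as needed
// for GQA, where several query heads share one KV head.
static void broadcast_src0_index(int64_t i12, int64_t i13,
                                 int64_t ne02, int64_t ne03,
                                 int64_t ne12, int64_t ne13,
                                 int64_t *i02, int64_t *i03) {
    const int64_t r2 = ne12 / ne02;  // repeat factor along dim 2
    const int64_t r3 = ne13 / ne03;  // repeat factor along dim 3
    *i02 = i12 / r2;
    *i03 = i13 / r3;
}
```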
cebtenzzre
29a404a951 gguf : add BERT, MPT, and GPT-J arch info (#3408) 2023-10-02 15:20:28 -04:00