llama.cpp

Author	SHA1	Message	Date
pudepiedj	f630096c35	Merge branch 'ggerganov:master' into load-parallel-prompt-file	2023-10-04 15:51:41 +01:00
Kerfuffle	019ba1dcd0	convert : fix Baichuan2 models by using vocab size in config.json (#3299) Use local GGUF package when possible in Baichuan converter	2023-10-04 17:20:28 +03:00
Georgi Gerganov	beabc8cfb0	readme : add project status link	2023-10-04 16:50:44 +03:00
Georgi Gerganov	0d152b37fe	ggml : fix build after #3329	2023-10-04 16:25:41 +03:00
ds5t5	f8c90cdbaa	llm : add Refact model (#3329) * add refact model * resolve comments * rebase to the latest * solve alibi cpu error --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-04 16:23:39 +03:00
Georgi Gerganov	f93af02488	sync : ggml (conv 1d + 2d updates, UB fixes) (#3468) * sync : ggml (conv 1d + 2d updates) ggml-ci * ggml : fix UB in q5_0 and q5_1 quantize code ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ggml-ci * tests : fix UB in test-quantize-perf	2023-10-04 15:29:58 +03:00
pudepiedj	000c4681e4	More LLM questions	2023-10-04 12:38:50 +01:00
pudepiedj	a02e042eb9	Corrected typo	2023-10-04 11:01:57 +01:00
pudepiedj	f75fe38770	Improved reporting and new question files.	2023-10-04 10:56:30 +01:00
pudepiedj	b805ec2899	Merge branch 'load-parallel-prompt-file' of https://github.com/pudepiedj/llama.cpp into load-parallel-prompt-file	2023-10-04 08:33:17 +01:00
pudepiedj	2f0181bd29	Changed .gitignore	2023-10-04 08:32:54 +01:00
pudepiedj	bbfec95e3c	Merge branch 'ggerganov:master' into load-parallel-prompt-file	2023-10-04 08:28:27 +01:00
pudepiedj	53663759b1	Remove cmake_all.sh	2023-10-04 08:27:10 +01:00
Merrick Christensen	f72f8f22c9	finetune : readme fix typo (#3465) Fix small typo	2023-10-04 09:33:13 +03:00
pudepiedj	18b342dbbb	remove cmake_all.sh	2023-10-03 20:50:02 +01:00
pudepiedj	028681835b	Remove trailing whitespace	2023-10-03 20:40:30 +01:00
Tameem	79f34abddb	ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (#3453) * Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product function for risc-v. The RVV intrinsics is added for the following quantize row functions quantize_row_q8_0 quantize_row_q8_1 The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1 ggml_vec_dot_q4_0_q8_0 ggml_vec_dot_q4_1_q8_1 ggml_vec_dot_q5_0_q8_0 ggml_vec_dot_q5_1_q8_1 And vector initialization in Q5 by temporary array is also replaced by the vid intrinsics Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> * Added RVV intrinsics support for k_quants This adds RISC-V Vector intrinsics support for the following K_quants functions for both QKK = 256 and QKK = 64 ggml_vec_dot_q2_K_q8_K ggml_vec_dot_q3_K_q8_K ggml_vec_dot_q4_K_q8_K ggml_vec_dot_q5_K_q8_K ggml_vec_dot_q6_K_q8_K Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> --------- Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>	2023-10-03 21:38:19 +03:00
h-h-h-h	8186242b6d	main : consistent prefix/suffix coloring (#3425) * Typo * No `--in-prefix` coloring The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.	2023-10-03 21:16:15 +03:00
Georgi Gerganov	ac2219fef3	llama : fix session saving/loading (#3400) * llama : fix session saving/loading * llama : temp fix for clearing "future" tokens from the KV cache * llama : fix handling of "future" tokens when loading sessions * llama : fix comments for llama_kv_cache API	2023-10-03 21:04:01 +03:00
pudepiedj	bf8c4dfde8	Merge branch 'Update-load-parallel-prompt-file' of https://github.com/pudepiedj/llama.cpp into Update-load-parallel-prompt-file	2023-10-03 18:13:24 +01:00
pudepiedj	fc1ba35b09	Merge remote-tracking branch 'origin/load-parallel-prompt-file' into Update-load-parallel-prompt-file with requested changes	2023-10-03 18:12:21 +01:00
Alex Klinkhamer	48be797ffb	llama : expose model's rope_freq_scale in the API (#3418) so it can be scaled further before creating a context.	2023-10-03 20:09:28 +03:00
Jiahao Li	f56e1baec3	metal : alibi for arbitrary number of heads (#3426)	2023-10-03 19:55:21 +03:00
Eve	017efe899d	cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273) * fix LLAMA_NATIVE * syntax * alternate implementation * my eyes must be getting bad... * set cmake LLAMA_NATIVE=ON by default * march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc * revert `8283237` and only allow LLAMA_NATIVE on x86 like the Makefile * remove -DLLAMA_MPI=ON --------- Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>	2023-10-03 19:53:15 +03:00
pudepiedj	af2fbb82e1	Merge branch 'ggerganov:master' into Update-load-parallel-prompt-file	2023-10-03 16:03:05 +01:00
pudepiedj	ce10861214	Merge branch 'ggerganov:master' into load-parallel-prompt-file	2023-10-03 16:02:44 +01:00
pudepiedj	b343833720	Final revision	2023-10-03 14:45:31 +01:00
pudepiedj	2e3dad3a9c	Interim commit	2023-10-03 10:10:00 +01:00
pudepiedj	51196a44dc	Interim commit	2023-10-03 09:46:53 +01:00
goerch	ff5a3f0c09	Work on the BPE tokenizer (#3252) * Work on the BPE tokenizer Tokenizer tests work for Falcon-7B * Try to fix build problem * Fix debug assertion failure * Fix MSVC Unicode BOM problem * Cleanup and an improvement * Fix compiler warning * Cleanup * Test doesn't work over the full range of Unicodes * Update .gitignore and Makefile * Another Makefile rule * Testing Aquila * Moving byte decoding back to `token_to_piece` ... ... because everyone is using it. * Guarding some unusable code pathes * Streamlining code and adding some more assertions Important change: I'm classifying added tokens as control tokens now for BPE. * Adding a comment * Adding another assertion * Fixed vocabulary guarding assertions * Fix PR for recent change * Fix PR for recent change * Fix for compiler warning * Fix PR for recent change * Fix PR for recent change * Fix PR for recent change * Fix for compiler warning * Fixes for more compiler warnings * Remove unused code * Fix initialization of static maps * Add scores and token types back, adapt gptneox * Update llama.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update unicode.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update unicode.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Ported Starcoder and added some assertions * Fix coding style * Apply @jploski's fix for missing tokens --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-03 09:16:26 +02:00
cebtenzzre	1c84003c08	convert : fix vocab size when not defined in hparams (#3421)	2023-10-02 18:07:24 -04:00
pudepiedj	e293ebd68e	Merge branch 'ggerganov:master' into load-parallel-prompt-file	2023-10-02 21:14:15 +01:00
cebtenzzre	e78f0b0d05	cmake : increase minimum version for add_link_options (#3444)	2023-10-02 22:38:43 +03:00
shibe2	665018c749	CLBlast: Add broadcast support for matrix multiplication (#3402) Broadcast src0 into src1 across dimensions 2 and 3 when needed. This is required for models that use GQA.	2023-10-02 21:26:15 +02:00
cebtenzzre	29a404a951	gguf : add BERT, MPT, and GPT-J arch info (#3408)	2023-10-02 15:20:28 -04:00
cebtenzzre	0fe321031a	gguf : general usability improvements (#3409)	2023-10-02 14:58:46 -04:00
cebtenzzre	9476b01226	cmake : make CUDA flags more similar to the Makefile (#3420) * cmake : fix misuse of cxx_flags * cmake : make CUDA flags more similar to the Makefile * cmake : fix MSVC build	2023-10-02 16:16:50 +03:00
xaedes	a03ce38455	finetune : fix #3404 (#3437) the shapes for init model of gqa models was wrong	2023-10-02 16:15:45 +03:00
pudepiedj	d673691619	Move ParallelQuestions to /proimpts and rename	2023-10-02 13:15:15 +01:00
pudepiedj	2fd71e27d8	Merge branch 'load-parallel-prompt-file' of https://github.com/pudepiedj/llama.cpp into load-parallel-prompt-file	2023-10-02 12:35:43 +01:00
pudepiedj	3e41cbabd1	Experiments with jeopardy	2023-10-02 12:33:05 +01:00
pudepiedj	3c2d677abd	Merge branch 'ggerganov:master' into load-parallel-prompt-file	2023-10-02 12:30:24 +01:00
Adrian	a847676984	metal : set log callback before initializing (#3427)	2023-10-02 13:49:59 +03:00
bandoti	095231dfd3	cmake : fix transient definitions in find pkg (#3411)	2023-10-02 12:51:49 +03:00
Kevin Ji	ea55295a74	docker : ignore Git files (#3314)	2023-10-02 11:53:53 +03:00
vvhg1	c97f01c362	infill : add new example + extend server API (#3296) * vvhg-code-infill (#1) * infill in separate example (#2) * reverted changes to main and added infill example * cleanup * naming improvement * make : add missing blank line * fix missing semicolon * brought infill up to current main code * cleanup --------- Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>	2023-10-02 10:42:02 +03:00
pudepiedj	9d6533baed	Delete ToK2024.txt	2023-09-30 22:42:02 +01:00
pudepiedj	0dde56c15d	Upload ToK2024	2023-09-30 18:44:49 +01:00
slaren	f5ef5cfb18	ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412) * ggml-cuda : perform cublas matrix multiplication of quantized types as fp16 * rename CC_TURING to CC_VOLTA * disable fp16 mat mul completely with multi GPU	2023-09-30 18:12:57 +02:00
pudepiedj	f71068fd98	Add name of external file at end	2023-09-30 17:07:31 +01:00

1349 commits