llama.cpp

Author	SHA1	Message	Date
Phillip Kravtsov	5d259d358c	Merge branch 'master' of github.com:ggerganov/llama.cpp into phillip-kravtsov/support-adept-persimmon-8b. ggml-ci	2023-10-05 11:03:30 -07:00
shibe2	e2583cbc29	CLBlast: Fix handling of on-device tensor data Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors. Use correct offsets into data that is already in VRAM. Correct handling of OpenCL events when multiple commands are queued.	2023-10-05 18:25:23 +04:00
Jhen-Jie Hong	e8b8d32e86	server : fix incorrect num_tokens_predicted (#3480)	2023-10-05 17:02:55 +03:00
Jhen-Jie Hong	8f3a642ec1	swift : disable ACCELERATE_NEW_LAPACK (#3481)	2023-10-05 17:00:07 +03:00
Jhen-Jie Hong	0745384449	ci : add swift build via xcodebuild (#3482)	2023-10-05 16:56:21 +03:00
Kerfuffle	019ba1dcd0	convert : fix Baichuan2 models by using vocab size in config.json (#3299) Use local GGUF package when possible in Baichuan converter	2023-10-04 17:20:28 +03:00
Georgi Gerganov	beabc8cfb0	readme : add project status link	2023-10-04 16:50:44 +03:00
Georgi Gerganov	0d152b37fe	ggml : fix build after #3329	2023-10-04 16:25:41 +03:00
ds5t5	f8c90cdbaa	llm : add Refact model (#3329) * add refact model * resolve comments * rebase to the latest * solve alibi cpu error --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-04 16:23:39 +03:00
Georgi Gerganov	f93af02488	sync : ggml (conv 1d + 2d updates, UB fixes) (#3468) * sync : ggml (conv 1d + 2d updates) ggml-ci * ggml : fix UB in q5_0 and q5_1 quantize code ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ggml-ci * tests : fix UB in test-quantize-perf	2023-10-04 15:29:58 +03:00
Merrick Christensen	f72f8f22c9	finetune : readme fix typo (#3465) Fix small typo	2023-10-04 09:33:13 +03:00
Phillip Kravtsov	c90ed9f16b	Fix editorconfig formatting	2023-10-03 13:18:23 -07:00
Tameem	79f34abddb	ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (#3453 ) * Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product function for risc-v. The RVV intrinsics is added for the following quantize row functions quantize_row_q8_0 quantize_row_q8_1 The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1 ggml_vec_dot_q4_0_q8_0 ggml_vec_dot_q4_1_q8_1 ggml_vec_dot_q5_0_q8_0 ggml_vec_dot_q5_1_q8_1 And vector initialization in Q5 by temporary array is also replaced by the vid intrinsics Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> * Added RVV intrinsics support for k_quants This adds RISC-V Vector intrinsics support for the following K_quants functions for both QKK = 256 and QKK = 64 ggml_vec_dot_q2_K_q8_K ggml_vec_dot_q3_K_q8_K ggml_vec_dot_q4_K_q8_K ggml_vec_dot_q5_K_q8_K ggml_vec_dot_q6_K_q8_K Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> --------- Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>	2023-10-03 21:38:19 +03:00
h-h-h-h	8186242b6d	main : consistent prefix/suffix coloring (#3425) * Typo * No `--in-prefix` coloring The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.	2023-10-03 21:16:15 +03:00
Georgi Gerganov	ac2219fef3	llama : fix session saving/loading (#3400) * llama : fix session saving/loading * llama : temp fix for clearing "future" tokens from the KV cache * llama : fix handling of "future" tokens when loading sessions * llama : fix comments for llama_kv_cache API	2023-10-03 21:04:01 +03:00
Alex Klinkhamer	48be797ffb	llama : expose model's rope_freq_scale in the API (#3418) so it can be scaled further before creating a context.	2023-10-03 20:09:28 +03:00
Jiahao Li	f56e1baec3	metal : alibi for arbitrary number of heads (#3426)	2023-10-03 19:55:21 +03:00
Eve	017efe899d	cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273) * fix LLAMA_NATIVE * syntax * alternate implementation * my eyes must be getting bad... * set cmake LLAMA_NATIVE=ON by default * march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc * revert `8283237` and only allow LLAMA_NATIVE on x86 like the Makefile * remove -DLLAMA_MPI=ON --------- Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>	2023-10-03 19:53:15 +03:00
goerch	ff5a3f0c09	Work on the BPE tokenizer (#3252 ) * Work on the BPE tokenizer Tokenizer tests work for Falcon-7B * Try to fix build problem * Fix debug assertion failure * Fix MSVC Unicode BOM problem * Cleanup and an improvement * Fix compiler warning * Cleanup * Test doesn't work over the full range of Unicodes * Update .gitignore and Makefile * Another Makefile rule * Testing Aquila * Moving byte decoding back to `token_to_piece` ... ... because everyone is using it. * Guarding some unusable code pathes * Streamlining code and adding some more assertions Important change: I'm classifying added tokens as control tokens now for BPE. * Adding a comment * Adding another assertion * Fixed vocabulary guarding assertions * Fix PR for recent change * Fix PR for recent change * Fix for compiler warning * Fix PR for recent change * Fix PR for recent change * Fix PR for recent change * Fix for compiler warning * Fixes for more compiler warnings * Remove unused code * Fix initialization of static maps * Add scores and token types back, adapt gptneox * Update llama.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update unicode.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update unicode.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Ported Starcoder and added some assertions * Fix coding style * Apply @jploski 's fix for missing tokens --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-03 09:16:26 +02:00
cebtenzzre	1c84003c08	convert : fix vocab size when not defined in hparams (#3421)	2023-10-02 18:07:24 -04:00
Phillip Kravtsov	7a279fe5a8	Remove old script	2023-10-02 14:25:41 -07:00
Phillip Kravtsov	5a0990c1c3	Merge branch 'master' of github.com:ggerganov/llama.cpp into phillip-kravtsov/support-adept-persimmon-8b	2023-10-02 14:00:14 -07:00
cebtenzzre	e78f0b0d05	cmake : increase minimum version for add_link_options (#3444)	2023-10-02 22:38:43 +03:00
shibe2	665018c749	CLBlast: Add broadcast support for matrix multiplication (#3402) Broadcast src0 into src1 across dimensions 2 and 3 when needed. This is required for models that use GQA.	2023-10-02 21:26:15 +02:00
cebtenzzre	29a404a951	gguf : add BERT, MPT, and GPT-J arch info (#3408)	2023-10-02 15:20:28 -04:00
cebtenzzre	0fe321031a	gguf : general usability improvements (#3409)	2023-10-02 14:58:46 -04:00
Phillip Kravtsov	422b110841	Minor changes to conversion script	2023-10-02 10:56:31 -07:00
Phillip Kravtsov	cd4d3df820	Formatting changes	2023-10-02 10:26:39 -07:00
Phillip Kravtsov	e6bf87f785	Small changes from review	2023-10-02 10:21:16 -07:00
cebtenzzre	9476b01226	cmake : make CUDA flags more similar to the Makefile (#3420) * cmake : fix misuse of cxx_flags * cmake : make CUDA flags more similar to the Makefile * cmake : fix MSVC build	2023-10-02 16:16:50 +03:00
xaedes	a03ce38455	finetune : fix #3404 (#3437) the shapes for init model of gqa models was wrong	2023-10-02 16:15:45 +03:00
Adrian	a847676984	metal : set log callback before initializing (#3427)	2023-10-02 13:49:59 +03:00
bandoti	095231dfd3	cmake : fix transient definitions in find pkg (#3411)	2023-10-02 12:51:49 +03:00
Kevin Ji	ea55295a74	docker : ignore Git files (#3314)	2023-10-02 11:53:53 +03:00
vvhg1	c97f01c362	infill : add new example + extend server API (#3296) * vvhg-code-infill (#1) * infill in separate example (#2) * reverted changes to main and added infill example * cleanup * naming improvement * make : add missing blank line * fix missing semicolon * brought infill up to current main code * cleanup --------- Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>	2023-10-02 10:42:02 +03:00
Phillip Kravtsov	2b565916dd	Support sqr and concat on metal, persimmon-8b-q4 runs correctly	2023-09-30 14:11:52 -07:00
Phillip Kravtsov	574a9e12cc	Merge branch 'master' of github.com:ggerganov/llama.cpp into phillip-kravtsov/support-adept-persimmon-8b	2023-09-30 13:24:13 -07:00
slaren	f5ef5cfb18	ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412) * ggml-cuda : perform cublas matrix multiplication of quantized types as fp16 * rename CC_TURING to CC_VOLTA * disable fp16 mat mul completely with multi GPU	2023-09-30 18:12:57 +02:00
Phillip Kravtsov	d93cf1eab1	Merge branch 'master' of github.com:ggerganov/llama.cpp into phillip-kravtsov/support-adept-persimmon-8b	2023-09-29 15:49:54 -07:00
Phillip Kravtsov	f28f52c6d0	Fix norm eps bug	2023-09-29 15:25:25 -07:00
Phillip Kravtsov	3db04db2b8	update conversion script to directly take adept artifacts rather than .saftensors file	2023-09-29 14:59:51 -07:00
Phillip Kravtsov	ec0ce978ff	Add offload funcs	2023-09-29 14:17:39 -07:00
slaren	40e07a60f9	llama.cpp : add documentation about rope_freq_base and scale values (#3401) * llama.cpp : add documentation about rope_freq_base and scale values * add notice to hot topics	2023-09-29 18:42:32 +02:00
Georgi Gerganov	bc34dd4f5b	train : fix KQ_pos allocation (#3392) * train : fix KQ_pos allocation * make sure KQ_pos is not reallocated in finetune --------- Co-authored-by: xaedes <xaedes@gmail.com>	2023-09-29 19:05:18 +03:00
Cebtenzzre	2777a84be4	llama : quantize up to 31% faster on Linux and Windows with mmap (#3206) * llama : enable mmap in quantize on Linux -> 31% faster * also enable mmap on Windows --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-09-29 16:48:45 +03:00
BarfingLemurs	0a4a4a0982	readme : update hot topics + model links (#3399)	2023-09-29 15:50:35 +03:00
Andrew Duffy	569550df20	readme : add link to grammars app (#3388) * Add link to grammars app per @ggernagov suggestion Adding a sentence in the Grammars section of README to point to grammar app, per https://github.com/ggerganov/llama.cpp/discussions/2494#discussioncomment-7138211 * Update README.md	2023-09-29 14:15:57 +03:00
Phillip Kravtsov	d904aff040	trivial cleanups	2023-09-28 22:36:23 -07:00
Phillip Kravtsov	7473773c0b	Merge branch 'master' of github.com:ggerganov/llama.cpp into phillip-kravtsov/support-adept-persimmon-8b	2023-09-28 22:36:14 -07:00
Phillip Kravtsov	47dcb9fcf5	remove prints from llama.cpp & fix merge	2023-09-28 22:32:37 -07:00

1354 commits