Commit graph

2338 commits

Author SHA1 Message Date
h-h-h-h
8186242b6d
main : consistent prefix/suffix coloring (#3425)
* Typo

* No `--in-prefix` coloring

The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
2023-10-03 21:16:15 +03:00
Georgi Gerganov
ac2219fef3
llama : fix session saving/loading (#3400)
* llama : fix session saving/loading

* llama : temp fix for clearing "future" tokens from the KV cache

* llama : fix handling of "future" tokens when loading sessions

* llama : fix comments for llama_kv_cache API
2023-10-03 21:04:01 +03:00
Alex Klinkhamer
48be797ffb
llama : expose model's rope_freq_scale in the API (#3418)
so it can be scaled further before creating a context.
2023-10-03 20:09:28 +03:00
Jiahao Li
f56e1baec3
metal : alibi for arbitrary number of heads (#3426) 2023-10-03 19:55:21 +03:00
Eve
017efe899d
cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)
* fix LLAMA_NATIVE

* syntax

* alternate implementation

* my eyes must be getting bad...

* set cmake LLAMA_NATIVE=ON by default

* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc

* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile

* remove -DLLAMA_MPI=ON

---------

Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
2023-10-03 19:53:15 +03:00
Concedo
ea726fcffa cleanup threaded horde submit 2023-10-04 00:34:26 +08:00
Concedo
c249f7dbc5 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.dockerignore
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	tests/CMakeLists.txt
2023-10-03 23:51:30 +08:00
Concedo
0cc740115d updated lite, improve horde worker (+1 squashed commits)
Squashed commits:

[a7c25999] improve horde worker
2023-10-03 23:44:27 +08:00
Concedo
ae8ccdc1be Remove old tkinter gui (+1 squashed commits)
Squashed commits:

[0933c1da] Remove old tkinter gui
2023-10-03 22:05:44 +08:00
Concedo
d10470a1e3 Breaking Change: Remove deprecated commands 2023-10-03 17:16:09 +08:00
goerch
ff5a3f0c09
Work on the BPE tokenizer (#3252)
* Work on the BPE tokenizer

Tokenizer tests work for Falcon-7B

* Try to fix build problem

* Fix debug assertion failure

* Fix MSVC Unicode BOM problem

* Cleanup and an improvement

* Fix compiler warning

* Cleanup

* Test doesn't work over the full range of Unicodes

* Update .gitignore and Makefile

* Another Makefile rule

* Testing Aquila

* Moving byte decoding back to `token_to_piece` ...

... because everyone is using it.

* Guarding some unusable code paths

* Streamlining code and adding some more assertions

Important change: I'm classifying added tokens as control tokens now for BPE.

* Adding a comment

* Adding another assertion

* Fixed vocabulary guarding assertions

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fix PR for recent change

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fixes for more compiler warnings

* Remove unused code

* Fix initialization of static maps

* Add scores and token types back, adapt gptneox

* Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update unicode.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update unicode.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Ported Starcoder and added some assertions

* Fix coding style

* Apply @jploski 's fix for missing tokens

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-03 09:16:26 +02:00
cebtenzzre
1c84003c08
convert : fix vocab size when not defined in hparams (#3421) 2023-10-02 18:07:24 -04:00
cebtenzzre
e78f0b0d05
cmake : increase minimum version for add_link_options (#3444) 2023-10-02 22:38:43 +03:00
shibe2
665018c749
CLBlast: Add broadcast support for matrix multiplication (#3402)
Broadcast src0 into src1 across dimensions 2 and 3 when needed.
This is required for models that use GQA.
2023-10-02 21:26:15 +02:00
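The broadcast described in that commit (repeating src0 across dimensions 2 and 3 so a GQA model's smaller number of KV heads lines up with its query heads) can be sketched in plain Python. The helper and shapes here are illustrative only, not the CLBlast code:

```python
# Sketch of broadcasting KV heads across query heads for GQA.
# A GQA model has n_head query heads but only n_head_kv KV heads;
# each KV head is shared by n_head // n_head_kv consecutive query heads.

def broadcast_kv_heads(kv, n_head):
    """Repeat each KV head so there is one entry per query head.

    kv is a list with one item (standing in for a tensor) per KV head.
    """
    n_head_kv = len(kv)
    assert n_head % n_head_kv == 0, "query heads must be a multiple of KV heads"
    repeat = n_head // n_head_kv
    return [kv[i // repeat] for i in range(n_head)]

# 2 KV heads serving 8 query heads -> each KV head repeated 4 times
heads = broadcast_kv_heads(["k0", "k1"], n_head=8)
```

This is why the broadcast is "required for models that use GQA": without it, the matrix multiplication sees mismatched head counts between src0 and src1.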
cebtenzzre
29a404a951
gguf : add BERT, MPT, and GPT-J arch info (#3408) 2023-10-02 15:20:28 -04:00
cebtenzzre
0fe321031a
gguf : general usability improvements (#3409) 2023-10-02 14:58:46 -04:00
cebtenzzre
9476b01226
cmake : make CUDA flags more similar to the Makefile (#3420)
* cmake : fix misuse of cxx_flags

* cmake : make CUDA flags more similar to the Makefile

* cmake : fix MSVC build
2023-10-02 16:16:50 +03:00
xaedes
a03ce38455
finetune : fix #3404 (#3437)
the shapes of the init model for GQA models were wrong
2023-10-02 16:15:45 +03:00
Concedo
5d3e142145 use_default_badwordsids defaults to false if the parameter is missing 2023-10-02 19:41:07 +08:00
Adrian
a847676984
metal : set log callback before initializing (#3427) 2023-10-02 13:49:59 +03:00
Concedo
1bc01cbcd4 update images (+3 squashed commit)
Squashed commit:

[4d5f17ad] update readme

[cd000215] resize image

[eca91721] try add more media
2023-10-02 17:53:59 +08:00
bandoti
095231dfd3
cmake : fix transient definitions in find pkg (#3411) 2023-10-02 12:51:49 +03:00
Concedo
5b4cef5a60 archived old unused file 2023-10-02 16:57:20 +08:00
Kevin Ji
ea55295a74
docker : ignore Git files (#3314) 2023-10-02 11:53:53 +03:00
vvhg1
c97f01c362
infill : add new example + extend server API (#3296)
* vvhg-code-infill (#1)

* infill in separate example (#2)

* reverted changes to main and added infill example

* cleanup

* naming improvement

* make : add missing blank line

* fix missing semicolon

* brought infill up to current main code

* cleanup

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-10-02 10:42:02 +03:00
Concedo
23b9d3af49 force oai endpoints to return json 2023-10-02 12:45:14 +08:00
Concedo
0c47e79537 updated the API routing path and fixed a bug with threads 2023-10-02 11:05:19 +08:00
Concedo
dffc6bee74 deprecate some launcher arguments. 2023-10-01 22:30:48 +08:00
Concedo
b49a5bc546 formatting of text 2023-10-01 18:38:32 +08:00
Concedo
bc841ec302 flag to retain grammar, fix makefile (+2 squashed commit)
Squashed commit:

[d5cd3f28] flag to retain grammar, fix makefile

[b3352963] updated lite to v73
2023-10-01 14:39:56 +08:00
Concedo
7ab01ee3c6 Merge branch 'master' into concedo_experimental 2023-10-01 10:22:05 +08:00
Concedo
2fc00fac8c fixed makefile 2023-10-01 10:17:23 +08:00
Concedo
4b45d880ba updated lite 2023-10-01 01:10:30 +08:00
slaren
f5ef5cfb18
ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)
* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16

* rename CC_TURING to CC_VOLTA

* disable fp16 mat mul completely with multi GPU
2023-09-30 18:12:57 +02:00
Concedo
191de1e8a3 allow launching with kcpps files 2023-09-30 19:35:03 +08:00
Concedo
202e28a76a do not offload rope for old cublas (+1 squashed commits)
Squashed commits:

[ca72a66f] fix allocr (+1 squashed commits)

Squashed commits:

[22a0e30e] updated lite
2023-09-30 18:18:36 +08:00
Concedo
5e6450161a functional merge 2023-09-30 12:31:57 +08:00
Concedo
b84e210f0d merge new rope param nonsense 2023-09-30 11:33:30 +08:00
slaren
40e07a60f9
llama.cpp : add documentation about rope_freq_base and scale values (#3401)
* llama.cpp : add documentation about rope_freq_base and scale values

* add notice to hot topics
2023-09-29 18:42:32 +02:00
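As background for the rope_freq_base and rope_freq_scale values documented in that commit: RoPE derives a per-dimension-pair rotation angle from freq_base, and linear scaling multiplies the position by freq_scale (e.g. 0.5 to stretch the usable context to twice its trained length). A minimal sketch of the relationship, not the ggml implementation:

```python
import math

def rope_angles(pos, n_dims, freq_base=10000.0, freq_scale=1.0):
    """Rotation angle for each pair of dimensions at a given position.

    Linear RoPE scaling simply stretches positions: pos * freq_scale.
    """
    scaled_pos = pos * freq_scale
    return [scaled_pos * freq_base ** (-2.0 * i / n_dims)
            for i in range(n_dims // 2)]

# freq_scale = 0.5 halves every angle, so position 2048 "looks like" 1024
a = rope_angles(2048, n_dims=8, freq_scale=0.5)
b = rope_angles(1024, n_dims=8, freq_scale=1.0)
```

Here `a` and `b` are identical, which is the whole trick: scaled positions stay inside the angle range the model saw during training.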
Georgi Gerganov
bc34dd4f5b
train : fix KQ_pos allocation (#3392)
* train : fix KQ_pos allocation

* make sure KQ_pos is not reallocated in finetune

---------

Co-authored-by: xaedes <xaedes@gmail.com>
2023-09-29 19:05:18 +03:00
Cebtenzzre
2777a84be4
llama : quantize up to 31% faster on Linux and Windows with mmap (#3206)
* llama : enable mmap in quantize on Linux -> 31% faster

* also enable mmap on Windows

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-29 16:48:45 +03:00
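The speedup in that commit comes from memory-mapping the input model instead of reading it into a buffer, so pages are faulted in on demand and nothing is copied twice. A toy stdlib illustration of the pattern (the file contents here are made up; this is not the llama.cpp quantize code):

```python
import mmap
import os
import tempfile

# Write a small stand-in "model file", then access it through a memory map.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"GGUF" + bytes(range(16)))

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # The OS maps the file into the address space; slicing reads
        # pages lazily instead of copying the whole file up front.
        magic = mm[:4]
        first_byte = mm[4]

os.remove(path)
```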
BarfingLemurs
0a4a4a0982
readme : update hot topics + model links (#3399) 2023-09-29 15:50:35 +03:00
Andrew Duffy
569550df20
readme : add link to grammars app (#3388)
* Add link to grammars app per @ggerganov's suggestion

Adding a sentence in the Grammars section of README to point to grammar app, per https://github.com/ggerganov/llama.cpp/discussions/2494#discussioncomment-7138211

* Update README.md
2023-09-29 14:15:57 +03:00
Jhen-Jie Hong
c71bf2c45c
swift : fix build on xcode 15 (#3387) 2023-09-29 08:25:13 +03:00
Concedo
033e3bf844 prepare to merge parallel 2023-09-29 10:30:45 +08:00
Cebtenzzre
bc39553c90
build : enable more non-default compiler warnings (#3200) 2023-09-28 17:41:44 -04:00
Hua Jiang
0ccfc62a96
ggml_tensor: update the structure comments. (#3283)
* ggml_tensor: update the structure comments.

* remove semicolon

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml.h

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-09-28 23:06:18 +03:00
Qu Zongfu
7f1a0fe709
ggml : release the requested thread pool resource (#3292)
* Release the requested thread pool resource

* Release the requested thread pool resource 2

---------

Co-authored-by: Zongfu ZF3 Qu <quzf3@Lenovo.com>
2023-09-28 22:51:52 +03:00
slaren
16bc66d947
llama.cpp : split llama_context_params into model and context params (#3301)
* llama.cpp : split llama_context_params into model and context params

ggml-ci

* fix metal build

* fix freq_base/scale default to model value

* llama-bench : keep the same model between tests when possible

* move n_threads to llama_context_params, add n_threads_batch

* fix mpi build

* remove kv_size(), cuda scratch fixes

* remove low-vram option

* add n_threads_batch to system info, refactor to get_system_info()

* add documentation about --threads-batch to the READMEs

* llama-bench fix

* main : fix rope freq/scale warning

* llama.cpp : add llama_get_model
common : add llama_tokenize from model

* remove duplicated ctx/model functions

ggml-ci

* cuda : print total VRAM used
2023-09-28 22:42:38 +03:00
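The split described in that commit separates load-time settings (fixed once the model is in memory) from per-context settings (which can differ between contexts sharing one model); the commit notes n_threads moved to the context params, alongside the new n_threads_batch. A schematic of the division as Python dataclasses, with illustrative defaults rather than the actual llama.h structs:

```python
from dataclasses import dataclass

@dataclass
class ModelParams:
    # Fixed at load time; shared by every context built on the model.
    n_gpu_layers: int = 0
    use_mmap: bool = True

@dataclass
class ContextParams:
    # Chosen per context; per the commit, n_threads lives here,
    # alongside the new n_threads_batch.
    n_ctx: int = 512
    n_threads: int = 4
    n_threads_batch: int = 4

# One loaded model can serve several contexts with different settings.
mp = ModelParams(n_gpu_layers=32)
small = ContextParams(n_ctx=512)
large = ContextParams(n_ctx=4096, n_threads_batch=8)
```

The design choice this illustrates: anything that would force a model reload stays in the model params, so llama-bench can "keep the same model between tests" while varying only context params.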
Eve
0512d66670
ci : multithreaded builds (#3311)
* mac and linux threads

* windows

* Update build.yml

* Update build.yml

* Update build.yml

* automatically get thread count

* windows syntax

* try to fix freebsd

* Update build.yml

* Update build.yml

* Update build.yml
2023-09-28 22:31:04 +03:00