Commit graph

1573 commits

Author SHA1 Message Date
Robert Washbourne
7e8607d097 fix handler 2023-11-24 02:22:51 -05:00
Robert Washbourne
7e06600b38 disable 2023-11-24 01:58:58 -05:00
Robert Washbourne
1cde43fbb4 disable 2023-11-24 01:56:36 -05:00
Robert Washbourne
7280bb217a make 2023-11-24 01:54:21 -05:00
Robert Washbourne
af5a4371da fix pip 2023-11-24 01:50:07 -05:00
Robert Washbourne
63961c0e75 copy handler 2023-11-24 01:33:35 -05:00
Robert Washbourne
9036005e51 syntax 2023-11-24 01:08:27 -05:00
Robert Washbourne
0507037432 move python 2023-11-24 01:05:31 -05:00
Robert Washbourne
f571ed512a ws 2023-11-24 00:43:37 -05:00
Robert Washbourne
4468d96aec add handler 2023-11-24 00:41:33 -05:00
Robert Washbourne
8ddd5cb916 change prefix 2023-11-24 00:17:06 -05:00
Robert Washbourne
819d9f1258 cuda 2023-11-24 00:10:14 -05:00
Robert Washbourne
0e2c422b11 remove flags 2023-11-23 23:58:40 -05:00
Robert Washbourne
22804439d2 make 2023-11-23 23:51:34 -05:00
Robert Washbourne
4b6e344bad pip later 2023-11-23 23:43:37 -05:00
Robert Washbourne
93f86c98d1
Merge branch 'ggerganov:master' into master 2023-11-23 23:36:09 -05:00
Robert Washbourne
c06162ba94 from build 2023-11-23 23:30:56 -05:00
Robert Washbourne
1b703db0e1 change entrypoint
llama.cpp server dockerfile
2023-11-23 23:27:19 -05:00
Haohui Mai
55978ce09b
Fix incorrect format strings and uninitialized variables. (#4133)
* Fix incorrect format strings and uninitialized variables.

* Address comments

* Add the missing include statement
2023-11-23 22:56:53 +01:00
Georgi Gerganov
6b0a7420d0
llama : KV cache view API + better KV cache management (#4170)
* llama : keep track of used KV cells + better KV cache management

* llama : zero KV cache used upon clear

ggml-ci

* llama : allow exporting a view of the KV cache (#4180)

* Allow exporting a view of the KV cache

* Allow dumping the sequences per cell in common

* Track max contiguous cells value and position as well

* Fix max contiguous empty cells index calculation

Make dump functions deal with lengths or sequences counts > 10 better

* Fix off by one error in dump_kv_cache_view

* Add doc comments for KV cache view functions

Eliminate cell sequence struct; use llama_seq_id directly

Minor cleanups

* common : add -dkvc arg for enabling kv cache dumps

---------

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-11-23 19:07:56 +02:00
Georgi Gerganov
d103d935c0
readme : update hot topics 2023-11-23 13:51:22 +02:00
Daniel Bevenius
9d5949f04b
examples : fix typo in parallel example doc comment (#4181)
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2023-11-23 13:34:20 +02:00
Georgi Gerganov
ff8238f71d
docs : add llama-star arch idea 2023-11-23 11:35:04 +02:00
Galunid
8e672efe63
stablelm : simplify + speedup generation (#4153) 2023-11-21 16:22:30 +01:00
Galunid
0b871f1a04
finetune - update readme to mention llama support only (#4148) 2023-11-20 19:30:00 +01:00
Aaryaman Vasishta
dfc7cd48b1
readme : update ROCm Windows instructions (#4122)
* Update README.md

* Update README.md

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

---------

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-20 17:02:46 +02:00
Seb C
881800d1f0
main : Add ChatML functionality to main example (#4046)
Co-authored-by: Sebastian Cramond <sebby37@users.noreply.github.com>
2023-11-20 14:56:59 +01:00
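For context, ChatML wraps each conversation turn in special delimiter tokens. A prompt in that format looks roughly like the following (the system and user text here are placeholders, not taken from the example itself):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```

The trailing open `<|im_start|>assistant` line is what cues the model to generate the assistant's reply.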
Galunid
f23c0359a3
ci : add flake8 to github actions (python linting) (#4129)
Disabled rules:

* E203 Whitespace before ':' - disabled because we often use 'C' Style where values are aligned

* E211 Whitespace before '(' - disabled because we often use 'C' Style where values are aligned

* E221 Multiple spaces before operator - disabled because we often use 'C' Style where values are aligned

* E225 Missing whitespace around operator - disabled because it's broken so often it seems like a standard

* E231 Missing whitespace after ',', ';', or ':' - disabled because we often use 'C' Style where values are aligned

* E241 Multiple spaces after ',' - disabled because we often use 'C' Style where values are aligned

* E251 Unexpected spaces around keyword / parameter equals - disabled because it's broken so often it seems like a standard

* E261 At least two spaces before inline comment - disabled because it's broken so often it seems like a standard

* E266 Too many leading '#' for block comment - sometimes used as "section" separator

* E501 Line too long - disabled because it's broken so often it seems like a standard

* E701 Multiple statements on one line (colon) - broken only in convert.py when defining abstract methods (we can use # noqa instead)

* E704 Multiple statements on one line - broken only in convert.py when defining abstract methods (we can use # noqa instead)
2023-11-20 11:35:47 +01:00
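The disabled rules above can be collected into a single flake8 configuration. A minimal sketch (the file name and section layout are common flake8 conventions; the actual CI workflow may instead pass the codes on the command line):

```ini
; .flake8 - sketch of the rule set described in the commit above
[flake8]
extend-ignore = E203,E211,E221,E225,E231,E241,E251,E261,E266,E501,E701,E704
```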
Branden Butler
40a34fe8d0
speculative : fix prompt tokenization in speculative example (#4025)
* Support special tokens and not adding BOS to prompt in speculative

* Adapt to new should_add_bos function

* Ensure tgt and dft have same add_bos setting
2023-11-20 11:50:04 +02:00
Georgi Gerganov
dae06c06e5
Revert "finetune : add --n-gpu-layers flag info to --help (#4128)"
This reverts commit 05e8301e45.
2023-11-19 19:16:07 +02:00
Clark Saben
05e8301e45
finetune : add --n-gpu-layers flag info to --help (#4128) 2023-11-19 18:56:38 +02:00
SoftwareRenderer
936c79b227
server : relay error messages (#4131) 2023-11-19 18:54:10 +02:00
kchro3
262005ad9d
common : comma should be semicolon (#4137) 2023-11-19 18:52:57 +02:00
Georgi Gerganov
35985acffa
gitignore : tokenize 2023-11-19 18:50:49 +02:00
slaren
e937066420
gguf-py : export chat templates (#4125)
* gguf-py : export chat templates

* llama.cpp : escape new lines in gguf kv info prints

* gguf-py : bump version

* gguf-py : check chat_template type

* gguf-py : initialize chat_template
2023-11-19 11:10:52 +01:00
Kerfuffle
28a2e6e7d4
tokenize example: Respect normal add BOS token behavior (#4126)
Allow building with Makefile
2023-11-18 14:48:17 -07:00
Galunid
0b5c3b0457
scripts : Remove missed baichuan convert script (#4127) 2023-11-18 21:08:33 +01:00
Kerfuffle
2923f17f6f
Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124)
* ggml-cuda.cu: Clean up warnings when compiling with clang

* ggml-cuda.cu: Move static items into anonymous namespace

* ggml-cuda.cu: Fix use of namespace start macro

* Revert "ggml-cuda.cu: Fix use of namespace start macro"

This reverts commit 26c1149026.

* Revert "ggml-cuda.cu: Move static items into anonymous namespace"

This reverts commit e29757e0f7.
2023-11-18 08:11:18 -07:00
slaren
bbecf3f415
llama : increase max nodes (#4115) 2023-11-17 21:39:11 +02:00
Roger Meier
8e9361089d
build : support ppc64le build for make and CMake (#3963)
* build: support ppc64le build for make and CMake

* build: keep __POWER9_VECTOR__ ifdef and extend with __powerpc64__

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-17 18:11:23 +02:00
Georgi Gerganov
5ad387e994
tokenize : fix trailing whitespace 2023-11-17 18:01:38 +02:00
zakkor
2fa02b4b3d
examples : add tokenize (#4039) 2023-11-17 17:36:44 +02:00
Don Mahurin
2ab0707acb
convert : use 'model' value if it exists. This allows karpathy/tinyllamas to load (#4089)
Co-authored-by: Don Mahurin <@>
2023-11-17 17:32:34 +02:00
John
11173c92d6
py : Falcon HF compatibility (#4104)
Falcon HF compatibility
2023-11-17 17:24:30 +02:00
Jannis Schönleber
9e87ef60e1
common : improve yaml log escaping (#4080)
* logging: improve escaping in yaml output

* logging: include review feedback
2023-11-17 17:24:07 +02:00
Huawei Lin
c7cce1246e
llava : fix compilation warning that fread return value is not used (#4069) 2023-11-17 17:22:56 +02:00
Jiří Podivín
f7d5e97542
py : remove superfluous import statements (#4076)
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
Co-authored-by: Jiri Podivin <jpodivin@redhat.com>
2023-11-17 17:20:53 +02:00
Jiří Podivín
ba4cf5c0bf
train : move number of gpu layers argument parsing to common/train.cpp (#4074)
- introduces help entry for the argument
- cuts the '--gpu-layers' form in order to simplify usage and documentation

Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
Co-authored-by: Jiri Podivin <jpodivin@redhat.com>
2023-11-17 17:19:16 +02:00
slaren
e85bb1a8e7
llama : add functions to get the model's metadata (#4013)
* llama : add functions to get the model's metadata

* format -> std::to_string

* better documentation
2023-11-17 17:17:37 +02:00
gwjr
3e916a07ac
finetune : speed-up ggml_compute_forward_out_prod_f32 via BLAS (#4079)
* Remove logically superfluous assertions and order by dimension

* Use cblas_sgemm() to implement ggml_compute_forward_out_prod()

* Remove ggml_compute_forward_out_prod_use_blas(), fix compiling errors on cmake/zig, remove trailing whitespace

* Add openBLAS support for sgemm() in compute_forward_out_prod()
2023-11-17 16:48:19 +02:00
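The core of this speed-up is that accumulating a series of rank-1 outer products is mathematically identical to a single matrix-matrix multiply, so the scalar inner loops can be replaced by one `sgemm` call. A small NumPy sketch of that identity (shapes simplified for illustration; this is not the ggml code itself):

```python
import numpy as np

# Accumulate rank-1 updates column by column, as the scalar loop in
# ggml_compute_forward_out_prod_f32 effectively does.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)
b = rng.standard_normal((5, 8)).astype(np.float32)

out_loop = np.zeros((4, 5), dtype=np.float32)
for k in range(8):
    out_loop += np.outer(a[:, k], b[:, k])  # one rank-1 update per column

# The same result computed as a single GEMM, which is what a
# cblas_sgemm call can do in one shot:
out_gemm = a @ b.T

assert np.allclose(out_loop, out_gemm, atol=1e-5)
```

Handing the whole accumulation to an optimized BLAS kernel is what makes the OpenBLAS path in this commit faster than the hand-rolled loop.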