Commit graph

1209 commits

Author SHA1 Message Date
xaedes
f6828cba9e
remove GGML_ALIGNED_REALLOC and use normal malloc/realloc/free for gguf ctx->kv & ctx->infos 2023-08-28 20:21:03 +02:00
xaedes
440d221c62
add missing blank line at end of file 2023-08-28 19:17:47 +02:00
xaedes
a925e9304a
fix non-windows GGML_ALIGNED_REALLOC 2023-08-28 19:16:27 +02:00
xaedes
12c4e5b50f
Merge branch 'master' into pr-train-mem-usage-improvements 2023-08-28 19:14:18 +02:00
xaedes
17ab46dffc
update train-text-from-scratch README.md 2023-08-28 19:13:20 +02:00
xaedes
3e7dfd08c4
remove prediction related code
use main for prediction, as it is better optimized
2023-08-28 19:11:27 +02:00
xaedes
3155019b53
remove trailing whitespace 2023-08-28 18:39:50 +02:00
xaedes
63bf200b87
remove code used to verify correctness of checkpoint file conversion 2023-08-28 18:38:52 +02:00
xaedes
31c093c2cc
bug fixes for convert-train-checkpoint-to-gguf.py loading checkpoints with opt_version=0 2023-08-28 18:33:00 +02:00
xaedes
e8df9e6815
temporarily add code to write old checkpoint files
used to verify that old checkpoint files are correctly converted to gguf
2023-08-28 18:17:51 +02:00
Johannes Gäßler
6b73ef1201
YAML result logging + preset script (#2657) 2023-08-28 17:59:39 +02:00
xaedes
5f27ade48e
bug fixes for convert-train-checkpoint-to-gguf 2023-08-28 17:57:10 +02:00
alonfaraj
75fafcbccc
make : fix tests build (#2855)
* makefile:
- fix test name
- add missing tests build

* editorconfig : fixes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-28 18:38:35 +03:00
grahameth
be475f60af
llama.cpp : fix wrong vsnprintf call in MS compiler (#2856)
Co-authored-by: grahameth <->
2023-08-28 18:38:12 +03:00
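The MS-compiler issue above is a classic one: a va_list cannot be traversed twice, so a sizing pass and a formatting pass each need their own copy via va_copy. A minimal sketch of the portable pattern (illustrative only, not necessarily the exact change in #2856):

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

// Portable printf-style formatting into a heap buffer. A va_list may
// only be traversed once, so the sizing pass gets its own copy via
// va_copy; reusing the same va_list for both vsnprintf calls is what
// breaks under MSVC. (Sketch of the general pattern, not necessarily
// the exact fix in #2856.)
static char * format_alloc(const char * fmt, ...) {
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);
    const int n = vsnprintf(NULL, 0, fmt, ap); // sizing pass
    va_end(ap);
    if (n < 0) {
        va_end(ap2);
        return NULL;
    }
    char * buf = (char *) malloc((size_t) n + 1);
    if (buf) {
        vsnprintf(buf, (size_t) n + 1, fmt, ap2); // formatting pass
    }
    va_end(ap2);
    return buf;
}
```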
xaedes
c690c20362
print data checksums before saving and after loading to verify correctness 2023-08-28 16:09:53 +02:00
xaedes
f97f92bce5
remove trailing whitespace 2023-08-28 15:28:19 +02:00
xaedes
daa0b6c6a4
set name of tensors with empty name from what was read from gguf 2023-08-28 15:27:26 +02:00
xaedes
e86b3e3257
avoid printing lots of spaces in the unusual case that loss gets nan 2023-08-28 15:26:44 +02:00
xaedes
3d8d884049
bug fix in load_opt_context_gguf 2023-08-28 15:07:00 +02:00
Ronny Brendel
3af6b86301
ggml : tiny ggml_vec_dot_q4_K_q8_K AVX2 improvement (#2819) 2023-08-28 15:51:08 +03:00
Georgi Gerganov
35feac6560
ggml : sync (mem align to header + conv_transpose_2d fixes + ggml_alloc) (#2852)
* ggml : sync (mem align to header + conv_transpose_2d fixes)

ggml-ci

* ggml-alloc : minor fix

* ggml-alloc : sync more fixes
2023-08-28 14:24:53 +03:00
Johannes Gäßler
92b1bbd2ec
CUDA: fix RoPE asserts, block sizes (#2833) 2023-08-28 14:23:55 +03:00
igarnier
dd0dc366da
llama.h : add missing struct keyword for C compat in callback type (#2847) 2023-08-28 11:19:59 +03:00
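Background for the fix above: C++ treats a struct name as a type, but plain C does not, so a header consumed from C must qualify the tag. A tiny illustration (the callback name below is hypothetical, not the one in llama.h):

```c
struct llama_context; /* opaque forward declaration */

/* In C++ the bare name "llama_context *" would compile, but a C
   translation unit needs the struct keyword. A C-compatible header
   therefore declares callback types like this (hypothetical name): */
typedef void (*my_progress_callback)(struct llama_context * ctx, float progress, void * user_data);
```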
Georgi Gerganov
f55538c3cc
metal : fix memory leak (#2762)
* metal : fix memory leak

* metal : fix encoders memory leak

* metal : clean up more memory resources

* metal : fix more leaks

* metal : reuse dispatch queue + autoreleasepool

* metal : reuse array for command buffers and encoders

* ggml : assert for odd number of blocks on ARM

15M tinyllama is an example
2023-08-28 10:59:08 +03:00
Cebtenzzre
ebcee207b6
quantize : make output filename optional again (#2823)
* quantize : make output filename optional again

* quantize : fix path parsing on Windows

suggested by @slaren
2023-08-28 09:32:25 +03:00
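The Windows path-parsing part of the fix above comes down to accepting both separators when deriving a default output name from the input path. A hedged sketch of that idea (illustrative, not the exact code from #2823):

```c
// Return the basename of a path, accepting both '/' and '\\' so that
// Windows paths such as "C:\\models\\7B\\f16.gguf" work too.
// (Illustrative sketch, not the exact code from #2823.)
static const char * path_basename(const char * path) {
    const char * base = path;
    for (const char * p = path; *p; ++p) {
        if (*p == '/' || *p == '\\') {
            base = p + 1;
        }
    }
    return base;
}
```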
JohnnyB
3e8ff47af6
devops : added systemd units and set versioning to use date. (#2835)
* Corrections and systemd units

* Missing dependency clblast
2023-08-28 09:31:24 +03:00
xaedes
1f83343498
bug fix in read_tensor_by_name 2023-08-28 02:02:05 +02:00
xaedes
152cfaac36
bug fix: init model when no checkpoint was loaded 2023-08-28 01:49:18 +02:00
xaedes
4882ff0c59
bug fixes in load_llama_model_gguf 2023-08-28 01:49:17 +02:00
xaedes
76d2794e11
bug fixes in tokenize_file 2023-08-28 01:49:17 +02:00
xaedes
5d94997a09
add gguf example cmake file 2023-08-28 01:49:17 +02:00
xaedes
ca5b344fb1
fix memory corruption bug in gguf
ctx->kv and ctx->infos were reallocated with a plain (unaligned) realloc, but freed with an aligned free.
To fix this, a GGML_ALIGNED_REALLOC was added, but there is no posix_memalign_realloc function,
so on non-Windows and non-mingw32 platforms we fall back to an aligned malloc, followed by copying
and freeing the old data.
2023-08-28 01:49:17 +02:00
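The fallback this message describes, sketched in C under the assumption that posix_memalign is available (the caller must track the old allocation size; as the newer commit at the top of this page notes, the macro was later removed again in favor of plain malloc/realloc/free):

```c
#define _POSIX_C_SOURCE 200112L // for posix_memalign
#include <stdlib.h>
#include <string.h>

// Aligned-realloc fallback for platforms without a native one:
// aligned malloc, copy, free the old block, exactly as the commit
// message describes. (Sketch of the described approach only.)
static void * aligned_realloc_fallback(void * ptr, size_t old_size, size_t new_size, size_t align) {
    void * result = NULL;
    if (posix_memalign(&result, align, new_size) != 0) {
        return NULL;
    }
    if (ptr != NULL) {
        memcpy(result, ptr, old_size < new_size ? old_size : new_size);
        free(ptr); // memory from posix_memalign is released with free()
    }
    return result;
}
```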
xaedes
0b2c85b025
use norm_rms_eps and rope parameters, and command line options to set them 2023-08-27 23:39:21 +02:00
xaedes
91a4ccaf96
use same GGUF_GET_KEY macro as in llama.cpp 2023-08-27 23:32:49 +02:00
xaedes
d71069c4fb
add layer_norm_rms_eps to checkpoint convert script 2023-08-27 23:25:41 +02:00
xaedes
ef899fbe89
add gguf key and tensor names for optimizer and training 2023-08-27 23:21:59 +02:00
xaedes
495a62a142
save opt parameter counter as uint64 2023-08-27 23:21:08 +02:00
xaedes
cb42324d6a
add gguf arch and ftype 2023-08-27 23:20:18 +02:00
xaedes
a6f3a47c39
Merge branch 'master' into pr-train-mem-usage-improvements 2023-08-27 23:11:47 +02:00
xaedes
3a91c975a6
add first draft for checkpoint conversion script 2023-08-27 22:05:36 +02:00
xaedes
0c494cc60e
save & load opt->just_initialized value 2023-08-27 22:05:24 +02:00
Georgi Gerganov
103cfafc77
gguf : fix strings to not be null-terminated (#2839)
* gguf : fix strings to not be null-terminated

ggml-ci

* gguf : fix gguf_add_tensor name
2023-08-27 21:50:22 +03:00
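Context for the string fix above: GGUF stores strings as a length prefix plus raw bytes, with no trailing NUL, so readers must never treat them as C strings. Roughly the shape used by ggml's gguf reader at the time (field names illustrative):

```c
#include <stdint.h>
#include <stdio.h>

// A GGUF string is length-prefixed and NOT null-terminated.
// (Roughly the shape in ggml's gguf reader; names illustrative.)
struct gguf_str {
    uint64_t n;    // byte count, no trailing '\0'
    char   * data; // n raw bytes
};

// Printing therefore needs an explicit length: "%.*s", never "%s".
static void print_gguf_str(const struct gguf_str * s) {
    printf("%.*s\n", (int) s->n, s->data);
}
```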
Georgi Gerganov
c10704d01e
llama : fix MPI threads (close #2827) 2023-08-27 18:55:41 +03:00
Olivier Chafik
230d46c723
examples : update llama2.c converter to read vocab and write models in GGUF format (#2751)
* llama2.c: direct gguf output (WIP)

* Simplify vector building logic

* llama2.c gguf conversion: fix token types in converter

* llama2.c: support copying vocab from a llama gguf model file

* llama2.c: update default path for vocab model + readme

* llama2.c: use defines for gguf keys

* llama2.c: escape whitespaces w/ U+2581 in vocab converter the llama.cpp way

* llama2.c converter: cleanups + take n_ff from config
2023-08-27 17:13:31 +03:00
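The whitespace escaping mentioned above follows the SentencePiece convention: spaces in vocab entries are replaced by U+2581 ("▁", UTF-8 bytes 0xE2 0x96 0x81). The converter in the commit is Python; a C sketch of the same transformation:

```c
#include <stdlib.h>
#include <string.h>

// Replace each ' ' with U+2581 (UTF-8: 0xE2 0x96 0x81), the
// SentencePiece convention llama.cpp uses for vocab tokens.
// (C sketch of what the Python converter does.)
static char * escape_whitespace(const char * text) {
    const size_t len = strlen(text);
    char * out = (char *) malloc(len * 3 + 1); // worst case: all spaces
    if (out == NULL) {
        return NULL;
    }
    char * p = out;
    for (const char * s = text; *s; ++s) {
        if (*s == ' ') {
            *p++ = (char) 0xE2;
            *p++ = (char) 0x96;
            *p++ = (char) 0x81;
        } else {
            *p++ = *s;
        }
    }
    *p = '\0';
    return out;
}
```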
Kawrakow
463173a6c0
llama : speedup tokenization (#2831)
* Speedup tokenization

On current master it takes ~3.2 seconds to tokenize
Wikitext. With this change it becomes ~525 ms.

* Fixit: it was missing the piece after the last found occurrence

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-27 16:50:33 +03:00
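The follow-up fix in this commit is the classic tail bug of find-based splitting: after the loop over matches, the remainder after the last occurrence still has to be emitted. A hedged sketch of the pattern (not the tokenizer code itself):

```c
#include <string.h>

// Split `text` on `delim`, passing each piece to `emit`. The call
// after the loop emits the piece after the last occurrence -- the
// part the "Fixit" in #2831 was about. (Illustrative pattern only.)
static void split_on(const char * text, const char * delim,
                     void (*emit)(const char * piece, size_t len)) {
    const size_t dlen = strlen(delim);
    if (dlen == 0) { // empty delimiter: the whole text is one piece
        emit(text, strlen(text));
        return;
    }
    const char * start = text;
    const char * hit;
    while ((hit = strstr(start, delim)) != NULL) {
        emit(start, (size_t) (hit - start));
        start = hit + dlen;
    }
    emit(start, strlen(start)); // don't forget the final piece
}
```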
Georgi Gerganov
eaa13a48ff
falcon : fix CUDA inference by making K and Q contiguous (#2830)
* falcon : fix CUDA inference by making K and Q contiguous

ggml-ci

* cuda : add assert to guard from non-cont ropes
2023-08-27 16:40:48 +03:00
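The contiguity fix above follows a common ggml pattern: ggml_permute and ggml_reshape produce non-contiguous views, and kernels that assume contiguity (the CUDA RoPE kernel here) need the data materialized first with ggml_cont. A sketch of the pattern (names illustrative, not the falcon graph code):

```c
#include "ggml.h"

// Materialize a permuted view before a kernel that assumes contiguous
// input. ggml_permute returns a non-contiguous view; ggml_cont copies
// it into contiguous memory. (Pattern sketch; names illustrative.)
static struct ggml_tensor * contiguous_rope_input(struct ggml_context * ctx, struct ggml_tensor * t) {
    struct ggml_tensor * view = ggml_permute(ctx, t, 0, 2, 1, 3);
    return ggml_cont(ctx, view);
}
```

On the CUDA side the same commit adds an assert along the lines of GGML_ASSERT(ggml_is_contiguous(...)) so that non-contiguous ropes fail loudly instead of computing garbage.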
Georgi Gerganov
da7455d046
readme : fix headings 2023-08-27 15:52:34 +03:00
Georgi Gerganov
25423e9185
scripts : helper convert script 2023-08-27 15:24:58 +03:00
Kawrakow
a6d1189fdd
k_quants tuning for Falcon-7b (#2816)
* Make ggml-cuda.cu build with QK_K = 64

Using LLAMA_CUDA_FORCE_DMMV = ON and -nommq it runs and produces
a meaningful result.

* k_quants tuning for Falcon-7b

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-27 15:19:59 +03:00
Georgi Gerganov
c48c5bb0b0
readme : update hot topics 2023-08-27 14:44:35 +03:00