llama.cpp

Author	SHA1	Message	Date
xaedes	8d982c8fd9	bug fix, probably solves the 'ggml_allocr_alloc: not enough space in the buffer' issue	2023-09-02 20:53:14 +02:00
xaedes	ded6382961	add some more allocator debug prints	2023-09-02 20:52:25 +02:00
Kerfuffle	3358c381f6	logging: Fix creating empty file even when disabled (#2966) * logging: Fix creating empty file even when disabled * Minor formatting fix Co-authored-by: staviq <staviq@gmail.com> --------- Co-authored-by: staviq <staviq@gmail.com>	2023-09-02 11:53:55 -06:00
xaedes	cfe217f1ca	fix README.md	2023-09-02 16:11:31 +02:00
xaedes	6ee12b158b	increase measured alloc size by tensor_alignment; ggml_allocr_reset will reduce the given size by up to tensor_alignment-1	2023-09-02 15:59:14 +02:00
bandoti	52315a4216	readme : update clblast instructions (#2903) * Update Windows CLBlast instructions * Update Windows CLBlast instructions * Remove trailing whitespace	2023-09-02 15:53:18 +03:00
Karsten Weiss	8b56b4f2c3	metal : show all Metal device instances in the system (#2952) * ggml_metal_init: Show all Metal device instances in the system Also show the default Metal device that was picked. * Update ggml-metal.m --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-09-02 15:29:09 +03:00
Jhen-Jie Hong	21f3d1be86	k-quants : fix build on armv7 (android only) (#2920) * k-quants : fix build on armv7 * ggml : cleanup unused arm32 specific impl * k-quants : avoid some unused vzero / mzero define * ggml-alloc : use 4g for MEASURE_MAX_SIZE in 32-bit arm	2023-09-02 15:23:45 +03:00
Jhen-Jie Hong	571083f508	server : avoid antiprompt in probabilities of final response (#2849)	2023-09-02 08:31:46 +08:00
Engininja2	f04d002844	cuda : vsubss4 for older versions of ROCm/clang (#2942)	2023-09-01 23:33:19 +02:00
xaedes	c32ad44f84	print time per iteration and estimate remaining time	2023-09-01 17:03:36 +02:00
xaedes	6809eb7de9	Merge branch 'master' into finetune-lora # Conflicts: # Makefile	2023-09-01 16:07:05 +02:00
ZHAOKAI WANG	69fdbb9abc	readme : quick start command fix (#2908) * quick start command fix * quick start win command fix	2023-09-01 17:06:44 +03:00
xaedes	7acb1241c6	update README.md	2023-09-01 16:04:08 +02:00
Kerfuffle	5d6f19f16b	Allow quantize to only copy tensors, some other improvements (#2931) * Allow quantize tool to only copy tensors to allow repackaging models. * Slightly better logic when requantizing. * Change help message to go to `stdout`.	2023-09-01 08:02:48 -06:00
xaedes	6cbf55a64b	add finetune to Makefile	2023-09-01 16:02:45 +02:00
Georgi Gerganov	0d58936686	llama2c : rename function	2023-09-01 17:01:11 +03:00
xaedes	5bba329e58	finetune: automatically allocate all memory and changes to command line options: remove '--n_examples N' parameter, as it no longer makes sense to call optimization process multiple times in a loop. add '--only_write_lora' command line option: will skip tokenization and training, to only write a llama.cpp compatible LORA adapter. remove memory buffer related command line options. improve iteration console output.	2023-09-01 15:58:52 +02:00
Cebtenzzre	6c9c23429b	make : use unaligned vector moves on MinGW (#2945) Fixes #2922	2023-09-01 16:53:14 +03:00
m3ndax	ee8654bcd0	minor : add const qualifiers (#2853) * made the methods const # Conflicts: # examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp * made method const * Update convert-llama2c-to-ggml.cpp removed write_raw and write_u32 * llama2c : remove misleading const --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-09-01 16:47:27 +03:00
xaedes	7e01d11a28	add ggml-alloc API function 'ggml_allocr_max_size' to get max size of alloc: GGML_API size_t ggml_allocr_max_size(struct ggml_allocr * alloc);	2023-09-01 15:42:40 +02:00
xaedes	d554a70f11	initialize opt ggml context if none was provided	2023-09-01 15:41:57 +02:00
Konstantin Herud	49bb9cbe0f	docs : add java-llama.cpp to README.md (#2935)	2023-09-01 16:36:14 +03:00
Cebtenzzre	ef15649972	build : fix most gcc and clang warnings (#2861) * fix most gcc and clang warnings * baby-llama : remove commented opt_params_adam * fix some MinGW warnings * fix more MinGW warnings	2023-09-01 16:34:50 +03:00
Ben Siraphob	d8d6977f48	examples : add C grammar (#2357)	2023-09-01 16:32:14 +03:00
Tameem	5aec2cfaac	ggml : add RISC-V vector intrinsics support (#2929) * added support for RISCV CFLAGS & native compile + cross compile options * Add RISC-V Vector Intrinsics Support Added RVV intrinsics for following ggml_vec_dot_q4_0_q8_0 ggml_vec_dot_q4_1_q8_1 ggml_vec_dot_q5_0_q8_0 ggml_vec_dot_q5_1_q8_1 ggml_vec_dot_q8_0_q8_0 Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai> Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> --------- Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai> Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>	2023-09-01 16:27:40 +03:00
Georgi Gerganov	13268c5331	metal : slight speed-up for add and mul kernels (#2917)	2023-09-01 13:42:41 +03:00
staviq	4dcd47d71d	logs : fix mingw-like builds (fixes #2898) (#2911) * fix mingw-like builds * formatting * make LOG_COMPAT easier to override and extend * simplify win detection * fix for #2940	2023-09-01 12:07:06 +03:00
Cebtenzzre	18705a30ef	llama2c : fix segfault and alloc-dealloc-mismatch (#2913) * llama2c : fix segfault if vocab is not found * llama2c : fix mismatch between new[] and delete * llama2c : fix basename on Windows * llama2c : use a destructor to prevent memory leaks	2023-09-01 12:03:49 +03:00
Kawrakow	e8d9158925	metal: somewhat faster f16 x f32 matrix multiply kernel (#2951) * Somewhat faster f16 x f32 matrix multiply kernel * Better use 32 thread groups for f16 x f32 --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2023-09-01 11:15:57 +03:00
Cebtenzzre	bce1fef328	convert : fix another python 3.8 issue (#2949)	2023-08-31 22:13:51 -04:00
slaren	528134dd02	remove convert-llama-7b-pth-to-gguf.py and convert-llama-hf-to-gguf.py (#2906)	2023-09-01 01:32:09 +02:00
Kerfuffle	aeefac4ff7	scripts: Use local gguf package when running from repo (#2927) * scripts: Use local gguf when running from repo	2023-08-31 16:49:24 -06:00
xaedes	4914f855c7	add tensor checkpoints only when gradient checkpointing is enabled	2023-08-31 16:46:21 +02:00
xaedes	e0da1684db	remove finetune option to disable allocator. the allocator should always be used. by making sure that it is always used it gets easier to implement automatic memory requirements computation	2023-08-31 16:45:47 +02:00
DannyDaemonic	e8422de39e	@vxiiduu's fix for PrefetchVirtualMemory (#2930) Reimplement fix for `PrefetchVirtualMemory`. Co-authored-by: vxiiduu <73044267+vxiiduu@users.noreply.github.com>	2023-08-31 04:21:45 -07:00
Cebtenzzre	92d0b751a7	convert : fix python 3.8 support, modernize type annotations (#2916) * convert : fix python 3.8 support * convert : sort imports * convert : fix required parameters in convert-llama-ggmlv3-to-gguf * convert : fix mypy errors in convert-llama-ggmlv3-to-gguf * convert : use PEP 585 generics and PEP 604 unions Now that we have `from __future__ import annotations`, we can use this modern syntax in Python 3.7 instead of restricting support to Python 3.9 or 3.10 respectively. * gguf.py : a tuple is already a tuple * add mypy.ini * convert : add necessary `type: ignore` comments * gguf-py: bump version	2023-08-31 08:02:23 +03:00
Johannes Gäßler	8afe228000	CUDA: mul_mat_q=true llama_context_params default (#2912)	2023-08-30 21:46:19 +02:00
Henri Vasserman	71d6975559	[Docker] fix tools.sh argument passing. (#2884) * [Docker] fix tools.sh argument passing. This should allow passing multiple arguments to containers with the full image that are using the tools.sh frontend. Fix from https://github.com/ggerganov/llama.cpp/issues/2535#issuecomment-1697091734	2023-08-30 19:14:53 +03:00
xaedes	4fd51c4616	fix warnings	2023-08-30 17:12:23 +02:00
xaedes	0c57f9f0b3	fix warnings	2023-08-30 16:55:49 +02:00
xaedes	4e986ac4bc	update README.md	2023-08-30 16:29:09 +02:00
xaedes	b26bd4c34c	add option to save train-text-from-scratch output every N iterations	2023-08-30 16:26:05 +02:00
xaedes	f3590ad8d9	remove trailing whitespace	2023-08-30 16:01:08 +02:00
xaedes	fc456edda6	train-text-from-scratch can train (full finetune) gguf models. just pass the gguf model via `--checkpoint-in FN`. after this, to continue training, pass the generated checkpoint instead of the original gguf model. tested with smaller models, bigger models may exceed available memory. use (LORA) finetune for those.	2023-08-30 15:57:17 +02:00
xaedes	e6b7158123	replace custom data getters and setters by ggml functions	2023-08-30 15:21:27 +02:00
xaedes	d487e0531f	move gradient checkpointing code into ggml, new API function: // build gradient checkpointing backward graph gb for gf using provided checkpoints // gb_tmp will contain original backward graph with rewritten backward process nodes, // but without the second forward pass nodes. GGML_API void ggml_build_backward_gradient_checkpointing( struct ggml_context * ctx, struct ggml_cgraph * gf, struct ggml_cgraph * gb, struct ggml_cgraph * gb_tmp, struct ggml_tensor * * checkpoints, int n_checkpoints);	2023-08-30 15:21:27 +02:00
xaedes	2392b6725b	use tensor->view_src instead of ggml_is_view and get_view_source	2023-08-30 14:46:12 +02:00
xaedes	b1709f2d25	Merge branch 'master' into finetune-lora	2023-08-30 13:28:29 +02:00
Georgi Gerganov	b532a69b2f	convert.py : use dir name to name the llama	2023-08-30 13:29:40 +03:00

1360 commits