"bug fix, probably solves the 'ggml_allocr_alloc: not enough space in the buffer' issue"
"alloc was freeing an externally allocated tensor, because it calculated the end of allocator memory as alloc->data + alloc->max_size instead of alloc->data + alloc->size."
This is intentional to reduce the risk of freeing external tensors when measuring. Unless max_size is not properly calculated, I don't see why this is an issue.
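For reference, a minimal sketch of the bounds check in question. The struct below is a simplified stand-in for illustration only, not the actual ggml-alloc code; the field meanings are assumptions based on the names used above.

```c
#include <stdbool.h>
#include <stddef.h>

// simplified stand-in for the allocator state, using only the fields named above
struct ggml_allocr_sketch {
    void * data;      // start of the allocator's buffer
    size_t size;      // total size of the buffer
    size_t max_size;  // high-water mark of allocations made so far (assumed meaning)
};

// returns true if ptr lies inside the allocator's own buffer, i.e. the tensor
// may be treated as internally allocated (and thus freeable)
static bool ptr_in_buffer(const struct ggml_allocr_sketch * alloc, const void * ptr) {
    const char * begin = (const char *) alloc->data;
    const char * end   = begin + alloc->size;   // the fix: end of buffer is data + size
    return (const char *) ptr >= begin && (const char *) ptr < end;
}
```

The quoted fix changes the upper bound from `alloc->data + alloc->max_size` to `alloc->data + alloc->size`, so whether a pointer is treated as belonging to the buffer no longer depends on how much of the buffer has been used so far.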
remove '--n_examples N' parameter, as it no longer makes sense to call the optimization process multiple times in a loop.
add '--only_write_lora' command line option: will skip tokenization and training, to only write a llama.cpp compatible LORA adapter.
remove memory buffer related command line options.
improve iteration console output.
* convert : fix python 3.8 support
* convert : sort imports
* convert : fix required parameters in convert-llama-ggmlv3-to-gguf
* convert : fix mypy errors in convert-llama-ggmlv3-to-gguf
* convert : use PEP 585 generics and PEP 604 unions
Now that we have `from __future__ import annotations`, we can use this
modern syntax in Python 3.7 instead of restricting support to Python 3.9
or 3.10, respectively.
* gguf.py : a tuple is already a tuple
* add mypy.ini
* convert : add necessary `type: ignore` comments
* gguf-py: bump version
just pass the gguf model via `--checkpoint-in FN`.
after this, to continue training, pass the generated checkpoint instead of the original gguf model.
tested with smaller models; bigger models may exceed available memory.
use (LORA) finetune for those.
* build ci: run make test
* makefile:
- add all
- add test
* enable tests/test-tokenizer-0-llama
* fix path to model
* remove gcc-8 from macos build test
* Update Makefile
* Update Makefile
* convert: Fix permute calls and method/func definitions
* Cleanups for gguf-py
* Minor types cleanups.
* Initial implementation of handling merges and special tokens
* convert: Handle special tokens and merges in vocab only mode
convert: Vocab only mode no longer requires loading model tensors
* gguf: Refactor tensor name mapping
* convert: Fix type hint for special_token_types in SpecialVocab
* Use common special vocab handling in various conversion scripts
* First pass at implementing suggested changes
* Second pass
* gguf: SpecialVocab: Fix issue with special token content not in a dict
gguf: SpecialVocab: Allow skipping handling of merges
* convert-falcon-hf-to-gguf: Support --vocab-only option, bail out if no tokenizer.json
* convert-gptneox-hf-to-gguf and convert: Only handle merges for BPE tokenizer
* gguf: SpecialVocab: Actually set load_merges in object
* Uniform args parsing and vocab only mode for convert examples
* convert.py: Set gpt2 as tokenizer model when using BPE
* Squish last type warning in gguf.py - yay!
* tests : add a C compliance test
* make : build C compliance test by default
* make : fix clean and make sure C test fails on clang
* make : move -Werror=implicit-int to CFLAGS
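For illustration, a C compliance test of this kind is usually just a plain C translation unit that includes the project's public headers and is compiled with the C compiler, so any C++-only construct in the headers (or, with `-Werror=implicit-int` in CFLAGS, any implicit-int declaration in the C sources) breaks the build. A minimal sketch, not necessarily the exact file added here:

```c
// compiled as C (e.g. with -std=c11 and -Werror=implicit-int) to verify that
// the public headers remain valid C
#include "ggml.h"
#include "llama.h"

int main(void) {
    return 0;
}
```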
* ggml : add view_src and view_offs
* update ggml-alloc to use view_src
* update ggml_diag_mask to work correctly with automatic inplace
* exclude other ops that set an inplace flag from automatic inplace
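As a rough sketch of how the two new fields can work together: a view tensor records which tensor it views (`view_src`) and at which byte offset (`view_offs`), so its data pointer can be derived once the source is allocated. The struct below is a simplified stand-in, not the real `ggml_tensor`.

```c
#include <stddef.h>

// simplified stand-in for a tensor that may be a view of another tensor
struct tensor_sketch {
    void * data;                      // pointer to the tensor's elements
    struct tensor_sketch * view_src;  // tensor whose memory this one views, or NULL
    size_t view_offs;                 // byte offset into view_src's data
};

// a view does not own memory: once its source has been allocated,
// the view's data pointer follows from view_src and view_offs
static void init_view_data(struct tensor_sketch * view) {
    if (view->view_src != NULL && view->view_src->data != NULL) {
        view->data = (char *) view->view_src->data + view->view_offs;
    }
}
```

Recording the relationship explicitly is presumably what lets ggml-alloc skip allocating separate memory for views and handle automatic inplace ops in the same way.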
ggml_get_i32_1d, ggml_set_i32_1d, ggml_get_f32_1d, ggml_set_f32_1d now support non-contiguous tensors.
in case of a non-contiguous tensor, the 1d index is unraveled into a multi-index using ggml_unravel_index and passed to the equivalent '_nd' function.
this fixes a bug in test-grad0 which happens because ggml_build_backward no longer builds purely contiguous tensors.
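A minimal sketch of the unraveling step described above, assuming ggml's usual 4-dimensional shape array `ne` with `ne[0]` as the innermost dimension; this is a standalone illustration, not the library's `ggml_unravel_index` itself.

```c
#include <stdint.h>

// turn a flat element index i into a 4-dimensional multi-index (idx[0]..idx[3]),
// given the number of elements in each dimension (ne[0] is innermost)
static void unravel_index_sketch(int64_t i, const int64_t ne[4], int64_t idx[4]) {
    idx[0] = i % ne[0];  i /= ne[0];
    idx[1] = i % ne[1];  i /= ne[1];
    idx[2] = i % ne[2];  i /= ne[2];
    idx[3] = i;          // whatever remains indexes the outermost dimension
}
```

The resulting multi-index can then be handed to the '_nd' accessors, which look elements up through the per-dimension byte strides instead of assuming a contiguous layout.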