xaedes
1151653b15
replace the llama API functions for getting model tensors with a single function that gets a model tensor by name
...
LLAMA_API struct ggml_tensor * llama_get_model_tensor(struct llama_model * model, const char * name);
2023-08-16 21:36:40 +02:00
xaedes
39a2d15461
avoid stack overflow resulting from big ggml_cgraph
...
replace stack allocation and ggml_build_forward by ggml_new_graph in combination with ggml_build_forward_expand
2023-08-16 16:42:25 +02:00
xaedes
0ab2507ce5
fix names of lora tensors
2023-08-16 16:41:20 +02:00
xaedes
620275361d
add debug prints for training memory improvements
2023-08-16 16:23:21 +02:00
xaedes
be7e564b11
bug fixes to make finetune compile
...
automatic allocator does not work yet
2023-08-16 16:21:43 +02:00
xaedes
50b1e66200
remove const model and layer arguments in API functions for accessing model tensors
2023-08-16 16:21:02 +02:00
xaedes
28ee0c8583
first draft for LoRA finetune training
2023-08-16 15:31:04 +02:00
xaedes
c0a372fd3d
add API functions to access remaining model parameters:
...
mult, head and rot
2023-08-16 15:30:31 +02:00
xaedes
9eb1ef8653
move and remove code
2023-08-15 14:03:02 +02:00
xaedes
5e059ace25
add stub example for finetuning, based on train-text-from-scratch
2023-08-15 13:54:28 +02:00
xaedes
316b0707f4
add API functions to access llama model tensors
2023-08-15 13:53:13 +02:00
xaedes
3b5515bbe0
reverse order of for loop in ggml_build_backward_expand to save memory when using gradient checkpointing and allocator
...
with this loop order, gradient checkpointing with allocator on a 16-layer model saves 13% memory; on a 2-layer model it saves 2% memory.
the computation results are the same
2023-08-14 22:09:36 +02:00
xaedes
56228461c8
fix memory "leak" in optimizers
...
previously, each iteration allocated a new cplan with new memory for its work data.
now the cplan is created only once, at the start of optimization, and each iteration reuses the cplan and its work data.
2023-08-14 21:12:02 +02:00
xaedes
3e6468b097
fix test of when to create the temporary backward graph
...
temporary backward graph is only necessary when using checkpointing
2023-08-14 20:57:18 +02:00
xaedes
098654c277
only use ggml_allocr_alloc when the tensor has NULL data and is not a view
2023-08-14 20:57:18 +02:00
xaedes
faf3e21eaf
add debug asserts in ggml_allocr_alloc for some common pitfalls when using this function directly
2023-08-14 20:50:09 +02:00
xaedes
6e280b24dc
remove unused forward_batch function
2023-08-14 19:02:12 +02:00
xaedes
3794dceb7f
remove unused train params: mem_compute1_gb & mem_compute2_gb
...
mem_compute_gb is used for compute when the automatic memory allocator is not enabled; otherwise it can be very small, holding only the tensor definitions
mem_compute0_gb is used for the automatic memory allocator (as long as measurement of the max required size is not implemented)
2023-08-14 18:44:42 +02:00
xaedes
6f161c784b
remove trailing whitespace
2023-08-14 18:33:27 +02:00
xaedes
271e4d64b5
remove unused training parameters "use_scratch" and "use_unified"
2023-08-14 18:31:59 +02:00
xaedes
c954f41ca4
remove handwritten training functions
2023-08-14 18:30:50 +02:00
xaedes
fe788a1c7a
allocate graph on context using ggml_new_graph
2023-08-14 18:24:13 +02:00
xaedes
75baed230c
set names for tensors in unified train function for easier debugging
2023-08-14 18:17:14 +02:00
xaedes
3e99a8d653
format name of cloned tensors with " (clone)" suffix
2023-08-14 18:15:09 +02:00
xaedes
865c4cd3c1
integrate unified training function which may use memory allocator
...
the unified training function also supports arguments whether to use flash attention and/or gradient checkpointing
2023-08-14 18:12:58 +02:00
xaedes
4ed096c6b0
add training options whether to use allocator and/or unified training function
2023-08-14 18:10:02 +02:00
xaedes
d6c5b03858
fix ASSERT to work with zero layers
2023-08-14 18:08:19 +02:00
xaedes
38f4438c32
make sure some tensors are not reallocated by inserting new temporary nodes depending on them:
...
output and parameter gradient tensors need to be available at the end of the graph execution
parameter gradient tensors also need to be available before the graph execution because they are set to zero before each optimizer iteration
checkpoint tensors are allocated all together to reduce memory allocator fragmentation
afterwards, in addition to the temporary nodes, we also need to reset the temporary leafs
2023-08-14 18:07:16 +02:00
xaedes
9716eb8ef0
fix variable name and add missing boolean negation
2023-08-14 17:59:19 +02:00
xaedes
5884b43a62
add input tensors as checkpoints
...
so that the recursive tensor cloning used by gradient checkpointing terminates on input tensors
2023-08-14 17:58:49 +02:00
xaedes
b2f1310196
swap arguments to commutative ops to be the same as in forward_batch_wo_cache_flash_attn
2023-08-14 17:57:13 +02:00
xaedes
5a11b75875
fix variable names
2023-08-14 17:55:51 +02:00
xaedes
345f516f7c
correctly clone view tensors by setting data pointers
...
without this, checkpointing would only work when used together with the memory allocator
2023-08-14 17:55:13 +02:00
xaedes
52c92c0a8c
terminate recursive tensor cloning when reaching tensor without src tensors
2023-08-14 17:53:36 +02:00
xaedes
0dd496c5e2
fix variable name and add missing type cast
2023-08-14 17:52:48 +02:00
xaedes
cfddc36be2
correctly clone reshape and permute operations by also cloning tensor->nb values
2023-08-14 17:52:15 +02:00
xaedes
d43741540b
don't allocate the hash_map on the context
...
because the context has no_alloc=true when using the memory allocator, resulting in NULL data pointers
2023-08-14 17:51:20 +02:00
xaedes
fc826c8ea8
in train function replace add_inplace by regular add
...
because using add_inplace seems to result in different gradients
2023-08-14 17:49:22 +02:00
xaedes
2bf422eafd
add train function using automatic gradient checkpointing backward pass and allocator
2023-08-06 23:07:57 +02:00
xaedes
d43af4b543
Merge branch 'master' into pr-train-mem-usage-improvements
2023-08-06 17:30:17 +02:00
DannyDaemonic
86c3219895
console : fix issue related to Windows 11 PowerShell console mode persistence ( #2521 )
2023-08-06 09:49:34 +03:00
Keiichi Tabata
2e8265ae17
convert.py : add missing abstract methods for quantized data ( #2491 )
2023-08-06 09:34:05 +03:00
Johannes Gäßler
f514d1b306
CUDA: faster k-quant mul_mat_q kernels ( #2525 )
2023-08-05 18:20:44 +02:00
Jonas Wunderlich
332311234a
fix firefox autoscroll ( #2519 )
2023-08-04 22:16:11 +02:00
Cebtenzzre
182af739c4
server: regenerate completion.js.hpp ( #2515 )
2023-08-04 21:00:57 +02:00
Cebtenzzre
4329d1acb0
CUDA: use min compute capability of GPUs actually used ( #2506 )
2023-08-04 17:35:22 +02:00
Cebtenzzre
02f9d96a86
CUDA: check if event is NULL before cudaStreamWaitEvent ( #2505 )
...
Fixes #2503
2023-08-04 17:34:32 +02:00
DannyDaemonic
3498588e0f
Add --simple-io option for subprocesses and break out console.h and cpp ( #1558 )
2023-08-04 08:20:12 -07:00
Stephen Nichols
5f631c2679
Fixing race condition in server and partial stream handling in frontend. ( #2391 )
...
* Fixing race condition in server.cpp and partial stream handling in completion.js
* Reverting assert edits.
* Adding newline to eof
2023-08-04 13:37:24 +02:00
l3utterfly
415e99fec2
Stream save llama context data to file instead of allocating entire buffer upfront ( #2488 )
...
* added stream saving context data to file to avoid allocating unnecessary amounts of memory
* generalised copying state data to file or buffer
* added comments explaining how copy_state_data works
* fixed trailing whitespaces
* fixed save load state example
* updated save load state to use public function in llama.cpp
* - restored breakage of the llama_copy_state_data API
- moved new logic for copying llama state data to internal function
* fixed function declaration order
* restored save load state example
* fixed whitespace
* removed unused llama-util.h include
* Apply suggestions from code review
Co-authored-by: slaren <slarengh@gmail.com>
* Apply code review suggestions
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-08-04 13:29:52 +02:00