* added stream saving context data to file to avoid allocating unnecessary amounts of memory
* generalised copying state data to file or buffer
* added comments explaining how copy_state_data works
* fixed trailing whitespaces
* fixed save load state example
* updated save load state to use public function in llama.cpp
* - restored breakage of the llama_copy_state_data API
- moved new logic for copying llama state data to internal function
* fixed function declaration order
* restored save load state example
* fixed whitepace
* removed unused llama-util.h include
* Apply suggestions from code review
Co-authored-by: slaren <slarengh@gmail.com>
* Apply code review suggestions
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
* examples : add JSON schema grammars
* complete JSON grammar
* ensure primitive types can be used as root of schema
* support integer type and adjust usage text
* fix hellaswag print format, cast away warning in test-double-float
* c++11 cannot use designated initializers
* add static to test-grad0.c internal functions
* use memcpy in test-double-float.c
* port c tests to c++
* use initializer list for ggml_init_params
* ggml : add graph tensor allocator
* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset
* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
allows dynamic learning schedule and different batch data for each iteration without relying on low n_iter and high n_examples parameters
reduces runtime by avoiding restart of optimization function and improves training convergence by providing a different batch for each iteration
this callback is called before each iteration with custom data and pointer to learning schedule parameter (only used in Adam(W)).
can be used for dynamic learning schedule and setting input data for batches before each iteration
--enable-restart N Only for Adam optimizer. Enable restarts of cos-decay
--disable-restart N Only for Adam optimizer. Disable restarts of cos-decay
--opt-past N Number of optimization iterations to track for delta convergence test. Disabled when zero.
--opt-delta N Maximum delta for delta convergence test. Disabled when <= zero.
--opt-max-no-improvement N Maximum number of optimization iterations with no improvement. Disabled when <= zero.
--adam-epsf N AdamW epsilon for convergence test. Disabled when <= zero.
--adam-min-alpha N Adam minimum learning rate alpha, usually 0.1 * alpha