xaedes
3794dceb7f
remove unused train params: mem_compute1_gb & mem_compute2_gb
mem_compute_gb is used for compute when the automatic memory allocator is not enabled; otherwise it can be very small, holding only the tensor definitions
mem_compute0_gb is used for the automatic memory allocator (as long as measurement of the max required size is not implemented)
2023-08-14 18:44:42 +02:00
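A minimal sketch of how the two remaining knobs relate, assuming a hypothetical params struct (the real field layout in the PR may differ):

```c
// hypothetical sketch: the two compute-memory knobs that remain after this
// cleanup; struct and field names are illustrative.
struct train_params_sketch {
    // compute buffer size when the automatic memory allocator is NOT enabled;
    // with the allocator enabled it can be very small, holding only the
    // tensor definitions
    int mem_compute_gb;
    // buffer size for the automatic memory allocator, used as long as
    // measurement of the max required size is not implemented
    int mem_compute0_gb;
};
```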
xaedes
6f161c784b
remove trailing whitespace
2023-08-14 18:33:27 +02:00
xaedes
271e4d64b5
remove unused training parameters "use_scratch" and "use_unified"
2023-08-14 18:31:59 +02:00
xaedes
c954f41ca4
remove handwritten training functions
2023-08-14 18:30:50 +02:00
xaedes
fe788a1c7a
allocate graph on context using ggml_new_graph
2023-08-14 18:24:13 +02:00
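A minimal sketch of the change, using the ggml API of this period:

```c
#include "ggml.h"

// before: a large struct ggml_cgraph lived on the stack;
// after: the graph is allocated as an object on the ggml context.
static struct ggml_cgraph * build_graph(struct ggml_context * ctx, struct ggml_tensor * loss) {
    struct ggml_cgraph * gf = ggml_new_graph(ctx); // graph storage comes from ctx
    ggml_build_forward_expand(gf, loss);           // expand the graph from the loss tensor
    return gf;                                     // stays valid as long as ctx lives
}
```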
xaedes
75baed230c
set names for tensors in unified train function for easier debugging
2023-08-14 18:17:14 +02:00
xaedes
3e99a8d653
format name of cloned tensors with " (clone)" suffix
2023-08-14 18:15:09 +02:00
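A sketch covering both naming commits above; the helper is hypothetical, ggml_set_name is the real API call:

```c
#include <stdio.h>
#include "ggml.h"

// hypothetical helper: tag a cloned tensor with its source's name plus a
// " (clone)" suffix, so source and clone are distinguishable in debug output.
static void name_clone(struct ggml_tensor * clone, struct ggml_tensor * src) {
    char name[GGML_MAX_NAME];
    snprintf(name, sizeof(name), "%s (clone)", src->name);
    ggml_set_name(clone, name);
}
```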
xaedes
865c4cd3c1
integrate unified training function which may use memory allocator
the unified training function also accepts arguments controlling whether to use flash attention and/or gradient checkpointing
2023-08-14 18:12:58 +02:00
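Roughly the shape of such a function; the name and parameter list here are illustrative, not the exact signature in the PR:

```c
#include "ggml.h"

// illustrative signature only: one training function that covers the cases
// previously handled by separate handwritten variants.
struct ggml_tensor * build_train_graphs_sketch(
    struct ggml_context * ctx,
    struct ggml_cgraph  * gf,                 // forward graph
    struct ggml_cgraph  * gb,                 // backward graph
    bool                  use_flash_attn,     // toggle flash attention
    bool                  use_checkpointing); // toggle gradient checkpointing
// returns the loss tensor (hypothetical convention)
```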
xaedes
4ed096c6b0
add training options controlling whether to use the allocator and/or the unified training function
2023-08-14 18:10:02 +02:00
xaedes
d6c5b03858
fix ASSERT to work with zero layers
2023-08-14 18:08:19 +02:00
xaedes
38f4438c32
make sure some tensors are not reallocated by inserting new temporary nodes depending on them:
output and parameter gradient tensors need to be available at the end of the graph execution
parameter gradient tensors also need to be available before the graph execution, because they are set to zero before each optimizer iteration
checkpoint tensors are allocated all together to reduce memory allocator fragmentation
afterwards, in addition to the temporary nodes, we also need to reset the temporary leafs
2023-08-14 18:07:16 +02:00
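A minimal sketch of the trick, under the ggml API of this period (where ggml_scale takes a one-element scale tensor):

```c
#include "ggml.h"

// keep 't' allocated until the end of graph execution by appending a
// temporary node that depends on it; a scale by 1.0f acts as a no-op.
static void keep_alive(struct ggml_context * ctx, struct ggml_cgraph * gb,
                       struct ggml_tensor * t, struct ggml_tensor * one) {
    // 'one' is a 1-element f32 tensor holding 1.0f (e.g. ggml_new_f32(ctx, 1.0f))
    struct ggml_tensor * dep = ggml_scale_inplace(ctx, t, one);
    ggml_build_forward_expand(gb, dep); // the allocator now sees t as live to the end
}
```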
xaedes
9716eb8ef0
fix variable name and add missing boolean negation
2023-08-14 17:59:19 +02:00
xaedes
5884b43a62
add input tensors as checkpoints
so that recursive tensor cloning of gradient checkpointing terminates on input tensors
2023-08-14 17:58:49 +02:00
xaedes
b2f1310196
swap arguments to commutative ops to be the same as in forward_batch_wo_cache_flash_attn
2023-08-14 17:57:13 +02:00
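The two orderings are numerically identical, but they produce structurally different graph nodes, which matters when lining the cloned graph up against the reference; a sketch:

```c
#include "ggml.h"

// illustrative: keep the same argument order as the reference graph so the
// cloned graph matches it node-for-node.
static struct ggml_tensor * add_like_reference(
        struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b) {
    return ggml_add(ctx, a, b); // not ggml_add(ctx, b, a): same result, different graph
}
```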
xaedes
5a11b75875
fix variable names
2023-08-14 17:55:51 +02:00
xaedes
345f516f7c
correctly clone view tensors by setting data pointers
without this, checkpointing would only work when used together with the memory allocator
2023-08-14 17:55:13 +02:00
xaedes
52c92c0a8c
terminate recursive tensor cloning when reaching tensor without src tensors
2023-08-14 17:53:36 +02:00
xaedes
0dd496c5e2
fix variable name and add missing type cast
2023-08-14 17:52:48 +02:00
xaedes
cfddc36be2
correctly clone reshape and permute operations by also cloning tensor->nb values
2023-08-14 17:52:15 +02:00
xaedes
d43741540b
don't allocate hash_map on context
because the context has no_alloc=true when using the memory allocator, resulting in NULL data pointers
2023-08-14 17:51:20 +02:00
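A condensed sketch pulling together the cloning fixes from the commits above; the map and the checkpoint predicate are hypothetical helpers, and the map lives in ordinary malloc'd memory rather than on the ggml context (whose data pointers are NULL under no_alloc=true):

```c
#include <string.h>
#include "ggml.h"

// hypothetical helpers: a pointer-keyed map from tensors to their clones,
// and a predicate marking checkpoints (inputs are registered as checkpoints).
struct clone_map;
struct ggml_tensor * clone_map_find  (struct clone_map * m, struct ggml_tensor * t);
void                 clone_map_insert(struct clone_map * m, struct ggml_tensor * t, struct ggml_tensor * c);
bool                 is_checkpoint   (struct ggml_tensor * t);

static struct ggml_tensor * clone_tensor(
        struct ggml_context * ctx, struct clone_map * repl, struct ggml_tensor * t) {
    if (t == NULL) {
        return NULL;
    }
    // terminate the recursion: checkpoints (including inputs) and tensors
    // without src tensors are used as-is instead of being cloned
    if (is_checkpoint(t) || t->src[0] == NULL) {
        return t;
    }
    struct ggml_tensor * cached = clone_map_find(repl, t);
    if (cached != NULL) {
        return cached;
    }
    struct ggml_tensor * clone = ggml_dup_tensor(ctx, t);
    clone->op = t->op;
    for (int i = 0; i < GGML_MAX_SRC; ++i) {
        clone->src[i] = clone_tensor(ctx, repl, t->src[i]);
    }
    // reshape/permute results differ from their source only in strides,
    // so the nb values must be cloned as well
    memcpy(clone->nb, t->nb, sizeof(clone->nb));
    // view tensors share their source's buffer: carrying over the data
    // pointer makes checkpointing work without the memory allocator too
    if (clone->data == NULL) {
        clone->data = t->data;
    }
    clone_map_insert(repl, t, clone);
    return clone;
}
```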
xaedes
fc826c8ea8
in train function replace add_inplace by regular add
because using add_inplace seems to result in different gradients
2023-08-14 17:49:22 +02:00
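Sketch of the swap, with illustrative variable names:

```c
#include "ggml.h"

// the in-place variant writes into t's buffer; per the commit above this
// seems to yield different gradients, so use the out-of-place add instead.
static struct ggml_tensor * add_residual(
        struct ggml_context * ctx, struct ggml_tensor * t, struct ggml_tensor * residual) {
    // was: return ggml_add_inplace(ctx, t, residual);
    return ggml_add(ctx, t, residual);
}
```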
xaedes
2bf422eafd
add train function using automatic gradient checkpointing backward pass and allocator
2023-08-06 23:07:57 +02:00
xaedes
d43af4b543
Merge branch 'master' into pr-train-mem-usage-improvements
2023-08-06 17:30:17 +02:00
DannyDaemonic
86c3219895
console : fix issue related to Windows 11 PowerShell console mode persistence ( #2521 )
2023-08-06 09:49:34 +03:00
Keiichi Tabata
2e8265ae17
convert.py : add missing abstract methods for quantized data ( #2491 )
2023-08-06 09:34:05 +03:00
Johannes Gäßler
f514d1b306
CUDA: faster k-quant mul_mat_q kernels ( #2525 )
2023-08-05 18:20:44 +02:00
Jonas Wunderlich
332311234a
fix Firefox autoscroll ( #2519 )
2023-08-04 22:16:11 +02:00
Cebtenzzre
182af739c4
server: regenerate completion.js.hpp ( #2515 )
2023-08-04 21:00:57 +02:00
Cebtenzzre
4329d1acb0
CUDA: use min compute capability of GPUs actually used ( #2506 )
2023-08-04 17:35:22 +02:00
Cebtenzzre
02f9d96a86
CUDA: check if event is NULL before cudaStreamWaitEvent ( #2505 )
Fixes #2503
2023-08-04 17:34:32 +02:00
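The guard this describes, sketched against the CUDA runtime API:

```c
#include <cuda_runtime.h>

// only wait on the event if it was actually created; in some configurations
// it can be NULL, and cudaStreamWaitEvent would then fail.
static void wait_event_if_set(cudaStream_t stream, cudaEvent_t event) {
    if (event != NULL) {
        cudaStreamWaitEvent(stream, event, 0); // 0 = default flags
    }
}
```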
DannyDaemonic
3498588e0f
Add --simple-io option for subprocesses and break out console.h and cpp ( #1558 )
2023-08-04 08:20:12 -07:00
Stephen Nichols
5f631c2679
Fixing race condition in server and partial stream handling in frontend. ( #2391 )
* Fixing race condition in server.cpp and partial stream handling in completion.js
* Reverting assert edits.
* Adding newline to eof
2023-08-04 13:37:24 +02:00
l3utterfly
415e99fec2
Stream save llama context data to file instead of allocating entire buffer upfront ( #2488 )
* added stream saving context data to file to avoid allocating unnecessary amounts of memory
* generalised copying state data to file or buffer
* added comments explaining how copy_state_data works
* fixed trailing whitespace
* fixed save load state example
* updated save load state to use public function in llama.cpp
* - fixed breakage of the llama_copy_state_data API
- moved new logic for copying llama state data to an internal function
* fixed function declaration order
* restored save load state example
* fixed whitespace
* removed unused llama-util.h include
* Apply suggestions from code review
Co-authored-by: slaren <slarengh@gmail.com>
* Apply code review suggestions
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-08-04 13:29:52 +02:00
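A sketch of the generalisation described in the bullets above; the sink struct is illustrative, not llama.cpp's internal type:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// one write path that targets either a file or an in-memory buffer, so the
// full state never has to be materialised in RAM at once (illustrative type).
struct data_sink {
    FILE    * file; // when non-NULL, stream each chunk straight to disk
    uint8_t * buf;  // otherwise, copy chunks into this caller-provided buffer
    size_t    off;  // bytes written so far
};

static void sink_write(struct data_sink * s, const void * data, size_t size) {
    if (s->file != NULL) {
        fwrite(data, 1, size, s->file);
    } else {
        memcpy(s->buf + s->off, data, size);
    }
    s->off += size;
}
```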
Borislav Stanimirov
ff966e7ca6
build : fix several cast and printf warnings ( #2499 )
2023-08-04 13:07:21 +03:00
Evan Jones
8183159cf3
examples : generate JSON according to schema ( #1887 )
* examples : add JSON schema grammars
* complete JSON grammar
* ensure primitive types can be used as root of schema
* support integer type and adjust usage text
2023-08-02 22:05:44 -04:00
Johannes Gäßler
468ea24fb4
CUDA: faster non k-quant mul_mat_q kernels ( #2483 )
2023-08-02 18:04:04 +02:00
Johannes Gäßler
4f6b60c776
CUDA: Fix models with output size != 32000 ( #2480 )
2023-08-02 16:48:10 +02:00
ldwang
220d931864
readme : add Aquila-7B model series to supported models ( #2487 )
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
* Add Aquila-7B models in README.md
Signed-off-by: ldwang <ftgreat@gmail.com>
* Update Aquila-7B models in README.md
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-08-02 11:21:11 +03:00
Eve
81844fbcfd
tests : Fix compilation warnings (Linux/GCC) ( #2451 )
* fix hellaswag print format, cast away warning in test-double-float
* c++11 cannot use designated initializers
* add static to test-grad0.c internal functions
* use memcpy in test-double-float.c
* port c tests to c++
* use initializer list for ggml_init_params
2023-08-02 11:06:19 +03:00
Yiming Cui
a312193e18
readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models ( #2475 )
* add support for Chinese LLaMA-2 / Alpaca-2
* remove whitespace
2023-08-02 09:18:31 +03:00
Bono Lv
c574bddb36
fix a typo in examples/server/README.md ( #2478 )
2023-08-01 14:54:28 +02:00
ebraminio
86aeb27734
server : Support dark mode ( #2414 )
* server : Support dark mode
So it respects the user's system light/dark setting.
* Update index.html.hpp by running ./deps.sh
2023-08-01 10:56:23 +02:00
Matteo Boschini
1873ff586b
metal : add gqa8 kernel to allow llama-2-70B on metal ( #2459 )
* Added gqa8 kernel to allow llama-2-70B on metal
* Update ggml-metal.m
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast
* Added ne03==ne13 assertion
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-08-01 10:43:12 +03:00
Johannes Gäßler
49e7cb5bb1
CUDA: fixed LLAMA_FAST compilation option ( #2473 )
2023-07-31 21:02:19 +02:00
Johannes Gäßler
b772bba42e
CUDA: fixed cmake F16 option ( #2471 )
2023-07-31 19:52:22 +02:00
Johannes Gäßler
0728c5a8b9
CUDA: mmq CLI option, fixed mmq build issues ( #2453 )
2023-07-31 15:44:35 +02:00
Johannes Gäßler
1215ed7d5c
CUDA: Implemented row flattening for non-glm RoPE ( #2468 )
2023-07-31 14:32:30 +02:00
Johannes Gäßler
2dbf518911
CUDA: fewer memory bank conflicts for mul_mat_q ( #2458 )
2023-07-31 13:18:51 +02:00
slaren
9d2382b3e4
Fix Metal backend broken from the allocator changes ( #2455 )
* fix Metal backend broken from the allocator changes
2023-07-31 11:02:53 +02:00
slaren
a113689571
ggml : add graph tensor allocator ( #2411 )
* ggml : add graph tensor allocator
* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset
* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
2023-07-30 15:58:01 +02:00
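A minimal sketch of the measure-then-allocate pattern this enables, using the ggml-alloc API introduced here (build_graph is an assumed callback that reconstructs the same graph on each call):

```c
#include <stdlib.h>
#include "ggml.h"
#include "ggml-alloc.h"

static void run_with_allocator(struct ggml_cgraph * (*build_graph)(void)) {
    // measure pass: walk the graph to find the worst-case buffer size
    // without allocating any real memory
    struct ggml_allocr * measure = ggml_allocr_new_measure(/*alignment =*/ 32);
    size_t mem_size = ggml_allocr_alloc_graph(measure, build_graph());
    ggml_allocr_free(measure);

    // real pass: rebuild the graph and place every tensor inside one buffer
    void * buf = malloc(mem_size);
    struct ggml_allocr * alloc = ggml_allocr_new(buf, mem_size, 32);
    ggml_allocr_alloc_graph(alloc, build_graph());
    // ... run ggml_graph_compute on the graph here ...
    ggml_allocr_free(alloc);
    free(buf);
}
```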