xaedes
4ed096c6b0
add training options whether to use allocator and/or unified training function
2023-08-14 18:10:02 +02:00
xaedes
d6c5b03858
fix ASSERT to work with zero layers
2023-08-14 18:08:19 +02:00
xaedes
38f4438c32
make sure some tensors are not reallocated by inserting new temporary nodes depending on them:
...
output and parameter gradient tensors need to be available at the end of the graph execution
parameter gradient tensors also need to be available before the graph execution because they are set to zero before each optimizer iteration
checkpoint tensors are allocated all together to reduce memory allocator fragmentation
afterwards, in addition to the temporary nodes, we also need to reset the temporary leafs
2023-08-14 18:07:16 +02:00
xaedes
9716eb8ef0
fix variable name and add missing boolean negation
2023-08-14 17:59:19 +02:00
xaedes
5884b43a62
add input tensors as checkpoints
...
so that recursive tensor cloning of gradient checkpointing terminates on input tensors
2023-08-14 17:58:49 +02:00
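Note on 5884b43a62: the termination rule described in this message can be illustrated with a toy graph. This is only a minimal sketch, not the ggml implementation; `Tensor`, `recompute`, and the `checkpoints`/`cache` arguments are hypothetical names. It assumes that recomputation clones a node's ancestry and stops at any checkpoint (whose value is kept) or at any tensor without source tensors.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so tensors can live in sets
class Tensor:
    name: str
    src: Tuple["Tensor", ...] = ()


def recompute(node: Tensor, checkpoints: Set[Tensor], cache: Dict[int, Tensor]) -> Tensor:
    """Clone `node` and its ancestry, stopping at checkpoints and source-less tensors."""
    if node in checkpoints:
        return node  # checkpoint: its value is kept, so reuse it as-is
    if not node.src:
        return node  # no source tensors: nothing to recompute
    if id(node) in cache:
        return cache[id(node)]  # already cloned on another path through the graph
    clone = Tensor(node.name + "'", tuple(recompute(s, checkpoints, cache) for s in node.src))
    cache[id(node)] = clone
    return clone


# Toy chain: input -> h1 -> h2 -> out, with the input and h1 marked as checkpoints.
x = Tensor("input")
h1 = Tensor("h1", (x,))
h2 = Tensor("h2", (h1,))
out = Tensor("out", (h2,))
clone = recompute(out, checkpoints={x, h1}, cache={})
```

In this sketch `out` and `h2` are cloned, while the recursion stops at `h1`; with only the input marked as a checkpoint, it would stop at the input instead.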
xaedes
b2f1310196
swap arguments to commutative ops to be the same as in forward_batch_wo_cache_flash_attn
2023-08-14 17:57:13 +02:00
xaedes
5a11b75875
fix variable names
2023-08-14 17:55:51 +02:00
xaedes
345f516f7c
correctly clone view tensors by setting data pointers
...
without this the checkpointing would only work when being used together with memory allocator
2023-08-14 17:55:13 +02:00
xaedes
52c92c0a8c
terminate recursive tensor cloning when reaching tensor without src tensors
2023-08-14 17:53:36 +02:00
xaedes
0dd496c5e2
fix variable name and add missing type cast
2023-08-14 17:52:48 +02:00
xaedes
cfddc36be2
correctly clone reshape and permute operations by also cloning tensor->nb values
2023-08-14 17:52:15 +02:00
xaedes
d43741540b
don't allocate hash_map on context
...
because the context has no_alloc=True when using memory allocator resulting in NULL data pointers
2023-08-14 17:51:20 +02:00
xaedes
fc826c8ea8
in train function replace add_inplace by regular add
...
because using add_inplace seems to result in different gradients
2023-08-14 17:49:22 +02:00
Jhen-Jie Hong
d783f7982e
metal : return null instead of exit(1) ( #2573 )
2023-08-14 16:37:39 +03:00
Cheng Shao
d75561df20
server : add --numa support ( #2524 )
2023-08-14 16:36:42 +03:00
Kamil Tomšík
348acf188c
llama : add missing enum keyword in function signatures ( #2610 )
2023-08-14 16:35:16 +03:00
Johannes Gäßler
1cd06fa25e
CUDA: launch_bounds, small q4_K, q5_K mmq refactor ( #2596 )
2023-08-14 10:41:22 +02:00
Jhen-Jie Hong
2feb8934eb
server : fix default grammar by using an empty string in the UI ( #2604 )
2023-08-14 16:20:17 +08:00
Jhen-Jie Hong
5517d6e692
server : implement json-schema-to-grammar.mjs & add grammar param in the UI ( #2588 )
...
* server : implement json-schema-to-grammar.mjs by following the Python implementation
* server : add grammar support in chat.mjs
* server : implement grammar param in the UI
* server : generate .hpp
* server : remove trailing whitespaces
* server : generate .hpp
* server : fix sort of prop pairs
* server : optimize regex & iteration
2023-08-14 15:16:54 +08:00
vxiiduu
f31b539714
Enhance Windows 7 and below compatibility. ( #2592 )
...
* Enhance Windows 7 compatibility.
* Clean away unnecessary preprocessor conditional
2023-08-13 20:59:16 -07:00
drbh
ee77efea2a
test : add simple grammar parsing tests ( #2594 )
...
* adds simple grammar parsing tests
* adds cassert header
2023-08-13 17:00:48 +03:00
Johannes Gäßler
f64d44a9b9
CUDA: Fixed OpenLLaMA 3b mmq, reduced compile time ( #2590 )
2023-08-13 00:24:45 +02:00
byte-6174
b19edd54d5
Adding support for llama2.c models ( #2559 )
2023-08-12 01:17:25 +02:00
Equim
53dc399472
server: fixed wrong variable name in timing json ( #2579 )
...
* server: fixed wrong variable name in timing json
* remove redundant entry
2023-08-12 00:35:14 +02:00
DannyDaemonic
9ca4abed89
Handle ENABLE_VIRTUAL_TERMINAL_PROCESSING more gracefully on earlier versions of Windows.
2023-08-10 13:11:36 -07:00
Christian Demsar
e59fcb2bc1
Add --n-predict -2 for stopping generation on full context ( #2565 )
2023-08-10 16:28:27 +02:00
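Note on e59fcb2bc1: the stopping rule named in this commit title could look roughly like the sketch below. It is an illustration only; `should_stop` and its parameters are hypothetical, and it assumes that `-1` means "no limit" and that "full context" means the number of tokens already in the context has reached the context size.

```python
def should_stop(n_predict: int, n_generated: int, n_past: int, n_ctx: int) -> bool:
    """Decide whether generation should end under the assumed --n-predict semantics."""
    if n_predict == -2:
        return n_past >= n_ctx  # stop once the context window is full
    if n_predict < 0:
        return False  # assumed: -1 means generate without a token limit
    return n_generated >= n_predict  # ordinary fixed token budget


# With -2, generation continues until the context is full, then stops.
still_room = should_stop(n_predict=-2, n_generated=100, n_past=400, n_ctx=512)
context_full = should_stop(n_predict=-2, n_generated=212, n_past=512, n_ctx=512)
```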
Martin Krasser
1638757767
Fix grammar-based sampling issue in server ( #2566 )
2023-08-10 13:16:38 +03:00
Sam Spilsbury
916a9acdd0
ggml-alloc: Don't try to re-use buffers of external tensors ( #2562 )
...
* ggml-alloc: Don't try to re-use buffers of external tensors
They might be weights that came from another context, so we
have no control over them (and they might be re-used elsewhere
so writing to them would be a bad idea).
* ggml-alloc: >= when checking for out-of-bounds
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-08-09 22:47:42 +02:00
grahameth
ea04a4ca19
add log_callback to llama_context_params for custom logging. ( #2234 )
...
* add log_callback to llama_context_params for custom logging.
* Fix macro expansion on gcc
* Add struct llama_state for global variables and move log_callback there
* Turn log level into enum and some minor changes.
* Remove model_for_logging parameter (not needed anymore)
* Convert remaining fprintf(stderr, ...) calls to use new macros.
* Fix enum and initialize g_state
* Fix log calls after merge
* Fix missing static
* Add back all the new lines in the logging strings
* Add comment for llama_log_callback and replace remaining printf calls
---------
Co-authored-by: grahameth <->
Co-authored-by: Helmut <helmut.buhler@inf.h-brs.de>
2023-08-09 22:46:40 +02:00
Johannes Gäßler
25d43e0eb5
CUDA: tuned mul_mat_q kernels ( #2546 )
2023-08-09 09:42:34 +02:00
Martin Krasser
f5bfea0580
Allow passing grammar to completion endpoint ( #2532 )
...
* Allow passing grammar to completion endpoint
2023-08-08 16:29:19 +03:00
Johannes Gäßler
acfc5478ff
CUDA: tighter VRAM scratch size for 65b/70b ( #2551 )
2023-08-08 14:38:16 +02:00
chaihahaha
7ed8d1fe7f
llm.vim : multiline autocompletion, get rid of "^@" ( #2543 )
2023-08-08 15:07:02 +03:00
Georgi Gerganov
e7f94d6fdc
vim : bring back simple llm.vim example
2023-08-08 15:06:18 +03:00
AustinMroz
2d7baaf50f
vim : streaming and more ( #2495 )
...
* Update Vim plugin
* Remove getbufoneline usage, Add input bind example.
getbufoneline() appears to be a recently added function and has been
replaced with getbufline for compatibility.
An additional example that explains how to add a keybind that works in
insert mode was added.
2023-08-08 14:44:48 +03:00
klosax
f3c3b4b167
Add --rope-scale parameter ( #2544 )
...
* common.cpp : Add --rope-scale parameter
* README.md : Add info about using linear rope scaling
2023-08-07 19:07:19 +02:00
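Note on f3c3b4b167: linear RoPE scaling, as it is usually described, divides the token position by a scale factor before the rotary angles are computed. The sketch below assumes that `--rope-scale N` corresponds to a frequency scale of `1/N`; `rope_angles` and its parameters are hypothetical names, not the project's API.

```python
import math
from typing import List


def rope_angles(pos: int, n_dims: int, freq_base: float = 10000.0,
                rope_scale: float = 1.0) -> List[float]:
    """Rotary angles for one position, with linear scaling applied to the position."""
    freq_scale = 1.0 / rope_scale  # assumed mapping from the scale factor
    return [pos * freq_scale * freq_base ** (-2.0 * i / n_dims)
            for i in range(n_dims // 2)]


# With a scale of 2, position 8 gets the same angles as position 4 does unscaled,
# so twice as many positions fit into the angle range seen during training.
scaled = rope_angles(8, n_dims=8, rope_scale=2.0)
unscaled = rope_angles(4, n_dims=8)
same = all(math.isclose(a, b) for a, b in zip(scaled, unscaled))
```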
Georgi Gerganov
93356bdb7a
ggml : mul mat tweaks ( #2372 )
...
* ggml : mul mat wip
ggml-ci
* ggml : alternative thread distribution for mul_mat
ggml-ci
* ggml : mul_mat block tiling attempt
* ggml : mul_mat threads yield
ggml-ci
2023-08-07 14:25:58 +03:00
Georgi Gerganov
60baff7c85
ggml : pad result of ggml_nbytes()
2023-08-07 14:24:42 +03:00
Georgi Gerganov
9082b5dfbf
ggml : change params pointer (style change) ( #2539 )
...
ggml-ci
2023-08-07 13:55:18 +03:00
Georgi Gerganov
99d29c0094
ggml : sync (custom ops) ( #2537 )
...
ggml-ci
2023-08-07 13:20:09 +03:00
Johannes Gäßler
3d9a551816
Fixed mmap prefetch for GPU offloading ( #2529 )
2023-08-07 10:09:40 +02:00
Georgi Gerganov
f6f9896ac3
metal : fix out-of-bounds access + inc concurrency nodes ( #2416 )
...
* metal : fix out-of-bounds access + style changes
* metal : increase concurrency nodes to 2*GGML_MAX_NODES
2023-08-07 10:52:57 +03:00
GiviMAD
34a14b28ff
[Makefile] Move ARM CFLAGS before compilation ( #2536 )
2023-08-07 09:21:46 +03:00
Henri Vasserman
7297128db8
[Zig] Rewrite build for Zig 0.11 ( #2514 )
...
* zig build fixes
* Disable LTO on Windows.
2023-08-07 08:35:53 +03:00
xaedes
2bf422eafd
add train function using automatic gradient checkpointing backward pass and allocator
2023-08-06 23:07:57 +02:00
xaedes
d43af4b543
Merge branch 'master' into pr-train-mem-usage-improvements
2023-08-06 17:30:17 +02:00
DannyDaemonic
86c3219895
console : fix issue related to Windows 11 PowerShell console mode persistence ( #2521 )
2023-08-06 09:49:34 +03:00
Keiichi Tabata
2e8265ae17
convert.py : add missing abstract methods for quantized data ( #2491 )
2023-08-06 09:34:05 +03:00
Johannes Gäßler
f514d1b306
CUDA: faster k-quant mul_mat_q kernels ( #2525 )
2023-08-05 18:20:44 +02:00
Jonas Wunderlich
332311234a
fix firefox autoscroll ( #2519 )
2023-08-04 22:16:11 +02:00