Commit graph

992 commits

Author SHA1 Message Date
xaedes
d43af4b543
Merge branch 'master' into pr-train-mem-usage-improvements 2023-08-06 17:30:17 +02:00
DannyDaemonic
86c3219895
console : fix issue related to Windows 11 PowerShell console mode persistence (#2521) 2023-08-06 09:49:34 +03:00
Keiichi Tabata
2e8265ae17
convert.py : add missing abstract methods for quantized data (#2491) 2023-08-06 09:34:05 +03:00
Johannes Gäßler
f514d1b306
CUDA: faster k-quant mul_mat_q kernels (#2525) 2023-08-05 18:20:44 +02:00
Jonas Wunderlich
332311234a
fix firefox autoscroll (#2519) 2023-08-04 22:16:11 +02:00
Cebtenzzre
182af739c4
server: regenerate completion.js.hpp (#2515) 2023-08-04 21:00:57 +02:00
Cebtenzzre
4329d1acb0
CUDA: use min compute capability of GPUs actually used (#2506) 2023-08-04 17:35:22 +02:00
Cebtenzzre
02f9d96a86
CUDA: check if event is NULL before cudaStreamWaitEvent (#2505)
Fixes #2503
2023-08-04 17:34:32 +02:00
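The fix above amounts to a null check before the wait; a minimal sketch of the pattern, assuming a hypothetical helper name (cudaStreamWaitEvent is the real CUDA runtime API; the actual fix lives in ggml-cuda.cu):

    #include <cuda_runtime.h>

    // make `stream` wait on `event`, but only if the event was ever created;
    // passing a NULL event to cudaStreamWaitEvent is invalid
    static void stream_wait_event_if_set(cudaStream_t stream, cudaEvent_t event) {
        if (event != nullptr) {
            cudaStreamWaitEvent(stream, event, 0);
        }
    }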
DannyDaemonic
3498588e0f
Add --simple-io option for subprocesses and break out console.h and cpp (#1558) 2023-08-04 08:20:12 -07:00
Stephen Nichols
5f631c2679
Fixing race condition in server and partial stream handling in frontend. (#2391)
* Fixing race condition in server.cpp and partial stream handling in completion.js

* Reverting assert edits.

* Adding newline to eof
2023-08-04 13:37:24 +02:00
l3utterfly
415e99fec2
Stream save llama context data to file instead of allocating entire buffer upfront (#2488)
* added streaming of context data to a file, to avoid allocating unnecessary amounts of memory

* generalised copying state data to file or buffer

* added comments explaining how copy_state_data works

* fixed trailing whitespaces

* fixed save load state example

* updated save load state to use public function in llama.cpp

* - restored the llama_copy_state_data API (reverting the earlier breakage)
- moved the new logic for copying llama state data into an internal function

* fixed function declaration order

* restored save load state example

* fixed whitespace

* removed unused llama-util.h include

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* Apply code review suggestions

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2023-08-04 13:29:52 +02:00
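A minimal sketch of the buffer-or-file generalisation described in the entry above (names are illustrative, not the actual llama.cpp internals): one write path either copies into a caller-provided buffer or streams straight to a FILE, so the full state never has to sit in memory at once.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    struct state_writer {
        uint8_t * buf     = nullptr;  // destination buffer, or nullptr when streaming to file
        FILE    * file    = nullptr;  // destination file, or nullptr when writing to buffer
        size_t    written = 0;

        void write(const void * src, size_t size) {
            if (buf != nullptr) {
                memcpy(buf + written, src, size);
            } else {
                fwrite(src, 1, size, file);
            }
            written += size;
        }
    };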
Borislav Stanimirov
ff966e7ca6
build : fix several cast and printf warnings (#2499) 2023-08-04 13:07:21 +03:00
Evan Jones
8183159cf3
examples : generate JSON according to schema (#1887)
* examples : add JSON schema grammars

* complete JSON grammar

* ensure primitive types can be used as root of schema

* support integer type and adjust usage text
2023-08-02 22:05:44 -04:00
Johannes Gäßler
468ea24fb4
CUDA: faster non k-quant mul_mat_q kernels (#2483) 2023-08-02 18:04:04 +02:00
Johannes Gäßler
4f6b60c776
CUDA: Fix models with output size != 32000 (#2480) 2023-08-02 16:48:10 +02:00
ldwang
220d931864
readme : add Aquila-7B model series to supported models (#2487)
* support bpe tokenizer in convert

Signed-off-by: ldwang <ftgreat@gmail.com>

* support bpe tokenizer in convert

Signed-off-by: ldwang <ftgreat@gmail.com>

* support bpe tokenizer in convert, fix

Signed-off-by: ldwang <ftgreat@gmail.com>

* Add Aquila-7B models in README.md

Signed-off-by: ldwang <ftgreat@gmail.com>

* Update Aquila-7B models in README.md

Signed-off-by: ldwang <ftgreat@gmail.com>

---------

Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-08-02 11:21:11 +03:00
Eve
81844fbcfd
tests : Fix compilation warnings (Linux/GCC) (#2451)
* fix hellaswag print format, cast away warning in test-double-float

* c++11 cannot use designated initializers

* add static to test-grad0.c internal functions

* use memcpy in test-double-float.c

* port c tests to c++

* use initializer list for ggml_init_params
2023-08-02 11:06:19 +03:00
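One of the fixes above, "use memcpy in test-double-float.c", replaces pointer-cast type punning (undefined behaviour) with memcpy; a sketch of the well-defined version:

    #include <cstdint>
    #include <cstring>

    static uint32_t float_bits(float f) {
        uint32_t u;
        static_assert(sizeof u == sizeof f, "size mismatch");
        memcpy(&u, &f, sizeof u);  // well-defined; compilers emit a single move
        return u;
    }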
Yiming Cui
a312193e18
readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475)
* add support for chinese llama-2 / alpaca-2

* remove whitespace
2023-08-02 09:18:31 +03:00
Bono Lv
c574bddb36
fix a typo in examples/server/README.md (#2478) 2023-08-01 14:54:28 +02:00
ebraminio
86aeb27734
server : Support dark mode (#2414)
* server : Support dark mode

So it respects the user's system light/dark setting.

* Update index.html.hpp by running ./deps.sh
2023-08-01 10:56:23 +02:00
Matteo Boschini
1873ff586b
metal : add gqa8 kernel to allow llama-2-70B on metal (#2459)
* Added gqa8 kernel to allow llama-2-70B on metal

* Update ggml-metal.m

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>

* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast

* Added ne03==ne13 assertion

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-08-01 10:43:12 +03:00
Johannes Gäßler
49e7cb5bb1
CUDA: fixed LLAMA_FAST compilation option (#2473) 2023-07-31 21:02:19 +02:00
Johannes Gäßler
b772bba42e
CUDA: fixed cmake F16 option (#2471) 2023-07-31 19:52:22 +02:00
Johannes Gäßler
0728c5a8b9
CUDA: mmq CLI option, fixed mmq build issues (#2453) 2023-07-31 15:44:35 +02:00
Johannes Gäßler
1215ed7d5c
CUDA: Implemented row flattening for non-glm RoPE (#2468) 2023-07-31 14:32:30 +02:00
Johannes Gäßler
2dbf518911
CUDA: fewer memory bank conflicts for mul_mat_q (#2458) 2023-07-31 13:18:51 +02:00
slaren
9d2382b3e4
Fix Metal backend broken by the allocator changes (#2455)
* fix Metal backend broken by the allocator changes
2023-07-31 11:02:53 +02:00
slaren
a113689571
ggml : add graph tensor allocator (#2411)
* ggml : add graph tensor allocator

* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset

* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
2023-07-30 15:58:01 +02:00
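The second bullet above guards view creation against unallocated tensors; an illustrative sketch (not the actual ggml code) of the deferred-pointer idea:

    #include <cstddef>
    #include <cstdint>

    struct tensor {
        void * data;  // nullptr until the graph allocator assigns memory
    };

    static void * view_data(const tensor & src, size_t offset) {
        if (src.data == nullptr) {
            return nullptr;  // defer; the allocator computes the pointer later
        }
        return (uint8_t *) src.data + offset;
    }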
Johannes Gäßler
11f3ca06b8
CUDA: Quantized matrix matrix multiplication (#2160)
* mmq implementation for non k-quants

* q6_K

* q2_K

* q3_K

* q4_K

* vdr

* q5_K

* faster q8_1 loading

* loop unrolling

* add __restrict__

* q2_K sc_high

* GGML_CUDA_MMQ_Y

* Updated Makefile

* Update Makefile

* DMMV_F16 -> F16

* Updated README, CMakeLists

* Fix CMakeLists.txt

* Fix CMakeLists.txt

* Fix multi GPU out-of-bounds
2023-07-29 23:04:44 +02:00
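Several of the bullets above are classic kernel micro-optimisations; as a standalone illustration (not the mul_mat_q code itself), __restrict__ promises the compiler that pointers don't alias, so loaded values can stay in registers across stores:

    static void scale_rows(const float * __restrict__ src,
                           float * __restrict__ dst,
                           float scale, int n) {
        for (int i = 0; i < n; ++i) {
            dst[i] = scale * src[i];  // src needs no reload after writing dst
        }
    }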
Johannes Gäßler
9baf9ef304
CUDA: faster multi GPU synchronization (#2448) 2023-07-29 23:04:10 +02:00
xaedes
22cb368dd9
remove trailing whitespace 2023-07-28 23:55:30 +02:00
xaedes
c1a5e116a4
llama training : fix ggml_rms_norm_back calls to pass configurable eps 2023-07-28 23:13:20 +02:00
xaedes
ecdc16163e
ggml : update ggml_rms_norm_back with configurable eps 2023-07-28 23:13:20 +02:00
xaedes
87035b96f7
remove commented-out vectorized code of opt_adam
the vectorized code might be a bit faster for a low number of parameters, but it had a large memory-usage overhead
2023-07-28 23:13:20 +02:00
xaedes
0f6a8ab519
tighten abs error bounds for sqrt in test-grad0 2023-07-28 23:13:20 +02:00
xaedes
47055c929f
tighten abs error bounds for flash_attn in test-grad0 2023-07-28 23:13:20 +02:00
xaedes
dbbc263313
add conditional compilation for using F16 exp in flash attention
uncomment `// #define GGML_FLASH_ATTN_EXP_FP16` to enable the use of F16 exp in flash attention
2023-07-28 23:13:20 +02:00
xaedes
1065c3b7b9
tighten abs error bounds for cross_entropy_loss in test-grad0 2023-07-28 23:13:20 +02:00
xaedes
24a4b099f3
change sampling parameters for prediction after training to the defaults of common.h
and clarify what serves as context for prediction and what are the generated tokens
2023-07-28 23:13:19 +02:00
xaedes
17a0898d50
fix increase of model.train_samples and model.train_tokens
now that each optimizer iteration gets its own batch, we need to multiply by the number of opt iterations
2023-07-28 23:13:19 +02:00
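The arithmetic behind the fix above, as an illustrative sketch (field names follow the commit; the struct and helper are hypothetical): with one fresh batch per optimizer iteration, the counters must advance by the batch size times the number of iterations.

    #include <cstdint>

    struct train_stats {
        int64_t train_samples = 0;
        int64_t train_tokens  = 0;
    };

    static void update_counts(train_stats & s, int n_batch, int n_tokens_per_sample, int n_opt_iters) {
        s.train_samples += (int64_t) n_batch * n_opt_iters;
        s.train_tokens  += (int64_t) n_batch * n_tokens_per_sample * n_opt_iters;
    }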
xaedes
58024d3e5f
rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup 2023-07-28 23:13:19 +02:00
xaedes
e6ff0728e0
add a minimum number of tensor dimensions for applying weight decay (default 2)
this makes it possible to exclude bias parameters from weight decay
2023-07-28 23:13:19 +02:00
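A sketch of the rule, assuming AdamW-style decoupled decay (the helper is illustrative): tensors with fewer than the minimum number of dimensions, such as 1-D bias vectors, skip the decay step.

    // w <- w * (1 - lr * wd) for weights; biases (n_dims < min_dims) pass through
    static float decay_param(float w, float lr, float wd, int n_dims, int min_dims) {
        return (n_dims >= min_dims) ? w * (1.0f - lr * wd) : w;
    }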
xaedes
d7aa4d9576
use optimization callback in training
allows a dynamic learning schedule and different batch data for each iteration, without relying on a low n_iter and a high n_examples parameter

reduces runtime by avoiding restarts of the optimization function, and improves training convergence by providing a different batch for each iteration
2023-07-28 23:13:19 +02:00
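The kind of dynamic schedule this enables, sketched as cosine decay with linear warmup (a hedged illustration; parameter names are hypothetical and the actual train-text-from-scratch schedule may differ):

    #include <math.h>

    // returns a multiplier in [lr_min_frac, 1] applied to the base learning rate
    static float lr_schedule(int iter, int n_warmup, int n_decay, float lr_min_frac) {
        if (iter < n_warmup) {
            return (float) iter / (float) n_warmup;  // linear warmup 0 -> 1
        }
        float t = (float) (iter - n_warmup) / (float) n_decay;
        if (t > 1.0f) t = 1.0f;
        float cos_decay = 0.5f * (1.0f + cosf(3.14159265358979f * t));  // 1 -> 0
        return lr_min_frac + (1.0f - lr_min_frac) * cos_decay;
    }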
xaedes
bfc3119139
add optimization callback to ggml_opt_resume_g
this callback is called before each iteration with custom data and a pointer to the learning-schedule parameter (only used in Adam(W)).

it can be used for a dynamic learning schedule and for setting the input data of batches before each iteration
2023-07-28 23:13:18 +02:00
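A minimal sketch of such a hook (the actual ggml callback type may differ): the optimizer invokes it before every iteration with opaque user data and a pointer to the learning-schedule multiplier.

    typedef void (*opt_callback_sketch)(void * user_data, float * sched);

    static void before_iteration(void * user_data, float * sched) {
        // typical use: load the next batch into input tensors via user_data,
        // then update the schedule multiplier (read only by the Adam(W) path)
        (void) user_data;
        *sched = 1.0f;  // e.g. the output of a cosine-decay schedule
    }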
xaedes
e843d6e71c
measure and print total training time 2023-07-28 23:13:18 +02:00
xaedes
ff759d957c
remove unused function argument from get_example_targets_batch 2023-07-28 23:13:18 +02:00
xaedes
ce937bc431
replace memcpy with a reshape operation so that the graph is not cut at the input
this makes it possible to store new values in the input tensor and then simply recompute the graph without rebuilding it
2023-07-28 23:13:18 +02:00
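A sketch of the pattern with the real ggml API call (surrounding names are illustrative): because the reshape is itself a graph node over the input tensor, data written into the input is picked up on the next compute, with no graph rebuild.

    #include "ggml.h"

    static struct ggml_tensor * attach_input(struct ggml_context * ctx,
                                             struct ggml_tensor  * tokens_input,
                                             int n_tokens, int n_batch) {
        // the graph stays connected to tokens_input; per iteration, write fresh
        // tokens into tokens_input->data and recompute the same graph
        return ggml_reshape_2d(ctx, tokens_input, n_tokens, n_batch);
    }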
xaedes
c6a18e15c1
add more training parameters:
--enable-restart N         Only for Adam optimizer. Enable restarts of cos-decay
--disable-restart N        Only for Adam optimizer. Disable restarts of cos-decay
--opt-past N               Number of optimization iterations to track for delta convergence test. Disabled when zero.
--opt-delta N              Maximum delta for delta convergence test. Disabled when <= zero.
--opt-max-no-improvement N Maximum number of optimization iterations with no improvement. Disabled when <= zero.
--adam-epsf N              AdamW epsilon for convergence test. Disabled when <= zero.
--adam-min-alpha N         Adam minimum learning rate alpha, usually 0.1 * alpha
2023-07-28 23:13:18 +02:00
xaedes
d0fbb7d328
llama : fix rope usage in train-text-from-scratch after ChatGLM change 2023-07-28 23:13:17 +02:00
xaedes
fc379a2de3
disable gradient checkpointing debug output 2023-07-28 23:13:17 +02:00