Commit graph

749 commits

Each entry lists the author, short commit SHA, commit message, and date.
Maxime
503db28849
llama : fix name shadowing and C4146 (#1526)
* Fix name shadowing and C4146

* Fix if macros not using defined when required

* Update llama-util.h

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update llama-util.h

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Code style

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-20 10:22:37 +03:00
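
For context: C4146 is MSVC's warning "unary minus operator applied to unsigned type, result still unsigned". A minimal sketch of the pattern and a typical warning-free rewrite (illustrative, not the code from this commit):

    #include <cstdint>

    // C4146 fires here: negating an unsigned value wraps around and stays
    // unsigned, which is rarely what was intended.
    uint32_t negate_warns(uint32_t x) { return -x; }

    // Equivalent modular negation, written so MSVC does not warn.
    uint32_t negate_clean(uint32_t x) { return ~x + 1u; }
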
Georgi Gerganov
8a203f9fa1 llama : fix compile warnings in llama_set_state_data() 2023-05-20 10:14:43 +03:00
Georgi Gerganov
4fd3e29297 ggml : fix scalar implementation of Q4_1 dot 2023-05-20 10:13:19 +03:00
Georgi Gerganov
2d5db48371
ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)
* ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0

* llama : bump LLAMA_FILE_VERSION to 3

* cuda : update Q4 and Q8 dequantize kernels

* ggml : fix AVX dot products

* readme : update performance table + hot topics
2023-05-19 22:17:18 +03:00
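
The format change stores each block's scale in half precision instead of a 32-bit float, which is why LLAMA_FILE_VERSION was bumped and the dequantize kernels updated. A sketch of the resulting block layouts, assuming the usual 32 values per block (field names follow the ggml convention, but treat the details as illustrative):

    #include <cstdint>

    typedef uint16_t ggml_fp16_t;  // IEEE-754 half stored as raw bits

    #define QK 32                  // values per quantization block

    struct block_q4_0 {
        ggml_fp16_t d;             // scale, previously a 32-bit float
        uint8_t     qs[QK / 2];    // 32 4-bit quants packed two per byte
    };                             // 18 bytes per block instead of 20

    struct block_q4_1 {
        ggml_fp16_t d;             // scale
        ggml_fp16_t m;             // minimum, also demoted from float
        uint8_t     qs[QK / 2];
    };                             // 20 bytes per block instead of 24

    // Q8_0 likewise stores its scale d as ggml_fp16_t next to 32 int8 quants.
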
Georgi Gerganov
6986c7835a
tests : add missing header 2023-05-19 21:17:28 +03:00
Evan Jones
943e6081cc
examples : add persistent chat (#1495)
* examples : add persistent chat

* examples : fix whitespace

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-19 20:39:51 +03:00
Jason McCartney
7694b52b9a
main : make reverse prompt option act as a stop token in non-interactive mode (#1032)
* Make reverse prompt option act as a stop token in non-interactive scenarios

* Making requested review changes

* Update gpt_params_parse and fix a merge error

* Revert "Update gpt_params_parse and fix a merge error"

This reverts commit 2bb2ff1748.

* Update gpt_params_parse and fix a merge error take 2
2023-05-19 20:24:59 +03:00
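
The gist of the change: in non-interactive runs, a matched reverse prompt now ends generation instead of handing control back to a user that is not there. A hypothetical helper sketching the check, not the PR's code:

    #include <string>

    // True when the accumulated output ends with the reverse prompt,
    // i.e. the model has produced the stop sequence.
    static bool hit_stop_sequence(const std::string & output,
                                  const std::string & antiprompt) {
        return output.size() >= antiprompt.size() &&
               output.compare(output.size() - antiprompt.size(),
                              antiprompt.size(), antiprompt) == 0;
    }

    // In the generation loop (sketch):
    //   if (hit_stop_sequence(output, antiprompt) && !interactive) break;
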
David Kennedy
79e3efb0e9
readme : adds WizardLM to the list of supported models (#1485) 2023-05-19 20:16:30 +03:00
Georgi Gerganov
4b7e245adf
minor : fix compile warnings 2023-05-19 20:14:51 +03:00
xaedes
08a330a136
add cmake target for baby-llama-text 2023-05-19 18:41:26 +02:00
xaedes
332003584e
sample with non-greedy sampling parameters at the end of training 2023-05-19 18:41:06 +02:00
xaedes
e19ead6e3f
print used memory before and after optimization 2023-05-19 18:40:20 +02:00
xaedes
da86a1d736
fix cross entropy loss
- add target probabilities for each sample, which are then used in the cross entropy loss
2023-05-19 18:39:38 +02:00
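
With explicit target probabilities p and predicted probabilities q, the loss is L = -Σᵢ pᵢ log qᵢ over the vocabulary. A minimal sketch of that computation (not the training example's code):

    #include <cmath>

    // Cross entropy between target probabilities p and predicted
    // probabilities q over n classes: L = -sum_i p[i] * log(q[i]).
    float cross_entropy(const float * p, const float * q, int n) {
        float loss = 0.0f;
        for (int i = 0; i < n; ++i) {
            loss -= p[i] * std::log(q[i] + 1e-10f); // epsilon guards log(0)
        }
        return loss;
    }
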
xaedes
09b304d015
remove duplicate include 2023-05-19 18:36:05 +02:00
xaedes
37f5b76df1
ggml fixes to support backward pass on inplace operations 2023-05-19 18:35:40 +02:00
xaedes
44d83558bc
use different arguments for input and output checkpoint 2023-05-19 18:34:18 +02:00
xaedes
d8b0666429
initialize rng with srand 2023-05-19 18:29:47 +02:00
xaedes
25fe1c3815
use inplace functions where possible 2023-05-19 14:53:21 +02:00
Erik Scholz
5ea4339273
make kv_f16 the default for api users (#1517) 2023-05-18 19:31:01 +02:00
DannyDaemonic
ee9654138a
Fixes #1511 lambda issue for w64devkit (mingw) (#1513)
* Fix for w64devkit and mingw
2023-05-18 19:30:40 +02:00
Stephan Walter
dc271c52ed
Remove unused n_parts parameter (#1509) 2023-05-17 22:12:01 +00:00
rankaiyx
c238b5873a
benchmark-matmul: Print the average of the test results (#1490) 2023-05-17 16:47:58 +02:00
xaedes
b241b9cb6c
save trained model to checkpoint and load model to be trained from checkpoint 2023-05-17 13:49:32 +02:00
xaedes
d328472f16
fix get_samples call, add model tensor names, increase model size, start training samples after newline 2023-05-17 12:52:20 +02:00
Tom Jobbins
2b2646931b
convert.py: Support models which are stored in a single pytorch_model.bin (#1469)
* Support models in a single pytorch_model.bin

* Remove spurious line with typo
2023-05-17 00:04:35 +02:00
Ilya Kurdyukov
42627421ec
~7% faster Q5_1 AVX2 code (#1477) 2023-05-16 18:36:47 +00:00
András Salamon
9560655409
define default model path once, sync path with readme (#1366) 2023-05-16 17:46:34 +02:00
sandyiscool
2a5ee023ad
Add alternate include path for openblas (#1476)
In some Linux distributions (Fedora, for example), the OpenBLAS headers are installed under '/usr/local/include'
2023-05-16 10:30:15 +02:00
xaedes
e063135d0b
add llama sampler, shuffle samples and constrain sampling to tokens occurring in train data 2023-05-15 21:12:28 +02:00
xaedes
ec881156f6
improve ggml_out_prod performance
- change iteration order (>15s -> 10s runtime)
- parallelize over one more dimension: over dst matrix rows (10s -> <5s runtime)
2023-05-15 14:42:24 +02:00
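
Both speedups are standard matmul-style optimizations: order the loops so the innermost writes are contiguous, and stripe dst rows across threads so no synchronization is needed. A simplified sketch of the idea, not the ggml kernel:

    #include <cstddef>

    // Accumulate the outer product dst += a ⊗ b, with dst rows striped
    // across nth threads (this thread is ith).
    void out_prod_sketch(float * dst, const float * a, const float * b,
                         int rows, int cols, int ith, int nth) {
        for (int i = ith; i < rows; i += nth) { // parallel over dst rows
            const float ai = a[i];
            float * row = dst + (size_t) i * cols;
            for (int j = 0; j < cols; ++j) {
                row[j] += ai * b[j];            // contiguous inner loop
            }
        }
    }
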
xaedes
19fb91899b
better weight initialization improves training convergence at start 2023-05-15 14:19:38 +02:00
xaedes
f3cf7df21f
better weight initialization improves training convergence at start 2023-05-15 14:18:57 +02:00
xaedes
efa4bb78ea
add ggml_out_prod and use it for mul_mat backward pass for improved performance
performance stats report an improvement from 37 seconds to 16 seconds of runtime during my training tests
2023-05-15 14:17:42 +02:00
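
For reference, the identity behind the change: if C = A·B, then the gradients are dA = dC·Bᵀ and dB = Aᵀ·dC, and both right-hand sides decompose into sums of rank-1 (outer-product) terms, so an out_prod primitive can accumulate the gradients directly without materializing any transposed copies.
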
zrm
63d20469b8
fix get_num_physical_cores() (#1436)
* fix get_num_physical_cores()
it had been broken on complex topologies because the "cpu cores" field in /proc/cpuinfo is reported per "physical id"

* Add spaces to maintain consistent formatting

---------

Co-authored-by: slaren <ddevesa@gmail.com>
2023-05-15 04:25:42 +02:00
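
The underlying idea of the fix: on Linux, each physical core exposes one distinct thread-siblings mask in sysfs, shared by its SMT threads, so counting distinct masks counts physical cores regardless of socket count. A simplified sketch of that approach (limits and error handling reduced for brevity):

    #include <fstream>
    #include <string>
    #include <unordered_set>

    int num_physical_cores() {
        std::unordered_set<std::string> siblings; // one mask per physical core
        for (int cpu = 0; cpu < 4096; ++cpu) {
            std::ifstream f("/sys/devices/system/cpu/cpu" + std::to_string(cpu)
                            + "/topology/thread_siblings");
            if (!f.is_open()) break;              // ran past the last CPU
            std::string mask;
            if (std::getline(f, mask)) siblings.insert(mask);
        }
        return (int) siblings.size();
    }
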
slaren
b5c9295eef
benchmark-matmul: fix clang-tidy issues, report results in GFLOPS (#1458)
* benchmark-matmul: fix command line parsing, replace macros with functions, report results in GFLOPS
2023-05-14 22:46:00 +02:00
xaedes
a703d7a85f
activate threading in baby-llama-text 2023-05-14 21:00:55 +02:00
xaedes
d9b5268728
avoid printing too many newlines in baby-llama-text 2023-05-14 20:57:47 +02:00
xaedes
c054079fb8
improve performance of mul_mat backward pass
avoid transpose by using mul_mat with swapped arguments
2023-05-14 20:56:50 +02:00
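
The trick rests on the transpose identity (A·B)ᵀ = Bᵀ·Aᵀ: a product that would otherwise need a transposed operand can be computed by calling mul_mat with the operands swapped and reading the result as transposed, avoiding an explicit transposed copy.
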
xaedes
1f2b76de01
fix bug in ggml_compute_forward_soft_max_back_f32 on DEBUG build 2023-05-14 20:55:24 +02:00
xaedes
69108167cd
fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32
memcpy needs to be synchronized across threads to avoid race conditions.
=> do it in INIT phase
2023-05-14 20:54:57 +02:00
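
Background: ggml executes each op in phases, and the INIT phase finishes before the parallel COMPUTE phase starts, so a copy placed in INIT cannot race with threads that later read the destination. A condensed, self-contained analogue of the pattern (not the ggml function itself):

    #include <cmath>
    #include <cstring>

    enum task_type { TASK_INIT, TASK_COMPUTE };

    void diag_mask_sketch(task_type type, int ith, int nth,
                          float * dst, const float * src, int n, int n_past) {
        if (type == TASK_INIT) {
            if (ith == 0) {
                // lone writer; no COMPUTE thread is running yet
                std::memcpy(dst, src, sizeof(float) * n * n);
            }
            return;
        }
        // TASK_COMPUTE: threads partition rows and apply the causal mask
        for (int r = ith; r < n; r += nth) {
            for (int c = n_past + 1 + r; c < n; ++c) {
                dst[r * n + c] = -INFINITY;
            }
        }
    }
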
Johannes Gäßler
eb363627fd
cuda : deduplicated dequantization code (#1453) 2023-05-14 21:53:23 +03:00
xaedes
4339f8cf28
improve softmax backward pass
go from quadratic runtime to linear runtime by simplifying the formulas
2023-05-14 17:55:02 +02:00
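
The simplification is the closed form of the softmax Jacobian-vector product: with y = softmax(x) and upstream gradient dy, dx[i] = y[i] * (dy[i] - Σⱼ dy[j]·y[j]), one dot product instead of an n x n Jacobian. A sketch:

    // Softmax backward in linear time.
    void soft_max_back_sketch(float * dx, const float * dy,
                              const float * y, int n) {
        float dot = 0.0f;
        for (int i = 0; i < n; ++i) dot += dy[i] * y[i];
        for (int i = 0; i < n; ++i) dx[i] = y[i] * (dy[i] - dot);
    }
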
xaedes
79b2d5b69d
ggml : alternative fix for race condition bug in non-inplace ggml_compute_forward_diag_mask_f32 (#1454)
* fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32

memcpy needs to be synchronized across threads to avoid race conditions.
=> do it in INIT phase

* remove trailing whitespace

* Update ggml.c

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-14 18:55:02 +03:00
Georgi Gerganov
13c351ad72
ggml : various fixes (#1450)
- `ggml_rope()`
- `ggml_diag_mask_inf()` multi-threaded
- compatibility with scratch buffers
2023-05-14 18:22:50 +03:00
xaedes
ec1aea09ec
implement ggml_soft_max_back for more performant backward pass of soft_max
avoids creating big intermediate matrices of size n_embd x n_embd for llama layers and n_vocab x n_vocab for cross entropy loss
2023-05-14 17:16:26 +02:00
xaedes
f89c278d83
fix race condition bug in ggml_compute_forward_diag_mask_f32 2023-05-14 17:00:19 +02:00
xaedes
6e968d22b0
add text-generating baby-llama-from-scratch example 2023-05-14 16:07:08 +02:00
katsu560
60f8c361ca
ggml : add AVX support based on AVX2 code (#1430) 2023-05-14 10:03:51 +00:00
Georgi Gerganov
601a033475
ggml : add GGML_QNT_VERSION to track quantization format changes
https://github.com/ggerganov/ggml/issues/150#issuecomment-1546625668
2023-05-14 10:20:19 +03:00
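
The constant enables loader-side guards of the following shape. GGML_QNT_VERSION_FACTOR matches the ggml convention of packing the quantization version into the ftype field; the check itself is an illustrative sketch:

    #include <cstdio>

    #define GGML_QNT_VERSION        1    // illustrative value; bumped on format changes
    #define GGML_QNT_VERSION_FACTOR 1000 // ftype = base ftype + qnt version * factor

    // Refuse files quantized under a different format revision instead of
    // silently misreading their blocks.
    bool check_qnt_version(int ftype) {
        const int qnt_version = ftype / GGML_QNT_VERSION_FACTOR;
        if (qnt_version != GGML_QNT_VERSION) {
            std::fprintf(stderr, "quantization version %d, expected %d\n",
                         qnt_version, GGML_QNT_VERSION);
            return false;
        }
        return true;
    }
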
xaedes
6e88dc93bd
update python bindings 2023-05-13 19:05:24 +02:00