Maxime
503db28849
llama : fix name shadowing and C4146 ( #1526 )
* Fix name shadowing and C4146
* Fix #if macros not using defined() when required
* Update llama-util.h
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update llama-util.h
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Code style
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-20 10:22:37 +03:00
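Note: C4146 is MSVC's "unary minus operator applied to unsigned type, result still unsigned" warning. A minimal illustration of the kind of expression that triggers it (the exact code in llama.cpp may differ):

```c
#include <stdio.h>

int main(void) {
    unsigned int a = -1u;  // C4146 on MSVC: unary minus applied to unsigned
    unsigned int b = ~0u;  // same bit pattern, no warning
    printf("%u %u\n", a, b);  // both print UINT_MAX (4294967295 with 32-bit int)
    return 0;
}
```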
Georgi Gerganov
8a203f9fa1
llama : fix compile warnings in llama_set_state_data()
2023-05-20 10:14:43 +03:00
Georgi Gerganov
4fd3e29297
ggml : fix scalar implementation of Q4_1 dot
2023-05-20 10:13:19 +03:00
Georgi Gerganov
2d5db48371
ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 ( #1508 )
* ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0
* llama : bump LLAMA_FILE_VERSION to 3
* cuda : update Q4 and Q8 dequantize kernels
* ggml : fix AVX dot products
* readme : update performance table + hot topics
2023-05-19 22:17:18 +03:00
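Note: moving the per-block scale from F32 to F16 shrinks a 32-weight Q4_0 block from 20 to 18 bytes (5.0 -> 4.5 bits per weight), which is why the file version had to be bumped. A sketch of the implied layout, with the F16 scale shown as its raw 16-bit storage type; the commit applies the same change to the d (and m) fields of Q4_1 and Q8_0:

```c
#include <stdint.h>

#define QK4_0 32

typedef uint16_t ggml_fp16_t;   // IEEE-754 half precision, raw 16-bit storage

typedef struct {
    ggml_fp16_t d;              // scale ("delta"): F16 here, F32 before
    uint8_t qs[QK4_0 / 2];      // 32 x 4-bit quants, packed two per byte
} block_q4_0;                   // 18 bytes per 32 weights = 4.5 bits/weight
```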
Georgi Gerganov
6986c7835a
tests : add missing header
2023-05-19 21:17:28 +03:00
Evan Jones
943e6081cc
examples : add persistent chat ( #1495 )
* examples : add persistent chat
* examples : fix whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-19 20:39:51 +03:00
Jason McCartney
7694b52b9a
main : make reverse prompt option act as a stop token in non-interactive mode ( #1032 )
* Make reverse prompt option act as a stop token in non-interactive scenarios
* Making requested review changes
* Update gpt_params_parse and fix a merge error
* Revert "Update gpt_params_parse and fix a merge error"
This reverts commit 2bb2ff1748.
* Update gpt_params_parse and fix a merge error take 2
2023-05-19 20:24:59 +03:00
David Kennedy
79e3efb0e9
readme : adds WizardLM to the list of supported models ( #1485 )
2023-05-19 20:16:30 +03:00
Georgi Gerganov
4b7e245adf
minor : fix compile warnings
2023-05-19 20:14:51 +03:00
xaedes
08a330a136
add cmake target for baby-llama-text
2023-05-19 18:41:26 +02:00
xaedes
332003584e
sample with non-greedy sampling parameters at the end of training
2023-05-19 18:41:06 +02:00
xaedes
e19ead6e3f
print used memory before and after optimization
2023-05-19 18:40:20 +02:00
xaedes
da86a1d736
fix cross entropy loss
- add target probabilities for each sample which is then used in cross entropy loss
2023-05-19 18:39:38 +02:00
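Note: with per-sample target probabilities $p$ (what this commit adds) and model probabilities $q = \operatorname{softmax}(z)$, the loss being fixed is the standard cross entropy:

```latex
\mathcal{L}(p, q) = -\sum_{i=1}^{n_{\text{vocab}}} p_i \log q_i
```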
xaedes
09b304d015
remove duplicate include
2023-05-19 18:36:05 +02:00
xaedes
37f5b76df1
ggml fixes to support backward pass on inplace operations
2023-05-19 18:35:40 +02:00
xaedes
44d83558bc
use different arguments for input and output checkpoint
2023-05-19 18:34:18 +02:00
xaedes
d8b0666429
initialize rng with srand
2023-05-19 18:29:47 +02:00
xaedes
25fe1c3815
use inplace functions where possible
2023-05-19 14:53:21 +02:00
Erik Scholz
5ea4339273
make kv_f16 the default for api users ( #1517 )
2023-05-18 19:31:01 +02:00
DannyDaemonic
ee9654138a
Fixes #1511 lambda issue for w64devkit (mingw) ( #1513 )
* Fix for w64devkit and mingw
2023-05-18 19:30:40 +02:00
Stephan Walter
dc271c52ed
Remove unused n_parts parameter ( #1509 )
2023-05-17 22:12:01 +00:00
rankaiyx
c238b5873a
benchmark-matmul: Print the average of the test results ( #1490 )
2023-05-17 16:47:58 +02:00
xaedes
b241b9cb6c
save trained model to checkpoint and load model to be trained from checkpoint
2023-05-17 13:49:32 +02:00
xaedes
d328472f16
fix get_samples call, add model tensor names, increase model size, start training samples after newline
2023-05-17 12:52:20 +02:00
Tom Jobbins
2b2646931b
convert.py: Support models which are stored in a single pytorch_model.bin ( #1469 )
* Support models in a single pytorch_model.bin
* Remove spurious line with typo
2023-05-17 00:04:35 +02:00
Ilya Kurdyukov
42627421ec
~7% faster Q5_1 AVX2 code ( #1477 )
2023-05-16 18:36:47 +00:00
András Salamon
9560655409
define default model path once, sync path with readme ( #1366 )
2023-05-16 17:46:34 +02:00
sandyiscool
2a5ee023ad
Add alternate include path for openblas ( #1476 )
In some Linux distributions (Fedora, for example), the OpenBLAS include path is located at '/usr/local/include'
2023-05-16 10:30:15 +02:00
xaedes
e063135d0b
add llama sampler, shuffle samples and constrain sampling to tokens occurring in train data
2023-05-15 21:12:28 +02:00
xaedes
ec881156f6
improve ggml_out_prod performance
- change iteration order (>15s -> 10s runtime)
- parallelize over one more dimension: over dst matrix rows (10s -> <5s runtime)
2023-05-15 14:42:24 +02:00
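Note: a minimal sketch of the two changes described above, with illustrative names rather than the actual ggml code: each thread owns a disjoint block of dst rows, so writes need no synchronization, and the inner loop streams over contiguous rows of b and dst:

```c
#include <stddef.h>

// dst (n_rows x n_cols) += sum over k of outer(a[k,:], b[k,:]).
// The caller zeroes dst and runs this on every thread ith in [0, nth).
static void out_prod_f32_thread(const float *a, const float *b, float *dst,
                                int n_rows, int n_cols, int n_k,
                                int ith, int nth) {
    const int dr  = (n_rows + nth - 1) / nth;              // rows per thread
    const int ir0 = dr * ith;                              // first row owned
    const int ir1 = ir0 + dr < n_rows ? ir0 + dr : n_rows; // one past last
    for (int ir = ir0; ir < ir1; ++ir) {
        float *drow = dst + (size_t) ir*n_cols;            // contiguous dst row
        for (int k = 0; k < n_k; ++k) {
            const float  av   = a[(size_t) k*n_rows + ir];
            const float *brow = b + (size_t) k*n_cols;     // contiguous b row
            for (int ic = 0; ic < n_cols; ++ic) {
                drow[ic] += av * brow[ic];                 // streaming adds
            }
        }
    }
}
```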
xaedes
19fb91899b
better weight initialization improves training convergence at start
2023-05-15 14:19:38 +02:00
xaedes
f3cf7df21f
better weight initialization improves training convergence at start
2023-05-15 14:18:57 +02:00
xaedes
efa4bb78ea
add ggml_out_prod and use it for mul_mat backward pass for improved performance
performance stats report improvement from 37 seconds to 16 seconds runtime during my training tests
2023-05-15 14:17:42 +02:00
zrm
63d20469b8
fix get_num_physical_cores() ( #1436 )
* fix get_num_physical_cores()
had been broken on complex topologies because "cpu cores" in /proc/cpuinfo is per-"physical id"
* Add spaces to maintain consistent formatting
---------
Co-authored-by: slaren <ddevesa@gmail.com>
2023-05-15 04:25:42 +02:00
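Note: a sketch of the fix's idea rather than the exact patch: since "cpu cores" is reported per socket, count the distinct (physical id, core id) pairs across all processor entries instead:

```c
#include <stdio.h>

int count_physical_cores(void) {
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f) {
        return 0;
    }
    char line[256];
    int  phys_id = 0;
    int  pairs[1024][2];                       // seen (physical id, core id)
    int  n_pairs = 0;
    while (fgets(line, sizeof(line), f)) {
        int v;
        if (sscanf(line, "physical id : %d", &v) == 1) {
            phys_id = v;
        } else if (sscanf(line, "core id : %d", &v) == 1) {
            int seen = 0;
            for (int i = 0; i < n_pairs; ++i) {
                if (pairs[i][0] == phys_id && pairs[i][1] == v) {
                    seen = 1;
                    break;
                }
            }
            if (!seen && n_pairs < 1024) {
                pairs[n_pairs][0] = phys_id;
                pairs[n_pairs][1] = v;
                n_pairs++;
            }
        }
    }
    fclose(f);
    return n_pairs;                            // distinct physical cores
}
```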
slaren
b5c9295eef
benchmark-matmul: fix clang-tidy issues, report results in GFLOPS ( #1458 )
* benchmark-matmul: fix command line parsing, replace macros with functions, report results in GFLOPS
2023-05-14 22:46:00 +02:00
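Note: the GFLOPS figure follows directly from the matmul operation count; a minimal sketch with illustrative names:

```c
// A MxK by KxN matmul performs ~2*M*N*K floating point operations
// (one multiply and one add per inner-product term).
double matmul_gflops(int M, int N, int K, int n_iter, double seconds) {
    return (2.0 * M * N * K * n_iter) / (seconds * 1e9);
}
```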
xaedes
a703d7a85f
activate threading in baby-llama-text
2023-05-14 21:00:55 +02:00
xaedes
d9b5268728
avoid printing too many newlines in baby-llama-text
2023-05-14 20:57:47 +02:00
xaedes
c054079fb8
improve performance of mul_mat backward pass
avoid transpose by using mul_mat with swapped arguments
2023-05-14 20:56:50 +02:00
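Note: the trick rests on the standard matmul gradient identities. For $C = AB$ with upstream gradient $dC$:

```latex
dA = dC\,B^{\top}, \qquad dB = A^{\top}\,dC, \qquad X\,Y^{\top} = \left(Y\,X^{\top}\right)^{\top}
```

Because of the last identity (together with mul_mat's layout conventions), both gradients can be produced by calling mul_mat with its arguments swapped instead of materializing a transposed copy of an operand; this is a loose reading of the commit, not its exact code.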
xaedes
1f2b76de01
fix bug in ggml_compute_forward_soft_max_back_f32 on DEBUG build
2023-05-14 20:55:24 +02:00
xaedes
69108167cd
fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32
memcpy needs to be synchronized across threads to avoid race conditions.
=> do it in INIT phase
2023-05-14 20:54:57 +02:00
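Note: a minimal sketch of the INIT-phase fix with illustrative names (ggml's real task phases and dispatch differ): the copy happens once, single-threaded, before the parallel phase, so no thread can read rows another thread has not copied yet:

```c
#include <stddef.h>
#include <string.h>

enum pass { PASS_INIT, PASS_COMPUTE };  // stand-ins for ggml's task phases

// INIT: one thread copies src -> dst; a barrier between the phases
// (provided by the runtime) makes the copy visible to all threads.
// COMPUTE: each thread masks its own disjoint set of rows.
static void diag_mask_f32(enum pass p, int ith, int nth,
                          const float *src, float *dst,
                          int n_rows, int n_cols, float value) {
    if (p == PASS_INIT) {
        if (ith == 0) {
            memcpy(dst, src, sizeof(float) * (size_t) n_rows * n_cols);
        }
        return;
    }
    for (int r = ith; r < n_rows; r += nth) {   // disjoint rows per thread
        for (int c = r + 1; c < n_cols; ++c) {  // strictly above the diagonal
            dst[(size_t) r*n_cols + c] = value; // e.g. -INFINITY for masking
        }
    }
}
```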
Johannes Gäßler
eb363627fd
cuda : deduplicated dequantization code ( #1453 )
2023-05-14 21:53:23 +03:00
xaedes
4339f8cf28
improve softmax backward pass
go from quadratic runtime to linear runtime by simplifying the formulas
2023-05-14 17:55:02 +02:00
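Note: the simplification is the classic softmax gradient identity. With $y = \operatorname{softmax}(x)$, the Jacobian $\partial y_j / \partial x_i = y_j(\delta_{ij} - y_i)$ is an $n \times n$ matrix, so the naive backward pass is quadratic; expanding the product collapses it to one dot product:

```latex
dx_i = \sum_j \frac{\partial y_j}{\partial x_i}\,dy_j
     = y_i\Big(dy_i - \sum_j y_j\,dy_j\Big)
```

Computing $s = \sum_j y_j\,dy_j$ once makes the pass linear in $n$ and avoids the $n \times n$ intermediates mentioned in the ggml_soft_max_back commit further down.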
xaedes
79b2d5b69d
ggml : alternative fix for race condition bug in non-inplace ggml_compute_forward_diag_mask_f32 ( #1454 )
* fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32
memcpy needs to be synchronized across threads to avoid race conditions.
=> do it in INIT phase
* remove trailing whitespace
* Update ggml.c
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-14 18:55:02 +03:00
Georgi Gerganov
13c351ad72
ggml : various fixes ( #1450 )
- `ggml_rope()`
- `ggml_diag_mask_inf()` multi-threaded
- compatibility with scratch buffers
2023-05-14 18:22:50 +03:00
xaedes
ec1aea09ec
implement ggml_soft_max_back for more performant backward pass of soft_max
avoids creating big intermediate matrices of size n_embd x n_embd for llama layers and n_vocab x n_vocab for cross entropy loss
2023-05-14 17:16:26 +02:00
xaedes
f89c278d83
fix race condition bug in ggml_compute_forward_diag_mask_f32
2023-05-14 17:00:19 +02:00
xaedes
6e968d22b0
add text generating baby-llama from scratch example
2023-05-14 16:07:08 +02:00
katsu560
60f8c361ca
ggml : add AVX support based on AVX2 code ( #1430 )
2023-05-14 10:03:51 +00:00
Georgi Gerganov
601a033475
ggml : add GGML_QNT_VERSION to track quantization format changes
https://github.com/ggerganov/ggml/issues/150#issuecomment-1546625668
2023-05-14 10:20:19 +03:00
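Note: a sketch of the versioning scheme; GGML_QNT_VERSION and GGML_QNT_VERSION_FACTOR are the names in ggml.h, while the packing shown is the scheme the factor implies (the concrete version value at this commit is omitted):

```c
#define GGML_QNT_VERSION_FACTOR 1000    // do not change this

// The per-file ftype field carries both values:
//   stored      = ftype + GGML_QNT_VERSION * GGML_QNT_VERSION_FACTOR
//   qnt_version = stored / GGML_QNT_VERSION_FACTOR
//   ftype       = stored % GGML_QNT_VERSION_FACTOR
```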
xaedes
6e88dc93bd
update python bindings
2023-05-13 19:05:24 +02:00