Georgi Gerganov
dae6ba2abe
baby-llama : couple of clang-tidy warnings
2023-05-13 15:38:50 +03:00
Georgi Gerganov
ef3d42a3aa
ggml : fix clang-tidy warnings
2023-05-13 15:34:56 +03:00
Georgi Gerganov
95a487a17e
ggml : remove Q4_2 remnants
2023-05-13 15:22:24 +03:00
Georgi Gerganov
092913ecea
Merge remote-tracking branch 'origin/master' into HEAD
2023-05-13 15:20:22 +03:00
Georgi Gerganov
f048af0230
ggml : sync alibi fix from ggml repo
2023-05-13 11:54:33 +03:00
3ooabkhxtn
ac0cd259d5
Adding SSE instructions to ggml_vec_dot_q4_0_q8_0 ( #1413 )
2023-05-13 08:43:33 +00:00
Georgi Gerganov
0cd22e190a
llama : fix various warnings
2023-05-13 11:23:15 +03:00
Rinne
6456a4eb9f
embedding : remove unused code ( #1426 )
2023-05-13 10:24:20 +03:00
Georgi Gerganov
33034cfede
ggml : fix null ptr deref in backward pass
2023-05-13 10:08:01 +03:00
Georgi Gerganov
f977243ded
minor : fix compiler warnings + indentation style
2023-05-13 09:55:17 +03:00
Georgi Gerganov
cdd5350892
readme : update Q4_0 perplexities
...
I think these were affected by the removal of the `round` during quantization
2023-05-13 09:12:44 +03:00
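For context on the rounding remark above: whether a value is truncated or rounded to the nearest 4-bit level determines which quant it lands on, which is enough to shift measured perplexity. A minimal, generic illustration (not the actual ggml quantization code; quant_trunc and quant_round are hypothetical helpers):

```cpp
#include <cmath>
#include <cstdint>

// Truncation vs round-to-nearest can map the same float onto neighboring
// quant levels; across a whole model that nudges perplexity slightly.
int8_t quant_trunc(float x, float inv_d) { return (int8_t)(x * inv_d);        }
int8_t quant_round(float x, float inv_d) { return (int8_t)roundf(x * inv_d); }
```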
Georgi Gerganov
738ace394a
llama : free ggml context in set / copy state data ( close #1425 )
2023-05-13 09:08:52 +03:00
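The fix pattern here is simple but easy to miss: a scratch ggml context created while serializing state must be freed on every return path. A hedged sketch (copy_state_data_sketch is illustrative, not the actual llama.cpp code):

```cpp
#include "ggml.h"

// Sketch of the leak this commit closes: build views over state in a
// temporary context, copy the data out, then free the context.
size_t copy_state_data_sketch(void * dst_buf, size_t buf_size) {
    struct ggml_init_params params = {
        /* mem_size   */ buf_size,
        /* mem_buffer */ dst_buf,
        /* no_alloc   */ true,
    };
    struct ggml_context * ctx = ggml_init(params);
    // ... create tensor views and copy state into dst_buf ...
    ggml_free(ctx); // previously missing -> memory leak (see #1425)
    return buf_size;
}
```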
Henri Vasserman
699b1ad7fe
opencl : fix kernels for the new formats ( #1422 )
...
* Fix OpenCL kernels for the new formats
* Fix Q5_0 alignment issues.
2023-05-13 09:01:15 +03:00
Georgi Gerganov
fb62f92433
llama : fix --mtest option ( close #1414 )
2023-05-12 21:44:20 +03:00
Johannes Gäßler
773ee249fb
CLI args use - instead of _, backwards compatible ( #1416 )
2023-05-12 14:34:55 +00:00
slaren
553fd4d4b5
Add clang-tidy reviews to CI ( #1407 )
2023-05-12 15:40:53 +02:00
Rinne
089b1c93ba
readme : add C#/.NET bindings repo ( #1409 )
2023-05-12 08:39:40 +03:00
Georgi Gerganov
b9fd7eee57
ggml : remove bit shuffling ( #1405 )
...
* ggml : remove Q4_0 bit shuffling (ARM NEON)
* ggml : remove Q4_1 bit shuffling (ARM NEON + reference)
* ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)
* ggml : remove Q4_2 bit shuffling (WIP, BROKEN)
* ggml : remove Q5_0 bit shuffling (ARM NEON)
* ggml : 2x faster scalar implementations
* ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)
* ggml : simplify scalar dot
* ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit
* ggml : fix Q4_1 quantization
* ggml : update cuBLAS + normalize variable names
* ggml : remove Q4_2 mode
* ggml : minor formatting
* ggml : fix Q5_0 quantization
* scripts : add script for measuring the time per token
* AVX implementations (#1370 )
* ggml : uniform 5th bit extraction
* llama : produce error upon loading old model files
* llama : fix model magic/version write
* ggml : speed-up Q5_0 + Q5_1 at 4 threads
* ggml : preserve old Q4 and Q5 formats
* ggml : simplify Q8_1 - no need for low / high sums anymore
* ggml : fix Q8_0 and Q8_1 rounding
* Revert "AVX implementations (#1370 )"
This reverts commit 948d124837.
* ggml : fix AVX2 implementation
* sha : update hashes for 7B and 13B
* readme : update timings + remove warning banner
* llama : update v2 PR number to 1405
* ggml : fix WASM comments
* ggml : back to original bit order
* readme : add note that Q4 and Q5 have been changed
* llama : fix return for unknown version
---------
Co-authored-by: Stephan Walter <stephan@walter.name>
2023-05-12 00:23:08 +03:00
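The net effect of #1405 on the block layout: nibbles now map linearly onto elements instead of being interleaved for SIMD unpacking. A hedged scalar sketch of dequantization under that layout (the block structure follows ggml's Q4_0 of the time; dequantize_row_q4_0_sketch is illustrative, not the commit's code):

```cpp
#include <cstdint>

constexpr int QK4_0 = 32;

struct block_q4_0 {
    float   d;              // per-block scale
    uint8_t qs[QK4_0 / 2];  // 32 4-bit quants packed two per byte
};

// Post-#1405 order: element j sits in the low nibble of byte j,
// element j + 16 in the high nibble of the same byte.
void dequantize_row_q4_0_sketch(const block_q4_0 * x, float * y, int nb) {
    for (int i = 0; i < nb; ++i) {
        for (int j = 0; j < QK4_0 / 2; ++j) {
            const int x0 = (x[i].qs[j] & 0x0F) - 8; // low nibble, offset by 8
            const int x1 = (x[i].qs[j] >>   4) - 8; // high nibble, offset by 8
            y[i*QK4_0 + j]           = x0 * x[i].d;
            y[i*QK4_0 + j + QK4_0/2] = x1 * x[i].d;
        }
    }
}
```

This linear mapping appears to be what the "back to original bit order" bullet refers to, and it is why old Q4/Q5 model files had to be rejected with an error.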
xaedes
b9ef08ccab
remove trailing whitespace
2023-05-11 20:03:18 +02:00
xaedes
581e5eb954
cleanup code for batched training
2023-05-11 19:49:41 +02:00
xaedes
3e3ed9560c
add parallel batched forward function for baby-llama training
2023-05-11 19:31:46 +02:00
CRD716
b608b55a3e
prompts : model agnostic DAN ( #1304 )
...
* add model-agnostic dan prompt
* quick readme update
* save a token
* Revert "quick readme update"
This reverts commit 8dc342c069.
2023-05-11 18:10:19 +03:00
Evan Jones
cf348a60e0
main : add option to save full output to session ( #1338 )
...
* main : add option to save full output to session
* split behavior into --session and --prompt-cache
* restore original implementation with new names
* PR comments
* move the check for incompatible parameters to gpt_params_parse
* Fix whitespace
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
---------
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-05-10 11:37:14 -04:00
DannyDaemonic
e6a46b0ed1
Locale fix for Windows ( #1379 )
2023-05-09 19:53:28 +02:00
Sami Farin
9f8dbc4787
use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler ( #1314 )
...
* use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler
Tested with a 13B model.
* use _mm_pause() in busyloop
* use _mm_pause() in busyloop on x86_64 to reduce power consumption
2023-05-09 14:29:20 +02:00
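The idea behind the pause commit: a raw spin loop saturates the core's speculative execution and burns power, while the x86 PAUSE instruction tells the CPU it is in a spin-wait so it can back off. A minimal sketch, assuming x86 and C++11 atomics (spin_wait is a hypothetical stand-in for ggml's thread-sync busyloop):

```cpp
#include <atomic>
#include <immintrin.h> // _mm_pause (x86/x86_64)

// Spin until the flag is raised, issuing PAUSE each iteration so the core
// de-pipelines the loop: lower power draw and heat, and friendlier to the
// sibling hyperthread, at negligible wake-up latency cost.
void spin_wait(const std::atomic<int> & flag) {
    while (flag.load(std::memory_order_acquire) == 0) {
        _mm_pause();
    }
}
```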
DannyDaemonic
41654efea8
Interface improvements and --multiline-input (previously --author-mode) ( #1040 )
...
* Interface improvements
* Multiline input
* Track character width
* Works with all characters and control codes + Windows console fixes
2023-05-08 19:45:48 -07:00
Georgi Gerganov
56551bc11f
readme : add notice about upcoming breaking change
2023-05-08 22:52:18 +03:00
Georgi Gerganov
6ca682b19d
ggml : swap vDSP_vsub args as per documentation
2023-05-08 21:16:35 +03:00
xaedes
9c3fe4eb76
swap arguments to vDSP_vdiv call
...
documentation for vDSP_vdiv states: "Note that B comes before A!"
2023-05-08 21:16:35 +03:00
xaedes
cafbb785fa
swap arguments to vDSP_vdiv call
...
documentation for vDSP_vdiv states: "Note that B comes before A!"
2023-05-08 20:13:40 +02:00
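Both vDSP commits above fix the same Accelerate quirk: the element-wise ops take the second operand of the math first. A short sketch, assuming macOS's Accelerate framework:

```cpp
#include <Accelerate/Accelerate.h>

// vDSP passes the second operand of the math FIRST:
// vDSP_vsub(B, 1, A, 1, C, 1, n) computes C[i] = A[i] - B[i]
// vDSP_vdiv(B, 1, A, 1, C, 1, n) computes C[i] = A[i] / B[i]
void sub_and_div(const float * a, const float * b,
                 float * c, float * d, vDSP_Length n) {
    vDSP_vsub(b, 1, a, 1, c, 1, n); // c = a - b (b first!)
    vDSP_vdiv(b, 1, a, 1, d, 1, n); // d = a / b (divisor b first!)
}
```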
AlpinDale
fe60904eef
readme : add TOC and Pygmalion instructions ( #1359 )
2023-05-08 19:33:30 +03:00
Georgi Gerganov
6cc42deda5
ggml : fix nullptr derefs in GGML_OP_CONT and GGML_OP_RESHAPE back
2023-05-08 18:50:04 +03:00
Georgi Gerganov
78af3e92c9
ggml : fix compiler warnings + cosmetic changes
2023-05-08 18:37:17 +03:00
xaedes
0d72207ac3
c++ in baby-llama example
...
use c++ includes instead of c includes
use std::min, std::max instead of MIN, MAX macros
2023-05-08 16:56:55 +02:00
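A small before/after sketch of the style change described above (clamp_lr is a hypothetical example function, not from the baby-llama diff):

```cpp
#include <algorithm> // std::min, std::max, instead of MIN/MAX macros
#include <cstdio>    // instead of <stdio.h>

// Unlike MIN(a, b)/MAX(a, b) macros, std::min/std::max are type-checked
// and evaluate each argument exactly once.
float clamp_lr(float lr) {
    return std::max(0.0f, std::min(1.0f, lr));
}
```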
Pavol Rusnak
003ba2fb43
llama : fix hparams shadow ( #1367 )
...
fixes #1363
2023-05-08 17:48:21 +03:00
Georgi Gerganov
f9a6364912
llama : require first token to be BOS ( #1303 )
...
* llama : require first token to be BOS
* scripts : add ppl-run-all.sh
* perplexity : add BOS for each chunk
* readme : update perplexity values after BOS fix
* perplexity : add clarifying comments
2023-05-08 17:41:54 +03:00
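A hedged sketch of the new invariant, using llama.cpp C API names of that era (ensure_bos is illustrative; the exact enforcement point in the commit may differ):

```cpp
#include <vector>
#include "llama.h"

// Make sure an evaluation batch starts with the BOS token; per the bullet
// list above, perplexity chunks in particular need it prepended.
void ensure_bos(std::vector<llama_token> & tokens) {
    if (tokens.empty() || tokens[0] != llama_token_bos()) {
        tokens.insert(tokens.begin(), llama_token_bos());
    }
}
```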
xaedes
dea9c9359a
c++ in baby-llama example
...
use c++ includes instead of c includes
use std::min, std::max instead of MIN, MAX macros
2023-05-08 16:40:31 +02:00
ubik2
95078cc554
convert: add ability to convert safetensors files ( #1276 )
...
* when loading a safetensors file, ignore the metadata header
* check for safetensors files first, and only use PyTorch versions when safetensors aren't available
2023-05-08 13:54:26 +02:00
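For reference, the safetensors container the converter now probes is simple: an 8-byte little-endian length, a JSON header (which may include an "__metadata__" entry that loaders ignore), then raw tensor bytes. A minimal C++ sketch of reading that header (the actual converter is a Python script; read_safetensors_header is illustrative):

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Layout: [u64 LE header length][JSON header][tensor data]. The JSON maps
// tensor names to {dtype, shape, data_offsets}; "__metadata__" is skipped.
std::string read_safetensors_header(const std::string & path) {
    std::ifstream f(path, std::ios::binary);
    uint64_t n = 0;
    f.read(reinterpret_cast<char *>(&n), sizeof n); // assumes little-endian host
    std::string header(n, '\0');
    f.read(&header[0], (std::streamsize) n);        // JSON text to parse
    return header;
}
```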
Johannes Gäßler
1f48b0abcf
Documented CUDA reproducibility, added warning ( #1346 )
2023-05-08 02:42:01 +02:00
xaedes
1ecbece752
disable slow tests grad0 and opt to avoid exceeding timeouts
2023-05-08 02:29:36 +02:00
xaedes
f5301061b6
remove busy loop that was used as sleep for slower sine wave generation
2023-05-08 01:12:37 +02:00
xaedes
4997bc5819
reduce number of test-grad0 iterations
...
avoid exceeding timeout of automated tests
2023-05-08 00:57:41 +02:00
xaedes
2936dd60a4
remove trailing whitespace
2023-05-08 00:04:54 +02:00
xaedes
7c8768f819
add missing include for strcmp, etc
2023-05-07 23:43:43 +02:00
xaedes
660836f0ff
fix call to ggml_set_name
2023-05-07 23:39:57 +02:00
xaedes
9dd8e405fb
rename print functions in baby-llama example
2023-05-07 22:43:23 +02:00
xaedes
47ad186628
revert disabling of threading for rms_norm and norm
2023-05-07 21:56:10 +02:00
xaedes
5d9fed7e7f
remove shape annotations in llama_eval_internal
2023-05-07 21:45:29 +02:00
xaedes
d20ba6f6e6
update static assert of GGML_OP_COUNT
2023-05-07 21:42:42 +02:00
xaedes
e643fa1619
smaller default values for baby llama model parameters
2023-05-07 21:38:00 +02:00