Commit graph

554 commits

Author SHA1 Message Date
Georgi Gerganov
e038e01e28
sha : update hashes for 7B and 13B 2023-05-11 21:37:16 +03:00
Georgi Gerganov
5bc286ab18
ggml : fix AVX2 implementation 2023-05-11 21:37:16 +03:00
Georgi Gerganov
bd5e373058
Revert "AVX implementations (#1370)"
This reverts commit 948d124837.
2023-05-11 21:37:16 +03:00
Georgi Gerganov
6680244838
ggml : fix Q8_0 and Q8_1 rounding 2023-05-11 21:37:16 +03:00
Georgi Gerganov
582a39fff5
ggml : simplify Q8_1 - no need for low / high sums anymore 2023-05-11 21:37:16 +03:00
Georgi Gerganov
695f3963b1
ggml : preserve old Q4 and Q5 formats 2023-05-11 21:37:16 +03:00
Georgi Gerganov
b7ad385d42
ggml : speed-up Q5_0 + Q5_1 at 4 threads 2023-05-11 21:37:15 +03:00
Georgi Gerganov
09032e0290
llama : fix model magic/version write 2023-05-11 21:37:15 +03:00
Georgi Gerganov
d52172a509
llama : produce error upon loading old model files 2023-05-11 21:37:15 +03:00
Georgi Gerganov
489bd13fad
ggml : uniform 5th bit extraction 2023-05-11 21:37:15 +03:00
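The "uniform 5th bit extraction" commit concerns the Q5 formats, where each 5-bit value is split into a 4-bit nibble plus one extra high bit. A sketch of the pattern, with a layout that is assumed here rather than the exact ggml one: the 32 fifth bits of a block are packed into a single `uint32_t`, so every element's high bit comes out with the same shift-and-mask.

```c
#include <stdint.h>

// Hypothetical Q5-style access: low 4 bits live two-per-byte in `nibbles`,
// and bit i of `qh` holds element i's fifth (high) bit. "Uniform" extraction
// means the same ((qh >> i) & 1) << 4 expression works for all 32 elements.
static uint8_t q5_get(const uint8_t *nibbles, uint32_t qh, int i) {
    uint8_t lo = (i % 2 == 0) ? (uint8_t)(nibbles[i/2] & 0x0F)
                              : (uint8_t)(nibbles[i/2] >> 4);
    uint8_t hi = (uint8_t)(((qh >> i) & 1u) << 4);  // fifth bit into position 4
    return lo | hi;  // full 5-bit value, 0..31
}
```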
Stephan Walter
9e49d20150
AVX implementations (#1370) 2023-05-11 21:37:15 +03:00
Georgi Gerganov
928d2f335f
scripts : add script for measuring the time per token 2023-05-11 21:37:15 +03:00
Georgi Gerganov
83674556b8
ggml : fix Q5_0 quantization 2023-05-11 21:37:15 +03:00
Georgi Gerganov
b08c39b16c
ggml : minor formatting 2023-05-11 21:37:14 +03:00
Georgi Gerganov
4bf1c8a43e
ggml : remove Q4_2 mode 2023-05-11 21:37:14 +03:00
Georgi Gerganov
cdc9607329
ggml : update cuBLAS + normalize variable names 2023-05-11 21:37:14 +03:00
Georgi Gerganov
9472d0ea8b
ggml : fix Q4_1 quantization 2023-05-11 21:37:14 +03:00
Georgi Gerganov
0add6402bd
ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit 2023-05-11 21:37:14 +03:00
Georgi Gerganov
caaacd5765
ggml : simplify scalar dot 2023-05-11 21:37:14 +03:00
Georgi Gerganov
292a778ca2
ggml : remove Q5_1 bit shuffling (ARM NEON + scalar) 2023-05-11 21:37:13 +03:00
Georgi Gerganov
b37a08f646
ggml : 2x faster scalar implementations 2023-05-11 21:37:13 +03:00
Georgi Gerganov
aa78dfed7d
ggml : remove Q5_0 bit shuffling (ARM NEON) 2023-05-11 21:37:13 +03:00
Georgi Gerganov
9f3285f741
ggml : remove Q4_2 bit shuffling (WIP, BROKEN) 2023-05-11 21:37:13 +03:00
Georgi Gerganov
fd2a137fac
ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON) 2023-05-11 21:37:13 +03:00
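The helper names in the commit above describe packing and unpacking 4-bit values. A minimal scalar sketch of the `bytes_from_nibbles()` direction, assuming the straightforward (non-shuffled) layout this series of commits moves toward, where element `2*j` sits in the low nibble of byte `j`:

```c
#include <stdint.h>

// Unpack 4-bit values stored two-per-byte into one byte each.
// With bit shuffling removed, unpacking is a plain shift and mask.
static void bytes_from_nibbles_sketch(const uint8_t *packed, uint8_t *out, int n) {
    for (int j = 0; j < n/2; j++) {
        out[2*j + 0] = packed[j] & 0x0F;  // low nibble -> even element
        out[2*j + 1] = packed[j] >> 4;    // high nibble -> odd element
    }
}
```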
Georgi Gerganov
844d2af89d
ggml : remove Q4_1 bit shuffling (ARM NEON + reference) 2023-05-11 21:37:13 +03:00
Georgi Gerganov
5fa47bf6c7
ggml : remove Q4_0 bit shuffling (ARM NEON) 2023-05-11 21:37:12 +03:00
CRD716
b608b55a3e
prompts : model agnostic DAN (#1304)
* add model-agnostic dan prompt

* quick readme update

* save a token

* Revert "quick readme update"

This reverts commit 8dc342c069.
2023-05-11 18:10:19 +03:00
Evan Jones
cf348a60e0
main : add option to save full output to session (#1338)
* main : add option to save full output to session

* split behavior into --session and --prompt-cache

* restore original implementation with new names

* PR comments

* move the check for incompatible parameters to gpt_params_parse

* Fix whitespace

Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>

---------

Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-05-10 11:37:14 -04:00
DannyDaemonic
e6a46b0ed1
Locale fix for Windows (#1379) 2023-05-09 19:53:28 +02:00
Sami Farin
9f8dbc4787
use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler (#1314)
* use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler

Tested with a 13B model.

* use _mm_pause() in busyloop

* use _mm_pause() in busyloop on x86_64 to reduce power consumption
2023-05-09 14:29:20 +02:00
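The pattern behind the commit above is the classic spin-wait hint: issuing the x86 PAUSE instruction inside a busy loop tells the CPU it is spinning, which cuts power draw and avoids costly pipeline flushes when the wait ends. A minimal sketch (not the actual ggml thread loop), with a fallback for non-x86 targets:

```c
#include <stdatomic.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   // PAUSE: spin-wait hint on x86
#else
#define cpu_relax() ((void)0)     // no-op on other architectures
#endif

// Busy-wait until another thread sets *flag, relaxing the CPU each iteration.
static void spin_wait(atomic_int *flag) {
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        cpu_relax();
    }
}
```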
DannyDaemonic
41654efea8
Interface improvements and --multiline-input (previously --author-mode) (#1040)
* Interface improvements
* Multiline input
* Track character width
* Works with all characters and control codes + Windows console fixes
2023-05-08 19:45:48 -07:00
Georgi Gerganov
56551bc11f
readme : add notice about upcoming breaking change 2023-05-08 22:52:18 +03:00
AlpinDale
fe60904eef
readme : add TOC and Pygmalion instructions (#1359) 2023-05-08 19:33:30 +03:00
Pavol Rusnak
003ba2fb43
llama : fix hparams shadow (#1367)
fixes #1363
2023-05-08 17:48:21 +03:00
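The shadow fix above addresses a common C/C++ pitfall: a local variable with the same name as the intended target hides it, so assignments silently go to the wrong object. A minimal illustration with hypothetical names (not the actual llama.cpp hparams fields):

```c
// A local named like the member "shadows" it: writes land on the local,
// and the struct field keeps its old value.
struct hparams_sketch { int n_ctx; };

static int load_hparams_buggy(int n_ctx_from_file) {
    struct hparams_sketch hp = {0};
    int n_ctx = n_ctx_from_file;  // local shadows the intended hp.n_ctx
    (void)n_ctx;                  // the value never reaches hp
    return hp.n_ctx;              // bug: returns 0
}

static int load_hparams_fixed(int n_ctx_from_file) {
    struct hparams_sketch hp = {0};
    hp.n_ctx = n_ctx_from_file;   // write the field directly
    return hp.n_ctx;
}
```

Compiling with `-Wshadow` flags this class of bug at build time.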
Georgi Gerganov
f9a6364912
llama : require first token to be BOS (#1303)
* llama : require first token to be BOS

* scripts : add ppl-run-all.sh

* perplexity : add BOS for each chunk

* readme : update perplexity values after BOS fix

* perplexity : add clarifying comments
2023-05-08 17:41:54 +03:00
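The rule introduced above, that evaluation must start from BOS, reduces to prepending the token when the tokenized prompt does not already begin with it. A minimal sketch; the BOS id of 1 matches the LLaMA vocabulary but is an assumption here:

```c
#include <stdint.h>
#include <string.h>

#define BOS_TOKEN 1  // assumed LLaMA BOS token id

// Ensure tokens[0] is BOS; returns the (possibly grown) length.
// The buffer must have room for n + 1 entries.
static int ensure_bos(int32_t *tokens, int n) {
    if (n > 0 && tokens[0] == BOS_TOKEN) return n;
    memmove(tokens + 1, tokens, (size_t)n * sizeof(int32_t));
    tokens[0] = BOS_TOKEN;
    return n + 1;
}
```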
ubik2
95078cc554
convert: add ability to convert safetensors files (#1276)
* when loading a safetensors file, ignore the metadata header
* check for safetensors files first, and only use PyTorch versions when safetensors aren't available
2023-05-08 13:54:26 +02:00
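The safetensors framing that the converter has to handle is simple: the file starts with an 8-byte little-endian length N, followed by N bytes of JSON metadata, then the raw tensor data. "Ignore the metadata header" amounts to skipping those N + 8 bytes. A sketch of the offset calculation (in C here for consistency, though the actual converter is a Python script):

```c
#include <stdint.h>

// Assemble the little-endian u64 header length from the first 8 bytes.
static uint64_t safetensors_header_len(const uint8_t *buf) {
    uint64_t n = 0;
    for (int i = 7; i >= 0; i--) {
        n = (n << 8) | buf[i];
    }
    return n;
}

// Offset where the raw tensor data begins (past length field + JSON header).
static uint64_t safetensors_data_offset(const uint8_t *buf) {
    return 8 + safetensors_header_len(buf);
}
```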
Johannes Gäßler
1f48b0abcf
Documented CUDA reproducibility, added warning (#1346) 2023-05-08 02:42:01 +02:00
Henri Vasserman
e1295513a4
CI: add Windows CLBlast and OpenBLAS builds (#1277)
* Add OpenCL and CLBlast support

* Add OpenBLAS support

* Remove testing from matrix

* change build name to 'clblast'
2023-05-07 13:20:09 +02:00
swittk
1b0fd45465
ggml : Allow usage of CLBlast alongside Accelerate.framework (#1336)
Minor edit in ggml.c, which originally prevented OpenCL from loading at all when GGML_USE_ACCELERATE was defined.
Minor speedup in prompt eval time.
2023-05-06 23:03:23 -04:00
Jed Fox
3924088512
Remove default arguments from sampling functions (#1343) 2023-05-06 17:01:47 -04:00
DaniAndTheWeb
173d0e6419
makefile: automatic Arch Linux detection (#1332)
This commit ports a detection method from koboldcpp's Makefile to automatically set the -lcblas option on Arch Linux.
2023-05-05 23:57:14 +02:00
Erik Scholz
a3b85b28da
ci : add cublas to windows release (#1271) 2023-05-05 22:56:09 +02:00
Pavol Rusnak
921dcee00a
readme: add missing info (#1324) 2023-05-05 16:43:36 +02:00
Ionoclast Laboratories
2d13786e91
Fix for OpenCL / CLBlast builds on macOS. (#1329) 2023-05-05 14:18:21 +02:00
Benjamin Lecaillon
a90e96b266
Convert.py @staticmethod (#1327)
* Line 698 has a stray @staticmethod decorator that should be removed,

otherwise unpickle.load() throws an error because the object is not callable

* Update convert.py

---------

Co-authored-by: Ivan Stepanov <ivanstepanovftw@gmail.com>
2023-05-05 03:17:07 +03:00
slaren
94c5652fc0
quantize: make output filename optional, default to ggml-model-<ftype>.bin (#1301) 2023-05-05 00:58:56 +02:00
Ivan Stepanov
34d9f22f44
Wrap exceptions in std::exception to produce verbose output on exception. (#1316) 2023-05-04 18:56:27 +02:00
Ivan Stepanov
d3e8093e9b
convert: support DT_BF16 tensors (#1309)
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-05-04 18:54:37 +02:00
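DT_BF16 support is cheap to add because bfloat16 is defined as the top 16 bits of an IEEE-754 float32: widening to float is a 16-bit shift into the high half of a 32-bit word. A minimal sketch of the conversion:

```c
#include <stdint.h>
#include <string.h>

// Widen a bfloat16 value to float32 by placing its 16 bits in the
// high half of the float's bit pattern.
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof(f));  // bit-cast without aliasing issues
    return f;
}
```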
44670
360cfe5bec
readme : add OpenBuddy link (#1321) 2023-05-04 19:33:31 +03:00
44670
2edbdb0f99
main : add --in-suffix option (#1318)
* adding --in-suffix option

* print input suffix before generation
2023-05-04 18:41:12 +03:00