830 commits

Author SHA1 Message Date
Concedo
49d6334dc1 try fix kernel 2023-05-14 00:41:26 +08:00
Concedo
e05455f852 fixed wrong sized struct from legacy q8_1, fixed opencl varsize arrays 2023-05-13 23:56:08 +08:00
Concedo
c9eb2ba1c5 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	ggml-opencl.c
2023-05-13 15:51:05 +08:00
Concedo
b6594ab91e do not show tokenizer warning 2023-05-13 15:48:17 +08:00
Rinne
6456a4eb9f
embedding : remove unused code (#1426) 2023-05-13 10:24:20 +03:00
Georgi Gerganov
cdd5350892
readme : update Q4_0 perplexities
I think these were affected by the removal of the `round` during quantization
2023-05-13 09:12:44 +03:00
Georgi Gerganov
738ace394a
llama : free ggml context in set / copy state data (close #1425) 2023-05-13 09:08:52 +03:00
Henri Vasserman
699b1ad7fe
opencl : fix kernels for the new formats (#1422)
* Fix OpenCL kernels for the new formats

* Fix Q5_0 alignment issues.
2023-05-13 09:01:15 +03:00
Concedo
cee8042793 integrated new version of clblast kernels as a separate file 2023-05-13 12:53:28 +08:00
Concedo
017023e477 updated kobold lite 2023-05-13 12:12:20 +08:00
Concedo
53e7256a25 should be good to merge, only thing missing is clblast new quants 2023-05-13 12:07:29 +08:00
Concedo
05cf5f7d6e partially working, but the blas matmul is broken 2023-05-13 11:35:38 +08:00
Georgi Gerganov
fb62f92433
llama : fix --mtest option (close #1414) 2023-05-12 21:44:20 +03:00
Concedo
b335f73a60 BACKWARDS COMPAT QUANT SHIM is ready, but upstream model converter is BORKED. BORK BORK. 2023-05-13 01:30:11 +08:00
Concedo
08810d5fee interim merge. do not use 2023-05-13 00:33:55 +08:00
Concedo
e9caff1cda Interim merge. Do not use.
Merge branch 'master' into concedo_experimental

# Conflicts:
#	README.md
#	SHA256SUMS
#	examples/quantize/quantize.cpp
#	ggml-opencl.c
#	ggml.c
#	ggml.h
#	llama.cpp
#	llama.h
2023-05-12 23:20:27 +08:00
Johannes Gäßler
773ee249fb
CLI args use - instead of _, backwards compatible (#1416) 2023-05-12 14:34:55 +00:00
slaren
553fd4d4b5
Add clang-tidy reviews to CI (#1407) 2023-05-12 15:40:53 +02:00
Rinne
089b1c93ba
readme : add C#/.NET bindings repo (#1409) 2023-05-12 08:39:40 +03:00
Georgi Gerganov
b9fd7eee57
ggml : remove bit shuffling (#1405)
* ggml : remove Q4_0 bit shuffling (ARM NEON)

* ggml : remove Q4_1 bit shuffling (ARM NEON + reference)

* ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)

* ggml : remove Q4_2 bit shuffling (WIP, BROKEN)

* ggml : remove Q5_0 bit shuffling (ARM NEON)

* ggml : 2x faster scalar implementations

* ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)

* ggml : simplify scalar dot

* ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit

* ggml : fix Q4_1 quantization

* ggml : update cuBLAS + normalize variable names

* ggml : remove Q4_2 mode

* ggml : minor formatting

* ggml : fix Q5_0 quantization

* scripts : add script for measuring the time per token

* AVX implementations (#1370)

* ggml : uniform 5th bit extraction

* llama : produce error upon loading old model files

* llama : fix model magic/version write

* ggml : speed-up Q5_0 + Q5_1 at 4 threads

* ggml : preserve old Q4 and Q5 formats

* ggml : simplify Q8_1 - no need for low / high sums anymore

* ggml : fix Q8_0 and Q8_1 rounding

* Revert "AVX implementations (#1370)"

This reverts commit 948d124837.

* ggml : fix AVX2 implementation

* sha : update hashes for 7B and 13B

* readme : update timings + remove warning banner

* llama : update v2 PR number to 1405

* ggml : fix WASM comments

* ggml : back to original bit order

* readme : add note that Q4 and Q5 have been changed

* llama : fix return for unknown version

---------

Co-authored-by: Stephan Walter <stephan@walter.name>
2023-05-12 00:23:08 +03:00
CRD716
b608b55a3e
prompts : model agnostic DAN (#1304)
* add model-agnostic dan prompt

* quick readme update

* save a token

* Revert "quick readme update"

This reverts commit 8dc342c069.
2023-05-11 18:10:19 +03:00
Evan Jones
cf348a60e0
main : add option to save full output to session (#1338)
* main : add option to save full output to session

* split behavior into --session and --prompt-cache

* restore original implementation with new names

* PR comments

* move the check for incompatible parameters to gpt_params_parse

* Fix whitespace

Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>

---------

Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-05-10 11:37:14 -04:00
Concedo
19dbb3b2a5 Merge branch 'master' into concedo_experimental 2023-05-10 18:35:53 +08:00
DannyDaemonic
e6a46b0ed1
Locale fix for Windows (#1379) 2023-05-09 19:53:28 +02:00
Sami Farin
9f8dbc4787
use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler (#1314)
* use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler

Tested with a 13B model.

* use _mm_pause() in busyloop

* use _mm_pause() in busyloop on x86_64 to reduce power consumption
2023-05-09 14:29:20 +02:00
Concedo
e47f7ade05 updated kobold lite, patch oom errors 2023-05-09 19:16:45 +08:00
Concedo
6d87f67572 up ver 2023-05-09 17:25:46 +08:00
Concedo
54194911ac Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-05-09 16:50:43 +08:00
Concedo
e4c6a1e3ed update readme 2023-05-09 16:17:52 +08:00
DannyDaemonic
41654efea8
Interface improvements and --multiline-input (previously --author-mode) (#1040)
* Interface improvements
* Multiline input
* Track character width
* Works with all characters and control codes + Windows console fixes
2023-05-08 19:45:48 -07:00
Georgi Gerganov
56551bc11f
readme : add notice about upcoming breaking change 2023-05-08 22:52:18 +03:00
AlpinDale
fe60904eef
readme : add TOC and Pygmalion instructions (#1359) 2023-05-08 19:33:30 +03:00
Pavol Rusnak
003ba2fb43
llama : fix hparams shadow (#1367)
fixes #1363
2023-05-08 17:48:21 +03:00
Georgi Gerganov
f9a6364912
llama : require first token to be BOS (#1303)
* llama : require first token to be BOS

* scripts : add ppl-run-all.sh

* perplexity : add BOS for each chunk

* readme : update perplexity values after BOS fix

* perplexity : add clarifying comments
2023-05-08 17:41:54 +03:00
Concedo
2f2eff6e13 the dark gods have been sated, and redpajama is integrated... but at what cost? 2023-05-08 20:58:00 +08:00
ubik2
95078cc554
convert: add ability to convert safetensors files (#1276)
* when loading a safetensors file, ignore the metadata header
* check for safetensors files first, and only use PyTorch versions when safetensors aren't available
2023-05-08 13:54:26 +02:00
Concedo
b9904c3093 up ver 2023-05-08 11:13:16 +08:00
Concedo
1083876a1b Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
2023-05-08 11:12:42 +08:00
Concedo
89d70886a4 added support for setting custom context size at load time (memory allocation) 2023-05-08 11:11:25 +08:00
Johannes Gäßler
1f48b0abcf
Documented CUDA reproducibility, added warning (#1346) 2023-05-08 02:42:01 +02:00
Henri Vasserman
e1295513a4
CI: add Windows CLBlast and OpenBLAS builds (#1277)
* Add OpenCL and CLBlast support

* Add OpenBLAS support

* Remove testing from matrix

* change build name to 'clblast'
2023-05-07 13:20:09 +02:00
Concedo
62beded0e7 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	Makefile
#	README.md
2023-05-07 19:10:01 +08:00
swittk
1b0fd45465
ggml : Allow usage of CLBlast alongside Accelerate.framework (#1336)
Minor edit in ggml.c which originally would prevent OpenCL from loading completely if GGML_USE_ACCELERATE was defined.
Minor speedup in prompt eval time.
2023-05-06 23:03:23 -04:00
Jed Fox
3924088512
Remove default arguments from sampling functions (#1343) 2023-05-06 17:01:47 -04:00
Concedo
ff93b394da fixed a typo 2023-05-06 12:37:34 +08:00
Concedo
a48dddab86 slightly bump the RAM up to support chinese alpaca 2023-05-06 11:48:22 +08:00
DaniAndTheWeb
173d0e6419
makefile: automatic Arch Linux detection (#1332)
This commit is a port of a detection method used in koboldcpp's Makefile in order to automatically set the -lcblas option on Arch Linux
2023-05-05 23:57:14 +02:00
Erik Scholz
a3b85b28da
ci : add cublas to windows release (#1271) 2023-05-05 22:56:09 +02:00
Concedo
8a964e76c8 integrated mirostat as a launch parameter, works on all models 2023-05-06 00:47:17 +08:00
Concedo
851f55325a Merge remote-tracking branch 'temp/concedo' into concedo_experimental 2023-05-05 23:55:53 +08:00