Commit graph

702 commits

Author SHA1 Message Date
Concedo
b3315459c7 pulled the new dequants for clblast, fixed some ooms 2023-04-30 14:15:44 +08:00
Concedo
0061b90ec6 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
2023-04-30 10:35:02 +08:00
Georgi Gerganov
c3ca7a5f05
ggml : fix 32-bit ARM NEON 2023-04-29 21:34:23 +03:00
Georgi Gerganov
e8c051611a
ggml : use vzip instead of vuzp for consistency 2023-04-29 21:12:56 +03:00
Georgi Gerganov
0b5a935099
ggml : fix visibility and unused warnings 2023-04-29 19:28:36 +03:00
Georgi Gerganov
ec728e44d7
ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) 2023-04-29 18:43:42 +03:00
Georgi Gerganov
214b6a3570
ggml : adjust mul_mat_f16 work memory (#1226)
* llama : minor - remove explicit int64_t cast

* ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS

* ggml : add asserts to guard for incorrect wsize
2023-04-29 18:43:28 +03:00
Concedo
f149114395 up ver 2023-04-29 19:42:21 +08:00
Concedo
7afad2b9b5 integrated the new samplers 2023-04-29 19:41:41 +08:00
Georgi Gerganov
305eb5afd5
build : fix reference to old llama_util.h 2023-04-29 13:53:12 +03:00
Georgi Gerganov
84ca9c2ecf
examples : fix save-load-state + rename llama-util.h 2023-04-29 13:48:11 +03:00
Concedo
da0c34b028 Merge branch 'master' into concedo_experimental 2023-04-29 18:27:06 +08:00
Concedo
fe0e4de8e8 fixed a regression where a bad model was giving valid logits after library changes. Now we run the eval through the model twice and compare logits: if different inputs give the same logits, the model is broken 2023-04-29 18:25:17 +08:00
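The check described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `looks_broken`, `healthy_eval`, `broken_eval`, and the tolerance `tol` are all hypothetical names introduced here, and the two eval functions are toy stand-ins for a real model's eval call.

```python
import math


def looks_broken(eval_logits, tokens_a, tokens_b, tol=1e-6):
    """Return True if two different inputs yield (near-)identical logits.

    eval_logits is any callable mapping a token list to a list of floats
    (a hypothetical stand-in for the model's eval call).
    """
    if tokens_a == tokens_b:
        raise ValueError("the two probe inputs must differ")
    logits_a = eval_logits(tokens_a)
    logits_b = eval_logits(tokens_b)
    # A working model should react to its input; identical logits for
    # different inputs suggest the weights were not loaded correctly.
    return len(logits_a) == len(logits_b) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(logits_a, logits_b)
    )


# Toy stand-ins, for illustration only.
def healthy_eval(tokens):
    return [float(sum(tokens) % 7), float(len(tokens)), 0.5]


def broken_eval(tokens):
    return [0.0, 0.0, 0.0]


print(looks_broken(healthy_eval, [1, 2, 3], [4, 5]))
print(looks_broken(broken_eval, [1, 2, 3], [4, 5]))
```

Comparing with a small tolerance rather than exact equality is a design choice of this sketch; the commit message does not say how the comparison is done.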
Georgi Gerganov
334637e43e
common : change default parameters to pre-#1126 (#1223) 2023-04-29 09:51:06 +03:00
Ivan Stepanov
dd7eff57d8
llama : new sampling algorithms (#1126)
* Sample interface, new samplers.

New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat

Ignore EOS fix: -inf should be used.

* mirostat

* Added --logit-bias and --no-penalize-nl, removed std::span

* Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k)

* Save and load example adjust

* Tests

* Windows build fix

* Windows test fix
2023-04-29 08:34:41 +03:00
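The "Ignore EOS fix: -inf should be used" note in the commit above can be illustrated with a small softmax sketch. This is an assumption-laden toy, not the llama.cpp sampler: `softmax`, `ban_token`, `EOS_ID`, and the logit values are hypothetical. It shows why a logit of negative infinity is needed to rule a token out: any finite logit still leaves the token with a nonzero probability.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ban_token(logits, token_id):
    """Return a copy of logits with token_id made impossible to sample."""
    out = list(logits)
    out[token_id] = -math.inf  # exp(-inf) == 0.0, so the probability is exactly 0
    return out


EOS_ID = 2  # hypothetical token id, for illustration only
logits = [1.0, 0.5, 3.0]

probs_banned = softmax(ban_token(logits, EOS_ID))

# For contrast: a finite logit such as 0.0 still leaves some probability mass.
finite = list(logits)
finite[EOS_ID] = 0.0
probs_finite = softmax(finite)

print(probs_banned[EOS_ID], probs_finite[EOS_ID] > 0.0)
```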
Concedo
5aa185f3f7 remove preallocation 2023-04-29 12:32:37 +08:00
Concedo
bb282a4ecf reinstated the q4_3 format, for backwards compatibility. 2023-04-29 11:42:04 +08:00
Concedo
0fc1772a8f Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	ggml.c
2023-04-29 11:14:05 +08:00
Concedo
67ee2b93a7 removed bad import. 2023-04-29 09:59:16 +08:00
slaren
7fc50c051a
cuBLAS: use host pinned memory and dequantize while copying (#1207)
* cuBLAS: dequantize simultaneously while copying memory

* cuBLAS: use host pinned memory

* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory

* cuBLAS: also pin kv cache

* fix rebase
2023-04-29 02:04:18 +02:00
Henri Vasserman
b1ee8f59b4
cuBLAS: non-contiguous tensor support (#1215)
* Cuda: non-contiguous tensor support

* remove extra stuff

* rename

* fix error

* more fixes, now OpenBLAS and CLBlast build too

* now then?
2023-04-29 01:31:56 +02:00
Stephan Walter
36d19a603b
Remove Q4_3 which is no better than Q5 (#1218) 2023-04-28 23:10:43 +00:00
Georgi Gerganov
7f15c5c477
readme : update hot topics 2023-04-28 21:32:52 +03:00
Georgi Gerganov
55390bcaf2
ggml : sync ggml (ggml_alibi) 2023-04-28 20:51:05 +03:00
CRD716
5fba3c016b
examples : add Jeopardy example (#1168)
* Basic Setup

* Prevent Results.txt from coming up

* Prefixes, Line separators, etc

* editorcheck

* introduction to give more consistent results

* Basic graph thing

* Grading, ready for testing!

* Y'all ready to get funky?

* fix column removal stuff

* missed a few
2023-04-28 19:13:33 +03:00
Evan Jones
1481a9cf25
llama : add session file format and saved sessions in main (#1169) 2023-04-28 18:59:37 +03:00
Georgi Gerganov
11d902364b
ggml : add helper debug printf in soft_max 2023-04-28 17:59:08 +03:00
0cc4m
7296c961d9
ggml : add CLBlast support (#1164)
* Allow use of OpenCL GPU-based BLAS using CLBlast instead of OpenBLAS for context processing

* Improve CLBlast implementation, avoid recreating buffers, remove redundant transfers

* Finish merge of CLBlast support

* Move CLBlast implementation to separate file

Add buffer reuse code (adapted from slaren's cuda implementation)

* Add q4_2 and q4_3 CLBlast support, improve code

* Double CLBlast speed by disabling OpenBLAS thread workaround

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>

* Fix device selection env variable names

* Fix cast in opencl kernels

* Add CLBlast to CMakeLists.txt

* Replace buffer pool with static buffers a, b, qb, c

Fix compile warnings

* Fix typos, use GGML_TYPE defines, improve code

* Improve btype dequant kernel selection code, add error if type is unsupported

* Improve code quality

* Move internal stuff out of header
* Use internal enums instead of CLBlast enums
* Remove leftover C++ includes and defines
* Make event use easier to read

Co-authored-by: Henri Vasserman <henv@hot.ee>

* Use c compiler for opencl files

* Simplify code, fix include

* First check error, then release event

* Make globals static, fix indentation

* Rename dequant kernels file to conform with other file names

* Fix import cl file name

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-28 17:57:16 +03:00
Folko-Ven
78ec543733
Correcting link to w64devkit (#1214)
Correcting link to w64devkit (change seeto to skeeto).
2023-04-28 16:22:48 +02:00
Johannes Gäßler
92a6e13a31
Add Manjaro CUDA include and lib dirs to Makefile (#1212) 2023-04-28 15:40:32 +02:00
Yann Follet
04aaae1d79
add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211) 2023-04-28 11:59:48 +00:00
Concedo
f75de52b25 add short delay before exit gui 2023-04-28 15:09:17 +08:00
Concedo
e97c7099b0 created new tkinter GUI 2023-04-28 15:03:48 +08:00
Concedo
032a171867 integrated q5 formats 2023-04-28 12:58:39 +08:00
Concedo
e8a389f85b updated kobold lite, added debug mode, changed streaming mode to now use the same url when launching 2023-04-28 11:41:03 +08:00
Concedo
2499632cdc up version 2023-04-27 17:27:10 +08:00
Concedo
137efe2b8f updated embedded kobold lite, force streaming mode if stream flag is used 2023-04-27 17:16:55 +08:00
Concedo
95bbd46019 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.devops/tools.sh
#	README.md
2023-04-27 16:12:00 +08:00
Concedo
5070815dcf fixing discussion #121 and issue #122 2023-04-27 16:10:01 +08:00
Stephan Walter
0b2da20538
ggml : slightly faster AVX2 implementation for Q5 (#1197) 2023-04-26 23:26:42 +03:00
Georgi Gerganov
f9be42add0
readme : add quantization info 2023-04-26 23:24:42 +03:00
Georgi Gerganov
574406dc7e
ggml : add Q5_0 and Q5_1 quantization (#1187)
* ggml : add Q5_0 quantization (cuBLAS only)

* ggml : fix Q5_0 qh -> uint32_t

* ggml : fix q5_0 histogram stats

* ggml : q5_0 scalar dot product

* ggml : q5_0 ARM NEON dot

* ggml : q5_0 more efficient ARM NEON using uint64_t masks

* ggml : rename Q5_0 -> Q5_1

* ggml : adding Q5_0 mode

* quantize : add Q5_0 and Q5_1 to map

* ggml : AVX2 optimizations for Q5_0, Q5_1 (#1195)

---------

Co-authored-by: Stephan Walter <stephan@walter.name>
2023-04-26 23:14:13 +03:00
Ásgeir Bjarni Ingvarsson
87a6f846d3
Allow setting the rng seed after initialization. (#1184)
The llama_set_state_data function restores the rng state to what it
was at the time llama_copy_state_data was called. But users may want
to restore the state and proceed with a different seed.
2023-04-26 22:08:43 +02:00
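The motivation in the commit above (restore a saved state, then continue with a different seed) can be shown by analogy with Python's standard `random` module. This is only an analogy under stated assumptions: it does not call `llama_set_state_data` or `llama_copy_state_data`, and the seed values are arbitrary.

```python
import random

rng = random.Random(1234)
rng.random()                  # advance the generator a little

saved = rng.getstate()        # analogous to copying the state out
expected = rng.random()       # what the generator produces next

# Restoring the state replays exactly the same draw...
rng.setstate(saved)
replay = rng.random()

# ...so to diverge after a restore, the seed must be set afterwards.
rng.setstate(saved)
rng.seed(42)
fresh = rng.random()

print(replay == expected, fresh == random.Random(42).random())
```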
DaniAndTheWeb
ea3ad7eb60
Updating build instructions to include BLAS support (#1183)
* Updated build information

First update to the build instructions to include BLAS.

* Update README.md

* Update information about BLAS

* Better BLAS explanation

Adding a clearer BLAS explanation and adding a link to download the CUDA toolkit.

* Better BLAS explanation

* BLAS for Mac

Specifying that BLAS is already supported on Macs using the Accelerate Framework.

* Clarify the effect of BLAS

* Windows Make instructions

Added the instructions to build with Make on Windows

* Fixing typo

* Fix trailing whitespace
2023-04-26 22:03:03 +02:00
Pavol Rusnak
859fee6dfb
quantize : use map to assign quantization type from string (#1191)
instead of `int` (the `int` option is still supported)

This allows the following usage:

`./quantize ggml-model-f16.bin ggml-model-q4_0.bin q4_0`

instead of:

`./quantize ggml-model-f16.bin ggml-model-q4_0.bin 2`
2023-04-26 18:43:27 +02:00
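The lookup described in the commit above can be sketched as a name-to-id map with an integer fallback. The commit itself changes the C++ `quantize` tool; this Python version is only an illustration, and `QUANT_TYPES` and `parse_quant_type` are hypothetical names. The single entry `q4_0` -> `2` is taken from the two example commands in the commit message; a real table would hold more entries.

```python
# Name -> numeric id. Only the pair shown in the commit message is listed.
QUANT_TYPES = {"q4_0": 2}


def parse_quant_type(arg):
    """Accept either a type name ("q4_0") or its legacy integer form ("2")."""
    if arg in QUANT_TYPES:
        return QUANT_TYPES[arg]
    try:
        value = int(arg)
    except ValueError:
        raise ValueError(f"unknown quantization type: {arg!r}") from None
    if value not in QUANT_TYPES.values():
        raise ValueError(f"unknown quantization type id: {value}")
    return value


print(parse_quant_type("q4_0"), parse_quant_type("2"))
```

Trying the name first and the integer second keeps the old numeric invocation working while making the readable form the primary one.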
Concedo
101f7a6e73 updated readme 2023-04-26 23:50:00 +08:00
Concedo
93a8e00dfa Merge branch 'master' into concedo
# Conflicts:
#	flake.nix
2023-04-26 18:01:35 +08:00
Disty0
27bc29128e
Update README.md (#120) 2023-04-26 17:33:34 +08:00
Stephan Walter
4afcc37869
Update SHA256SUMS after quantization change (#1181)
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-04-25 23:41:56 +02:00
ostix360
667c501334
py : cast lora_alpha to int in convert-lora-to-ggml (#1170)
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-04-25 23:33:08 +02:00