Commit graph

565 commits

Author SHA1 Message Date
Concedo
3687db7cf7 cublas is not feasible at this time. removed for now 2023-04-21 16:14:23 +08:00
Concedo
07bb31b034 wip dont use 2023-04-21 00:35:54 +08:00
Concedo
7ba36c2c6c trying to put out penguin-based fires. Sorry for the inconvenience 2023-04-20 23:15:07 +08:00
Concedo
49697d86d8 adjusted down the buf memory allocation now that realloc seems to work 2023-04-20 17:51:13 +08:00
Concedo
4605074245 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	ggml.c
2023-04-20 17:30:54 +08:00
Concedo
3e88616439 fixed WONKY CODE 2023-04-20 16:41:32 +08:00
Concedo
0b08ec7c5d forgot to remove this 2023-04-20 16:28:47 +08:00
Concedo
346cd68903 made the Linux and OS X build process match Windows. It now builds all applicable libraries; for a full build run make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 2023-04-20 15:53:55 +08:00
Stephan Walter
c8c2c52482
AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) 2023-04-20 08:45:41 +02:00
Concedo
93761e7baf slightly clarified the library replacement steps - replacing the DLL is necessary in addition to replacing the library imports 2023-04-20 12:23:54 +08:00
Gustavo Rocha Dias
5ca2d774cc
doc - explanation of how to use a custom version of the Windows libraries in the lib folder. (#92)
the dynamic libraries also need to be updated if you replace the import libraries
2023-04-20 12:20:11 +08:00
slaren
02d6988121
Improve cuBLAS performance by dequantizing on the GPU (#1065) 2023-04-20 03:14:14 +02:00
CRD716
834695fe3a
Minor: readme grammar and spelling fixes, and misc updates (#1071) 2023-04-19 19:52:14 +00:00
Kawrakow
f7d05095b4
Q4_2 quantization with rmse-optimized scale and quants (#1062)
* Q4_2 quantization with rmse-optimized scale and quants

For quantize-stats we get
q4_2: rmse 0.00159301, maxerr 0.17480469, 95pct<0.0030, median<0.0012

For 7B perplexity with BLAS enabled we get 6.2038 after 655 chunks.

Quantization is slow (~90 seconds on my Mac for 7B) as it is not
multi-threaded as in PR #896.

* ggml : satisfy the sanitizer builds

Not sure why this makes them fail

* Better follow ggml conventions for function names

* Fixed type as per reviewer comment

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-19 20:20:14 +02:00
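The RMSE-optimized scale search mentioned in the commit above can be sketched roughly as follows. This is a minimal illustration of the idea only, not the actual ggml implementation; the struct name, search grid, and quantized range are assumptions:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantize a block of floats to 4-bit values in [-8, 7] with one scale d,
// trying a small grid of candidate scales and keeping the one with the
// lowest total squared error (i.e. lowest RMSE).
struct Q4Block {
    float d;               // scale: x ≈ d * q
    std::vector<int8_t> q; // quantized values in [-8, 7]
};

Q4Block quantize_rmse(const std::vector<float>& x) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::abs(v));
    Q4Block best{amax / 7.0f, std::vector<int8_t>(x.size(), 0)};
    double best_err = 1e30;
    // search candidate scales around the naive amax/7 choice
    for (int is = -4; is <= 4; ++is) {
        float d = amax / (7.0f + 0.1f * is);
        if (d == 0.0f) continue;
        double err = 0.0;
        std::vector<int8_t> q(x.size());
        for (size_t i = 0; i < x.size(); ++i) {
            int qi = (int)std::lround(x[i] / d);
            qi = std::min(7, std::max(-8, qi));
            q[i] = (int8_t)qi;
            double e = x[i] - d * qi;
            err += e * e;
        }
        if (err < best_err) { best_err = err; best = {d, q}; }
    }
    return best;
}
```

The naive choice (scale so that the largest magnitude maps exactly to the edge of the range) is not generally the RMSE minimizer; searching nearby scales trades a little accuracy on the largest value for less rounding error on the rest.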
Georgi Gerganov
884e7d7a2b
ggml : use 8-bit precision for Q4_1 intermediate results (#1047)
* ggml : use 8-bit precision for Q4_1 intermediate results (ARM)

* ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmalq_n_f32

56 ms/token with Q4_1 !

* ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051)

* gitignore : ignore ppl-*.txt files

---------

Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
2023-04-19 20:10:08 +03:00
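The 8-bit-intermediate trick behind the Q4_1 commit above can be sketched in scalar form. With x_i ≈ d*q_i + m (q_i in [0, 15]) and y_i ≈ s*p_i (p_i an int8), the block dot product reduces to two integer accumulations scaled once at the end. Names and layout here are illustrative, not the real ggml structs:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Scalar sketch of a Q4_1 x 8-bit dot product:
//   sum_i x_i*y_i ≈ d*s * Σ q_i*p_i  +  m*s * Σ p_i
// so the inner loop is pure integer arithmetic, which is what makes the
// SIMD versions (NEON/AVX2) fast.
float dot_q4_1_q8(float d, float m, const std::vector<uint8_t>& q,
                  float s, const std::vector<int8_t>& p) {
    int32_t sum_qp = 0, sum_p = 0; // products of 8-bit values fit easily in int32
    for (size_t i = 0; i < q.size(); ++i) {
        sum_qp += (int32_t)q[i] * p[i];
        sum_p  += p[i];
    }
    return d * s * sum_qp + m * s * sum_p;
}
```

For example, with d=0.5, m=-1 the codes {0, 2, 4} decode to {-1, 0, 1}, and with s=0.25 the codes {4, -8, 12} decode to {1, -2, 3}; the dot product is -1 + 0 + 3 = 2.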
Georgi Gerganov
7cd5c4a3e9
readme : add warning about Q4_2 and Q4_3 2023-04-19 19:07:54 +03:00
Stephan Walter
f3d4edf504
ggml : Q4 cleanup - remove 4-bit dot product code (#1061)
* Q4 cleanup

* Remove unused AVX512 Q4_0 code
2023-04-19 19:06:37 +03:00
Concedo
be1222c36e Merged the upstream cublas feature 2023-04-19 20:45:37 +08:00
Concedo
cc407f283a messing around with memory allocation to band-aid the random OOMs with various gpt2 and gptj models 2023-04-19 20:18:55 +08:00
slaren
8944a13296
Add NVIDIA cuBLAS support (#1044) 2023-04-19 11:22:45 +02:00
Concedo
f662a9a230 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	Makefile
#	README.md
2023-04-19 16:34:51 +08:00
Concedo
65bfcdb1cc Merge branch 'concedo_experimental' into concedo 2023-04-19 15:35:48 +08:00
Concedo
45ec09d31b fast forwarding for rwkv for unmodified contexts 2023-04-19 15:09:35 +08:00
AlpinDale
116488af66
Create make_pyinstaller.sh (#89) 2023-04-19 10:57:07 +08:00
slaren
6667401238
Multi-threaded ggml_cpy (#1035)
* Multi-threaded ggml_cpy

* Update ggml.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Also fix wdata offset in ggml_compute_forward_add_q_f32

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-19 00:53:24 +02:00
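The multi-threading pattern used for ggml_cpy in the commit above follows the usual ggml scheme of statically partitioning rows across worker threads. A minimal sketch of that pattern on a flat buffer (an illustration only, not the actual ggml task-scheduling code):

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Copy src into dst using nth threads, each handling a contiguous slice.
// The partition [n*ith/nth, n*(ith+1)/nth) covers the range exactly with
// no overlap, which is the same index math ggml uses per thread (ith/nth).
void copy_parallel(const std::vector<float>& src, std::vector<float>& dst, int nth) {
    dst.resize(src.size());
    const size_t n = src.size();
    std::vector<std::thread> workers;
    for (int ith = 0; ith < nth; ++ith) {
        workers.emplace_back([&, ith]() {
            size_t i0 = n * ith / nth;       // first index for this thread
            size_t i1 = n * (ith + 1) / nth; // one past the last index
            for (size_t i = i0; i < i1; ++i) dst[i] = src[i];
        });
    }
    for (auto& t : workers) t.join();
}
```

Because the slices are disjoint, no synchronization is needed beyond the final join.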
Georgi Gerganov
77a73403ca
ggml : add new Q4_2 quantization (ARM only) (#1046)
* ggml : Q4_2 ARM

* ggml : add ggml_is_quantized()

* llama : update llama_type_name() with Q4_2 entry

* ggml : speed-up q4_2

- 4 threads: ~100ms -> ~90ms
- 8 threads:  ~55ms -> ~50ms

* ggml : optimize q4_2 using vmlaq_n_f32 + vmulq_n_f32
2023-04-18 23:54:57 +03:00
Georgi Gerganov
50a8a2af97
ggml : scratch that - vmlaq_n_f32 is always better
Had a background process that was messing with the timings
2023-04-18 23:11:23 +03:00
Georgi Gerganov
4caebf6d40
gitignore : vdot 2023-04-18 23:00:08 +03:00
Georgi Gerganov
dcdd65e296
ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators 2023-04-18 22:59:17 +03:00
Kawrakow
5ecff35151
Adding a simple program to measure speed of dot products (#1041)
On my Mac, the direct Q4_1 product is marginally slower
(~69 vs ~55 us for Q4_0). The SIMD-ified ggml version
is now almost 2X slower (~121 us).

On a Ryzen 7950X CPU, the direct product for Q4_1 quantization
is faster than the AVX2 implementation (~60 vs ~62 us).

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-04-18 19:00:14 +00:00
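A microbenchmark in the spirit of the dot-product timing program above can be sketched as follows: time many repetitions of the kernel and report microseconds per call. Sizes, repetition counts, and function names here are illustrative assumptions:

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Plain scalar dot product, the baseline being measured.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Average wall-clock time per call in microseconds over `reps` repetitions.
double time_dot_us(const std::vector<float>& a, const std::vector<float>& b, int reps) {
    volatile float sink = 0.0f; // keep the compiler from eliding the work
    auto t0 = std::chrono::high_resolution_clock::now();
    for (int r = 0; r < reps; ++r) sink = sink + dot(a, b);
    auto t1 = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count() / reps;
}
```

As the commit body notes, such timings are noisy (a background process skewed an earlier measurement in this very log), so averaging over many repetitions and repeating runs matters.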
Georgi Gerganov
7faa7460f0
readme : update hot topics about new LoRA functionality 2023-04-18 20:10:26 +03:00
Georgi Gerganov
5af8e32238
ci : do not run on drafts 2023-04-18 19:57:06 +03:00
Concedo
f39def81d4 Update readme with more info 2023-04-18 21:44:26 +08:00
Concedo
3614956bc7 update readme 2023-04-18 21:39:05 +08:00
Concedo
ea01771dd5 rwkv is done 2023-04-18 20:55:01 +08:00
Concedo
a76b15b581 Merge branch 'concedo' into concedo_experimental
# Conflicts:
#	make_pyinstaller.bat
2023-04-18 17:42:43 +08:00
Gustavo Rocha Dias
ed5b5c45a9
doc - enhanced readme explaining how to compile on Windows. (#80) 2023-04-18 17:40:04 +08:00
Gustavo Rocha Dias
a9253cdfba
fix - on some OSes the PyInstaller command is case-sensitive; in lowercase it doesn't work. (#81) 2023-04-18 17:39:06 +08:00
Concedo
ac61e34d5f Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	README.md
2023-04-18 17:38:10 +08:00
Concedo
c200b674f4 updated kobold lite, work on rwkv, added exe path to model load params, added launch parameter 2023-04-18 17:36:44 +08:00
Ivan Komarov
42747220b4
Do not close file after mmap (Windows version) (#1034) 2023-04-18 03:15:50 +02:00
Atsushi Tatsuma
e9298af389
readme : add Ruby bindings (#1029) 2023-04-17 22:34:35 +03:00
Cameron
4ad73137a1
add 4_0 to default outfile namestr dict (#1031)
this came up when trying to convert the gpt4all-lora-unfiltered-quantized.bin file
2023-04-17 20:26:23 +02:00
slaren
315a95a4d3
Add LoRA support (#820) 2023-04-17 17:28:55 +02:00
Arik Poznanski
efd05648c8
llama : well-defined static initialization of complex objects (#927)
* Replaced static initialization of complex objects with initialization on first use. This prevents undefined behavior at program start (for example, a crash in Release builds that works in Debug builds)

* replaced use of auto with exact type to avoid using -std=c++14

* Made the accessor functions for static maps static const
2023-04-17 17:41:53 +03:00
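The initialization-on-first-use pattern the commit above switched to is the classic function-local static: the object is constructed the first time the accessor runs (thread-safe since C++11), avoiding the unordered construction of globals across translation units. A minimal sketch, with illustrative map contents rather than the real llama.cpp tables:

```cpp
#include <cassert>
#include <map>
#include <string>

// Accessor with a function-local static: construction happens on first call,
// not during static initialization, so no other global can observe it
// half-built. Returning a const reference keeps callers from mutating it.
static const std::map<int, std::string>& type_names() {
    static const std::map<int, std::string> names = {
        {0, "f32"}, {1, "f16"}, {2, "q4_0"}, {3, "q4_1"},
    };
    return names;
}
```

A file-scope `static const std::map<...>` global, by contrast, is built during static initialization in an order the standard leaves unspecified across translation units, which is the undefined behavior the commit removed.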
Georgi Gerganov
eb17a026fd
quantize-stats : fix bug in --type argument 2023-04-17 17:31:06 +03:00
Concedo
8e923dc6e9 updated kobold lite 2023-04-17 21:33:57 +08:00
Georgi Gerganov
69b740289f
ggml : avoid using ggml_fp16_to_fp32() and ggml_fp32_to_fp16() in ggml.c 2023-04-17 16:16:23 +03:00
Ivan Komarov
f266259ad9
Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933) 2023-04-17 15:10:57 +02:00
Concedo
1f4a69c051 version number api 2023-04-17 19:31:15 +08:00