Concedo
3687db7cf7
cublas is not feasible at this time. removed for now
2023-04-21 16:14:23 +08:00
Concedo
07bb31b034
WIP, don't use
2023-04-21 00:35:54 +08:00
Concedo
7ba36c2c6c
trying to put out penguin based fires. sorry for inconvenience
2023-04-20 23:15:07 +08:00
Concedo
49697d86d8
adjusted down the buf memory allocation now that realloc seems to work
2023-04-20 17:51:13 +08:00
Concedo
4605074245
Merge branch 'master' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# ggml.c
2023-04-20 17:30:54 +08:00
Concedo
3e88616439
fixed WONKY CODE
2023-04-20 16:41:32 +08:00
Concedo
0b08ec7c5d
forgot to remove this
2023-04-20 16:28:47 +08:00
Concedo
346cd68903
make the Linux and OSX build process match Windows. It now builds all applicable libraries; for a full build, run: make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1
2023-04-20 15:53:55 +08:00
Stephan Walter
c8c2c52482
AVX2 optimization for vec_dot_q4_2_q8_0 ( #1068 )
2023-04-20 08:45:41 +02:00
Concedo
93761e7baf
slightly clarified the library replacement steps - replacing the dll is necessary in addition to replacing the library imports
2023-04-20 12:23:54 +08:00
Gustavo Rocha Dias
5ca2d774cc
doc - explanation of how to use a custom version of the Windows libraries in the lib folder. ( #92 )
the dynamic libraries also need to be updated if you replace the import libraries
2023-04-20 12:20:11 +08:00
slaren
02d6988121
Improve cuBLAS performance by dequantizing on the GPU ( #1065 )
2023-04-20 03:14:14 +02:00
CRD716
834695fe3a
Minor: Readme fixed grammar, spelling, and misc updates ( #1071 )
2023-04-19 19:52:14 +00:00
Kawrakow
f7d05095b4
Q4_2 quantization with rmse-optimized scale and quants ( #1062 )
* Q4_2 quantization with rmse-optimized scale and quants
For quantize-stats we get
q4_2: rmse 0.00159301, maxerr 0.17480469, 95pct<0.0030, median<0.0012
For 7B perplexity with BLAS enabled we get 6.2038 after 655 chunks.
Quantization is slow (~90 seconds on my Mac for 7B) because it is
not multi-threaded as in PR #896.
* ggml : satisfy the sanitizer builds
Not sure why this makes them fail
* Better follow ggml conventions for function names
* Fixed type as per reviewer comment
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-19 20:20:14 +02:00
Georgi Gerganov
884e7d7a2b
ggml : use 8-bit precision for Q4_1 intermediate results ( #1047 )
* ggml : use 8-bit precision for Q4_1 intermediate results (ARM)
* ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmlaq_n_f32
56 ms/token with Q4_1 !
* ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051 )
* gitignore : ignore ppl-*.txt files
---------
Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
2023-04-19 20:10:08 +03:00
Georgi Gerganov
7cd5c4a3e9
readme : add warning about Q4_2 and Q4_3
2023-04-19 19:07:54 +03:00
Stephan Walter
f3d4edf504
ggml : Q4 cleanup - remove 4-bit dot product code ( #1061 )
* Q4 cleanup
* Remove unused AVX512 Q4_0 code
2023-04-19 19:06:37 +03:00
Concedo
be1222c36e
Merged the upstream cublas feature,
2023-04-19 20:45:37 +08:00
Concedo
cc407f283a
messing around with memory allocation to bandaid the random ooms with various gpt2 and gptj models
2023-04-19 20:18:55 +08:00
slaren
8944a13296
Add NVIDIA cuBLAS support ( #1044 )
2023-04-19 11:22:45 +02:00
Concedo
f662a9a230
Merge branch 'master' into concedo
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README.md
2023-04-19 16:34:51 +08:00
Concedo
65bfcdb1cc
Merge branch 'concedo_experimental' into concedo
2023-04-19 15:35:48 +08:00
Concedo
45ec09d31b
fast forwarding for rwkv for unmodified contexts
2023-04-19 15:09:35 +08:00
AlpinDale
116488af66
Create make_pyinstaller.sh ( #89 )
2023-04-19 10:57:07 +08:00
slaren
6667401238
Multi-threaded ggml_cpy ( #1035 )
* Multi-threaded ggml_cpy
* Update ggml.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Also fix wdata offset in ggml_compute_forward_add_q_f32
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-19 00:53:24 +02:00
Georgi Gerganov
77a73403ca
ggml : add new Q4_2 quantization (ARM only) ( #1046 )
* ggml : Q4_2 ARM
* ggml : add ggml_is_quantized()
* llama : update llama_type_name() with Q4_2 entry
* ggml : speed-up q4_2
- 4 threads: ~100ms -> ~90ms
- 8 threads: ~55ms -> ~50ms
* ggml : optimize q4_2 using vmlaq_n_f32 + vmulq_n_f32
2023-04-18 23:54:57 +03:00
Georgi Gerganov
50a8a2af97
ggml : scratch that - vmlaq_n_f32 is always better
Had a background process that was messing with the timings
2023-04-18 23:11:23 +03:00
Georgi Gerganov
4caebf6d40
gitignore : vdot
2023-04-18 23:00:08 +03:00
Georgi Gerganov
dcdd65e296
ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators
2023-04-18 22:59:17 +03:00
Kawrakow
5ecff35151
Adding a simple program to measure speed of dot products ( #1041 )
On my Mac, the direct Q4_1 product is marginally slower
(~69 vs ~55 us for Q4_0). The SIMD-ified ggml version
is now almost 2X slower (~121 us).
On a Ryzen 7950X CPU, the direct product for Q4_1 quantization
is faster than the AVX2 implementation (~60 vs ~62 us).
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-04-18 19:00:14 +00:00
Georgi Gerganov
7faa7460f0
readme : update hot topics about new LoRA functionality
2023-04-18 20:10:26 +03:00
Georgi Gerganov
5af8e32238
ci : do not run on drafts
2023-04-18 19:57:06 +03:00
Concedo
f39def81d4
Update readme with more info
2023-04-18 21:44:26 +08:00
Concedo
3614956bc7
update readme
2023-04-18 21:39:05 +08:00
Concedo
ea01771dd5
rwkv is done
2023-04-18 20:55:01 +08:00
Concedo
a76b15b581
Merge branch 'concedo' into concedo_experimental
# Conflicts:
# make_pyinstaller.bat
2023-04-18 17:42:43 +08:00
Gustavo Rocha Dias
ed5b5c45a9
doc - enhanced readme explaining how to compile on Windows. ( #80 )
2023-04-18 17:40:04 +08:00
Gustavo Rocha Dias
a9253cdfba
fix - on some OSs the PyInstaller command is case-sensitive; in lowercase it doesn't work. ( #81 )
2023-04-18 17:39:06 +08:00
Concedo
ac61e34d5f
Merge branch 'master' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# README.md
2023-04-18 17:38:10 +08:00
Concedo
c200b674f4
updated kobold lite, work on rwkv, added exe path to model load params, added launch parameter
2023-04-18 17:36:44 +08:00
Ivan Komarov
42747220b4
Do not close file after mmap (Windows version) ( #1034 )
2023-04-18 03:15:50 +02:00
Atsushi Tatsuma
e9298af389
readme : add Ruby bindings ( #1029 )
2023-04-17 22:34:35 +03:00
Cameron
4ad73137a1
add 4_0 to default outfile namestr dict ( #1031 )
this came up when trying to convert the gpt4all-lora-unfiltered-quantized.bin file
2023-04-17 20:26:23 +02:00
slaren
315a95a4d3
Add LoRA support ( #820 )
2023-04-17 17:28:55 +02:00
Arik Poznanski
efd05648c8
llama : well-defined static initialization of complex objects ( #927 )
* Replaced static initialization of complex objects with initialization on first use. This prevents undefined behavior at program run, for example a crash in a Release build that does not occur in a Debug build
* replaced use of auto with exact type to avoid using -std=c++14
* Made the accessor functions for static maps static const
2023-04-17 17:41:53 +03:00
Georgi Gerganov
eb17a026fd
quantize-stats : fix bug in --type argument
2023-04-17 17:31:06 +03:00
Concedo
8e923dc6e9
updated kobold lite
2023-04-17 21:33:57 +08:00
Georgi Gerganov
69b740289f
ggml : avoid using ggml_fp16_to_fp32() and ggml_fp32_to_fp16() in ggml.c
2023-04-17 16:16:23 +03:00
Ivan Komarov
f266259ad9
Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() ( #933 )
2023-04-17 15:10:57 +02:00
Concedo
1f4a69c051
version number api
2023-04-17 19:31:15 +08:00