0cc4m
bbfba5f740
Fix imported .cl file name
2023-04-27 15:30:30 +02:00
0cc4m
96346fb2a4
Rename dequant kernels file to conform with other file names
2023-04-27 15:27:22 +02:00
0cc4m
fafebff53c
Make globals static, fix indentation
2023-04-27 15:25:09 +02:00
0cc4m
4a35ec9df5
First check error, then release event
2023-04-26 19:56:58 +02:00
0cc4m
ce97a807cb
Simplify code, fix include
2023-04-26 18:39:04 +02:00
0cc4m
b746458281
Use C compiler for OpenCL files
2023-04-26 18:38:31 +02:00
0cc4m
2b0c6a56f9
Improve code quality
* Move internal stuff out of header
* Use internal enums instead of CLBlast enums
* Remove leftover C++ includes and defines
* Make event use easier to read
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-04-26 07:48:04 +02:00
0cc4m
137071003c
Improve btype dequant kernel selection code, add error if type is unsupported
2023-04-25 19:40:54 +02:00
0cc4m
36bfb3c158
Fix typos, use GGML_TYPE defines, improve code
2023-04-25 18:43:31 +02:00
0cc4m
daa5df51f7
Replace buffer pool with static buffers a, b, qb, c
Fix compile warnings
2023-04-24 22:12:02 +02:00
0cc4m
ae73887fb9
Add CLBlast to CMakeLists.txt
2023-04-24 22:10:31 +02:00
0cc4m
18cc05bde4
Fix cast in OpenCL kernels
2023-04-24 22:10:31 +02:00
0cc4m
8603c25e3c
Fix device selection env variable names
2023-04-24 22:10:31 +02:00
0cc4m
f469d9afa0
Double CLBlast speed by disabling OpenBLAS thread workaround
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
2023-04-24 22:10:29 +02:00
0cc4m
309af7fce9
Add q4_2 and q4_3 CLBlast support, improve code
2023-04-24 22:09:08 +02:00
0cc4m
1b16b8c90d
Move CLBlast implementation to separate file
Add buffer reuse code (adapted from slaren's CUDA implementation)
2023-04-24 22:09:08 +02:00
0cc4m
6f66870726
Finish merge of CLBlast support
2023-04-24 22:09:08 +02:00
0cc4m
b7143c1a2e
Improve CLBlast implementation, avoid recreating buffers, remove redundant transfers
2023-04-24 22:09:08 +02:00
0cc4m
a908c37ce9
Allow use of OpenCL GPU-based BLAS using CLBlast instead of OpenBLAS for context processing
2023-04-24 22:09:08 +02:00
Georgi Gerganov
8a0f8673ba
ggml : export symbols (#1155)
2023-04-24 22:18:25 +03:00
xaedes
0c5692345d
examples : add save_load_state example (#1150)
* add save_load_state example
* use <cstdio> instead of <iostream> and fprintf / printf instead of cout
* renamed save-load-state example files, replacing underscores with dashes
2023-04-24 19:23:31 +03:00
Georgi Gerganov
957c8ae21d
llama : increase scratch buffer size for 65B (ref #1152)
Temporary solution
2023-04-24 18:47:30 +03:00
mgroeber9110
9b0a4d4214
examples/main README improvements and some light refactoring (#1131)
2023-04-24 15:45:32 +00:00
Stephan Walter
2ec83428de
Fix build for gcc 8 and test in CI (#1154)
2023-04-24 15:38:26 +00:00
slaren
e4cf982e0d
Fix CUDA compilation (#1128)
* Fix: issue with cuBLAS compilation error due to missing -fPIC flag
Co-authored-by: B1gM8c <89020353+B1gM8c@users.noreply.github.com>
2023-04-24 17:29:58 +02:00
Georgi Gerganov
c4fe84fb0d
llama : refactor get / set state + remove redundant kv cache API (#1143)
2023-04-24 07:40:02 +03:00
slaren
1d78fecdab
Fix LoRA acronym (#1145)
2023-04-23 23:03:44 +02:00
Georgi Gerganov
284685f169
scripts : add helper scripts to sync ggml repo
2023-04-23 19:57:09 +03:00
DannyDaemonic
edce63baa9
Added README.md for main with examples and explanations (#1139)
2023-04-23 15:37:02 +00:00
Georgi Gerganov
ec9cdb6752
ggml : do not print perf ops that have not been used at all
2023-04-23 18:32:52 +03:00
Georgi Gerganov
e4422e299c
ggml : better PERF prints + support "LLAMA_PERF=1 make"
2023-04-23 18:15:39 +03:00
Stephan Walter
53c8434398
Improve AVX2 for vec_dot_q4_3_q8_0 (#1138)
2023-04-23 11:01:03 +00:00
Pavol Rusnak
c6524f46eb
readme : update gpt4all instructions (#980)
2023-04-23 10:21:26 +02:00
Yishuo Wang
c9e2c26f41
A better packNibbles and mul_sum_i8_pairs_float implementation using AVX512 (#1119)
2023-04-23 07:57:05 +00:00
Georgi Gerganov
0e018fe008
ggml : fix Q4_3 cuBLAS
2023-04-22 16:32:07 +03:00
Stephan Walter
857308d1e8
ci : trigger CI for drafts, but not most PR actions (#1125)
2023-04-22 16:12:29 +03:00
Stephan Walter
c50b628810
Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122)
2023-04-22 10:54:13 +00:00
unbounded
5f939498d5
ggml : unit test for quantization functions (#953)
* Unit test for quantization functions
Use the ggml_internal_get_quantize_fn function to loop through all
quantization formats and run a sanity check on the result.
Also add a microbenchmark that times these functions directly without
running the rest of the GGML graph.
* test-quantize-fns: CI fixes
Fix issues uncovered in CI
- need to use sizes divisible by 32*8 for loop unrolling
- use intrinsic header that should work on Mac
* test-quantize: remove
Per PR comment, subsumed by test-quantize-fns
* test-quantize: fix for q8_0 intermediates
2023-04-22 12:10:39 +03:00
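The round-trip check this commit describes is easy to picture. Below is a hedged, self-contained C sketch of the idea: quantize a row, dequantize it back, and assert the RMSE stays small. The toy 8-bit block format and all names are illustrative stand-ins, not the actual ggml code paths that test-quantize-fns drives via ggml_internal_get_quantize_fn.
```c
/* Hedged sketch of a quantize/dequantize round-trip sanity check. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define QK 32               /* block size; real sizes must divide 32*8 per the CI fix */

/* Quantize one row: per-block scale = max|x| / 127, values -> int8. */
static void quantize_row(const float *x, float *scale, int8_t *q, int n) {
    for (int b = 0; b < n / QK; b++) {
        float amax = 0.0f;
        for (int i = 0; i < QK; i++) {
            const float v = fabsf(x[b*QK + i]);
            if (v > amax) amax = v;
        }
        const float s = amax / 127.0f;
        scale[b] = s;
        for (int i = 0; i < QK; i++) {
            q[b*QK + i] = (int8_t) roundf(x[b*QK + i] / (s > 0.0f ? s : 1.0f));
        }
    }
}

/* Dequantize back to floats. */
static void dequantize_row(const float *scale, const int8_t *q, float *y, int n) {
    for (int b = 0; b < n / QK; b++) {
        for (int i = 0; i < QK; i++) {
            y[b*QK + i] = scale[b] * q[b*QK + i];
        }
    }
}

int main(void) {
    enum { N = 32 * 8 };    /* divisible by 32*8, matching the unrolling constraint */
    float x[N], y[N], scale[N / QK];
    int8_t q[N];

    for (int i = 0; i < N; i++) x[i] = cosf(0.1f * i); /* deterministic test data */

    quantize_row(x, scale, q, N);
    dequantize_row(scale, q, y, N);

    double err = 0.0;
    for (int i = 0; i < N; i++) err += (x[i] - y[i]) * (x[i] - y[i]);
    const double rmse = sqrt(err / N);

    printf("round-trip RMSE: %f\n", rmse);
    return rmse < 0.01 ? 0 : 1; /* sanity check: fail on large error */
}
```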
wbpxre150
36b4f7e064
llama : print timings on ctrl+c exit (#1021)
* print timings on ctrl+c exit
* remove redundant free memory call.
* add global pointer to ctx.
2023-04-22 11:56:35 +03:00
eiery
10f19c1121
llama : have n_batch default to 512 (#1091)
* set default n_batch to 512 when using BLAS
* spacing
* alternate implementation of setting different n_batch for BLAS
* set n_batch to 512 for all cases
2023-04-22 11:27:05 +03:00
Howard Su
7e312f165c
cmake : fix build under Windows when enabling BUILD_SHARED_LIBS (#1100)
* Fix build under Windows when enabling BUILD_SHARED_LIBS
* Make the AVX512 test on Windows build the shared libs
2023-04-22 11:18:20 +03:00
Georgi Gerganov
872c365a91
ggml : fix AVX build + update to new Q8_0 format
2023-04-22 11:08:12 +03:00
Georgi Gerganov
955ef9a5d5
ggml : alternative Q4_3 implementation using modified Q8_0 (#1109)
* ggml : prefer vzip to vuzp
This way we always use the same type of instruction across all quantizations
* ggml : alternative Q4_3 implementation using modified Q8_0
* ggml : fix Q4_3 scalar implementation
* ggml : slight improvement of Q4_3 - no need for loop unrolling
* ggml : fix AVX paths for Q8_0 quantization
2023-04-22 10:55:35 +03:00
Stephan Walter
c5aa5e5777
ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099)
* AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring
* finish AVX vectorization of quantize_row_q8_0
* Rename hsum_int_8 to hsum_i32_8
2023-04-22 10:37:05 +03:00
Clint Herron
e9a9cb0c54
examples : improve default Alpaca repeat penalty to better match the alpaca.cpp experience (#1107)
* Moving parameters to separate lines for readability.
* Increasing repeat_penalty to 1.1 to make Alpaca more usable by default.
* Adding trailing newline.
2023-04-22 09:54:33 +03:00
xaedes
b6e7f9b09e
llama : add API for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105)
* reserve correct size for logits
* add functions to get and set the whole llama state:
including rng, logits, embedding and kv_cache
* remove unused variables
* remove trailing whitespace
* fix comment
2023-04-22 09:21:32 +03:00
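As a usage note, here is a minimal sketch of driving this whole-state API (the pattern the save_load_state example above uses). The function names llama_get_state_size / llama_copy_state_data / llama_set_state_data come from this change, but the exact signatures (e.g. const-ness) varied across versions, so treat this as illustrative rather than exact.
```c
/* Hedged sketch: snapshot and restore the full llama state. */
#include <stdint.h>
#include <stdlib.h>
#include "llama.h"

/* Snapshot rng + logits + embedding + kv_cache into a malloc'd buffer. */
static uint8_t *save_state(struct llama_context *ctx, size_t *n_bytes) {
    const size_t size = llama_get_state_size(ctx);
    uint8_t *buf = malloc(size);
    if (buf != NULL) {
        *n_bytes = llama_copy_state_data(ctx, buf); /* bytes actually written */
    }
    return buf;
}

/* Restore a snapshot into a context created with identical parameters. */
static void load_state(struct llama_context *ctx, uint8_t *buf) {
    llama_set_state_data(ctx, buf);
}
```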
slaren
50cb666b8a
Improve cuBLAS performance by using a memory pool (#1094)
* Improve cuBLAS performance by using a memory pool
* Move CUDA-specific definitions to ggml-cuda.h/cu
* Add CXX flags to nvcc
* Change memory pool synchronization mechanism to a spin lock
General code cleanup
2023-04-21 21:59:17 +02:00
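The two techniques this commit names (a buffer reuse pool, guarded by a spin lock) are easy to sketch in isolation. The sketch below mirrors the idea only; it is not the ggml-cuda code, and malloc/free stand in for cudaMalloc/cudaFree so it stays self-contained.
```c
/* Illustrative sketch: a tiny reuse pool guarded by a C11 spin lock. */
#include <stdatomic.h>
#include <stdlib.h>

#define POOL_SLOTS 16

static struct { void *ptr; size_t size; } g_pool[POOL_SLOTS];
static atomic_flag g_lock = ATOMIC_FLAG_INIT;

static void pool_lock(void)   { while (atomic_flag_test_and_set(&g_lock)) { /* spin */ } }
static void pool_unlock(void) { atomic_flag_clear(&g_lock); }

/* Grab a cached buffer that is big enough, or allocate a fresh one. */
static void *pool_malloc(size_t size, size_t *actual_size) {
    pool_lock();
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (g_pool[i].ptr != NULL && g_pool[i].size >= size) {
            void *p = g_pool[i].ptr;
            *actual_size = g_pool[i].size;
            g_pool[i].ptr = NULL;     /* slot is now empty */
            pool_unlock();
            return p;
        }
    }
    pool_unlock();
    *actual_size = size;
    return malloc(size);              /* stand-in for cudaMalloc */
}

/* Return a buffer to the pool; free it outright if the pool is full. */
static void pool_free(void *ptr, size_t size) {
    pool_lock();
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (g_pool[i].ptr == NULL) {
            g_pool[i].ptr  = ptr;
            g_pool[i].size = size;
            pool_unlock();
            return;
        }
    }
    pool_unlock();
    free(ptr);                        /* stand-in for cudaFree */
}
```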
apaz
25d7abbd1f
llama : fixed rlimit error message (#888)
2023-04-21 21:48:06 +03:00
源文雨
018f2279f5
cmake : link threads publicly to ggml (#1042)
* fix: ld link test-tokenizer-0 error
```
cmake3 --build . --config Release
[ 5%] Built target ggml
[ 16%] Built target llama
[ 22%] Linking CXX executable ../bin/test-tokenizer-0
../libllama.a(ggml.c.o): In function `ggml_graph_compute':
ggml.c:(.text+0xf2db): undefined reference to `pthread_create'
ggml.c:(.text+0xf9d4): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
gmake[2]: *** [bin/test-tokenizer-0] Error 1
gmake[1]: *** [tests/CMakeFiles/test-tokenizer-0.dir/all] Error 2
gmake: *** [all] Error 2
```
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
2023-04-21 21:27:06 +03:00
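The undefined pthread_create / pthread_join references above are the classic symptom of a threads dependency linked PRIVATE (or not at all): libllama.a pulls in ggml.c.o, but the final executable never receives -pthread. Since the change itself lives in CMake, a minimal sketch in that language, assuming the target is named ggml as in this repo:
```cmake
# Sketch of the fix: make Threads part of ggml's PUBLIC link interface so
# every consumer of ggml (llama, the tests) links pthreads transitively.
find_package(Threads REQUIRED)
target_link_libraries(ggml PUBLIC Threads::Threads)
```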
Alex Klinkhamer
9411288271
main : evaluate tokens in batches after swapping context (#1014)
* examples : evaluate tokens in batches after swapping context
* Update examples/main/main.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-21 21:18:09 +03:00
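To close, a hedged sketch of what "evaluate tokens in batches" means here: after the context swap, the pending tokens are fed through llama_eval in n_batch-sized chunks instead of one call per token. This assumes the llama_eval signature of this era; the helper name is illustrative, not a function from the repo.
```c
/* Hedged sketch: evaluate a token span in n_batch-sized chunks. */
#include "llama.h"

static int eval_in_batches(struct llama_context *ctx,
                           const llama_token *tokens, int n_tokens,
                           int n_past, int n_batch, int n_threads) {
    for (int i = 0; i < n_tokens; i += n_batch) {
        int n_eval = n_tokens - i;
        if (n_eval > n_batch) {
            n_eval = n_batch;         /* clamp the last chunk */
        }
        if (llama_eval(ctx, tokens + i, n_eval, n_past, n_threads) != 0) {
            return 1;                 /* evaluation failed */
        }
        n_past += n_eval;             /* kv cache grew by n_eval tokens */
    }
    return 0;
}
```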