Pavol Rusnak
bb98e77be7
nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py ( #981 )
2023-04-25 23:19:57 +02:00
Georgi Gerganov
7a32fcb3b2
ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) ( #1179 )
...
* ggml : add Q8_0 quantization format (rename the old one to Q8_1)
* tests : fix test-quantize-fns
* ggml : finalize Q8_0 implementation
* ggml : use q4_0_q8_0 and q4_2_q8_0
* ggml : fix Q8_0 dot product bug (ARM)
* ggml : Q8_0 unroll x2
* ggml : fix bug - using wrong block type
* ggml : extend quantize_fns_t with "vec_dot_type"
* ggml : fix Q8_0 to use 255 values out of 256
* ggml : fix assert using wrong QK4_2 instead of QK4_3
2023-04-25 23:40:51 +03:00
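For reference, a minimal sketch of the Q8_0 block layout and reference quantizer described above; QK8_0 = 32 and the struct fields follow ggml conventions of the time but should be treated as illustrative. Note the scale of amax/127 rather than amax/128, which is what the "255 values out of 256" fix amounts to:

    #include <math.h>
    #include <stdint.h>

    #define QK8_0 32

    typedef struct {
        float  d;            // scale factor for the block
        int8_t qs[QK8_0];    // quantized values in [-127, 127]
    } block_q8_0;

    // reference (scalar) quantizer for k floats, k divisible by QK8_0
    static void quantize_row_q8_0_ref(const float * x, block_q8_0 * y, int k) {
        for (int i = 0; i < k/QK8_0; i++) {
            float amax = 0.0f;   // absolute max of the block
            for (int j = 0; j < QK8_0; j++) {
                const float v = fabsf(x[i*QK8_0 + j]);
                if (v > amax) amax = v;
            }
            // scaling by amax/127 keeps values in [-127, 127]: 255 of the
            // 256 int8 levels, so the range stays symmetric around zero
            const float d  = amax / 127.0f;
            const float id = d != 0.0f ? 1.0f/d : 0.0f;
            y[i].d = d;
            for (int j = 0; j < QK8_0; j++) {
                y[i].qs[j] = (int8_t) roundf(x[i*QK8_0 + j] * id);
            }
        }
    }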
unbounded
dd0eabc049
ggml : use full range for Q4_0 and Q4_2 quantization ( #729 )
...
* Use full range for q4_0 quantization
By keeping the sign of the highest magnitude, we can make sure the
highest value maps to -8, which is currently unused.
This is a bit of a freebie since it is fully backwards compatible with
the current format.
* Update quantize_row_q4_0 for AVX/AVX2
* Update quantize_row_q4_0 for WASM
Untested
* Update quantize_row_q4_0 for Arm NEON
* Update quantize_row_q4_0 for PowerPC
Untested
* Use full range for q4_2 quantization
2023-04-25 20:20:46 +03:00
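A sketch of the full-range trick described above: dividing by the signed maximum-magnitude value maps that value exactly onto -8, the previously unused level, while remaining bit-compatible with the existing format. This is an illustrative scalar version, not the SIMD paths the commit updates:

    #include <math.h>
    #include <stdint.h>

    #define QK4_0 32

    // quantize one block of QK4_0 floats to 4-bit values in [-8, 7]
    static void quantize_block_q4_0_full(const float * x, float * d_out, uint8_t * qs) {
        float max  = 0.0f;   // value with the highest magnitude, sign kept
        float amax = 0.0f;
        for (int j = 0; j < QK4_0; j++) {
            if (fabsf(x[j]) > amax) { amax = fabsf(x[j]); max = x[j]; }
        }
        const float d  = max / -8.0f;  // the extreme value now maps to -8
        const float id = d != 0.0f ? 1.0f/d : 0.0f;
        *d_out = d;
        for (int j = 0; j < QK4_0; j += 2) {
            // bias by 8.5 to round and shift into [0, 16], then clamp to 15
            int v0 = (int)(x[j + 0]*id + 8.5f); if (v0 > 15) v0 = 15;
            int v1 = (int)(x[j + 1]*id + 8.5f); if (v1 > 15) v1 = 15;
            qs[j/2] = (uint8_t)(v0 | (v1 << 4));   // two nibbles per byte
        }
    }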
Concedo
0aa3d839fb
free old ctx on retry
2023-04-25 23:42:57 +08:00
Concedo
a696b0a16c
missed another thing
2023-04-25 23:16:04 +08:00
Concedo
8c9c218609
missed a thing
2023-04-25 23:02:08 +08:00
Concedo
235daf4016
Merge branch 'master' into concedo
...
# Conflicts:
# .github/workflows/build.yml
# README.md
2023-04-25 20:44:22 +08:00
Concedo
72b2331ad6
edge cases with memory crashes? needs verification
2023-04-25 20:42:30 +08:00
Concedo
5eec5d6ed9
Added backwards compatibility with an earlier version of NeoX.
2023-04-25 20:34:18 +08:00
Concedo
bff998f871
Slight refactor of the Python code: credits to @LuxF3rre
2023-04-25 19:20:14 +08:00
xaedes
54bb60e268
ggml : fix bug in ggml_compute_forward_sum_f32 ( #1162 )
...
The sum over all rows is now computed instead of just the last row
2023-04-24 23:02:02 +02:00
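The gist of the fix, sketched (the names ne0/ne1/src/dst are illustrative stand-ins for the tensor dimensions and data): the per-row partial sums must be accumulated rather than overwritten, otherwise only the last row's sum survives:

    // sum every element of an ne0 x ne1 float tensor into dst[0]
    float sum = 0.0f;
    for (int i1 = 0; i1 < ne1; i1++) {       // all rows, not just the last
        float row_sum = 0.0f;
        for (int i0 = 0; i0 < ne0; i0++) {
            row_sum += src[i1*ne0 + i0];
        }
        sum += row_sum;   // the bug: this was effectively "sum = row_sum"
    }
    dst[0] = sum;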
Georgi Gerganov
8a0f8673ba
ggml : export symbols ( #1155 )
2023-04-24 22:18:25 +03:00
xaedes
0c5692345d
examples : add save_load_state example ( #1150 )
...
* add save_load_state example
* use <cstdio> instead of <iostream> and fprintf / printf instead of cout
* renamed save-load-state example files replacing underscores by dashes
2023-04-24 19:23:31 +03:00
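A hedged sketch of the flow the example demonstrates, using the llama.h state API of that period (llama_get_state_size / llama_copy_state_data / llama_set_state_data); error handling is omitted for brevity:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include "llama.h"

    // serialize the full context state (RNG, logits, KV cache) to a file
    static void save_state(struct llama_context * ctx, const char * path) {
        const size_t n = llama_get_state_size(ctx);
        uint8_t * buf = malloc(n);
        llama_copy_state_data(ctx, buf);
        FILE * f = fopen(path, "wb");
        fwrite(buf, 1, n, f);
        fclose(f);
        free(buf);
    }

    // restore the state into a context built from the same model
    static void load_state(struct llama_context * ctx, const char * path) {
        const size_t n = llama_get_state_size(ctx);
        uint8_t * buf = malloc(n);
        FILE * f = fopen(path, "rb");
        fread(buf, 1, n, f);
        fclose(f);
        llama_set_state_data(ctx, buf);
        free(buf);
    }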
Georgi Gerganov
957c8ae21d
llama : increase scratch buffer size for 65B (ref #1152 )
...
Temporary solution
2023-04-24 18:47:30 +03:00
mgroeber9110
9b0a4d4214
examples/main README improvements and some light refactoring ( #1131 )
2023-04-24 15:45:32 +00:00
Stephan Walter
2ec83428de
Fix build for gcc 8 and test in CI ( #1154 )
2023-04-24 15:38:26 +00:00
slaren
e4cf982e0d
Fix cuda compilation ( #1128 )
...
* Fix: cuBLAS compilation error due to missing -fPIC flag
---------
Co-authored-by: B1gM8c <89020353+B1gM8c@users.noreply.github.com>
2023-04-24 17:29:58 +02:00
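For context, the kind of build change such a fix involves: nvcc does not pass -fPIC to the host compiler on its own, so linking ggml-cuda.o into a shared object fails with a relocation error unless the flag is forwarded explicitly. A Makefile sketch, not the repo's exact rule:

    # -Xcompiler forwards the flag to the host C++ compiler behind nvcc
    ggml-cuda.o: ggml-cuda.cu ggml-cuda.h
    	nvcc -Xcompiler -fPIC -c ggml-cuda.cu -o ggml-cuda.o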
Concedo
59fb174678
fixed compile errors, made mmap automatic when LoRA is selected, added updated quantizers and quantization handling for GPT-NeoX, GPT-2 and GPT-J
2023-04-24 23:20:06 +08:00
Concedo
3962eb39c7
added token unbanning
2023-04-24 21:50:20 +08:00
Concedo
1b9b9068b1
merged q4_2 and q4_3 dequants and FIXED CLBlast SLOWNESS!
2023-04-24 21:33:01 +08:00
Concedo
e58f1d1336
Merge branch 'master' into concedo_experimental
2023-04-24 19:43:17 +08:00
Georgi Gerganov
c4fe84fb0d
llama : refactor get / set state + remove redundant kv cache API ( #1143 )
2023-04-24 07:40:02 +03:00
Concedo
8e615c8245
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# README.md
2023-04-24 12:20:08 +08:00
slaren
1d78fecdab
Fix LoRA acronym ( #1145 )
2023-04-23 23:03:44 +02:00
Georgi Gerganov
284685f169
scripts : add helper scripts to sync ggml repo
2023-04-23 19:57:09 +03:00
DannyDaemonic
edce63baa9
Added README.md for main with examples and explanations ( #1139 )
2023-04-23 15:37:02 +00:00
Georgi Gerganov
ec9cdb6752
ggml : do not print perf ops that have not been used at all
2023-04-23 18:32:52 +03:00
Georgi Gerganov
e4422e299c
ggml : better PERF prints + support "LLAMA_PERF=1 make"
2023-04-23 18:15:39 +03:00
Stephan Walter
53c8434398
Improve AVX2 for vec_dot_q4_3_q8_0 ( #1138 )
2023-04-23 11:01:03 +00:00
Pavol Rusnak
c6524f46eb
readme : update gpt4all instructions ( #980 )
2023-04-23 10:21:26 +02:00
Concedo
9129e937f9
only llama models can use batch sizes above 256, to prevent unacceptably high memory usage
2023-04-23 15:57:06 +08:00
Yishuo Wang
c9e2c26f41
A better packNibbles and mul_sum_i8_pairs_float implementation using AVX512 ( #1119 )
2023-04-23 07:57:05 +00:00
Concedo
432cc91649
still needs to be a bit higher for very small contexts
2023-04-23 15:01:38 +08:00
Concedo
4e1ea2ac61
hopefully fixed the OOMs for good
2023-04-23 13:49:50 +08:00
Gustavo Rocha Dias
3f21bd81f3
doc - Better explanation of how to build the libraries on Windows. ( #107 )
2023-04-23 13:40:09 +08:00
Concedo
d41490c27b
just revert to the working commit
2023-04-23 00:35:42 +08:00
Concedo
c60fb5ef4b
fixed rwkv build errors on ARM devices
2023-04-23 00:18:38 +08:00
Concedo
b5d6284190
increase initial buffer too
2023-04-23 00:07:33 +08:00
Concedo
d2f14b2b1f
add an extra buffer to mem allocations
2023-04-23 00:04:32 +08:00
Concedo
7c60441d71
Merge branch 'master' into concedo
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
2023-04-22 23:46:14 +08:00
Concedo
eb73b4c261
remove writing to cl_buffer_c and change it to a write-only buffer - should work since beta is always zero.
2023-04-22 23:19:17 +08:00
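The reasoning behind that change, sketched: sgemm computes C = alpha*A*B + beta*C, so when beta is always zero the kernel never reads C, and the output buffer can be created write-only with no host upload. Illustrative OpenCL, with the surrounding names (context, m, n, err) assumed:

    cl_int err;
    // before: C was read-write and copied to the device ahead of the GEMM;
    // after:  beta == 0 means its old contents are never read, only written
    cl_mem cl_buffer_c = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
                                        m * n * sizeof(float), NULL, &err);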
Concedo
cd6c121357
reinstated the reusable buffers -> approx 10% speedup for prompt processing
2023-04-22 22:49:27 +08:00
Georgi Gerganov
0e018fe008
ggml : fix Q4_3 cuBLAS
2023-04-22 16:32:07 +03:00
Stephan Walter
857308d1e8
ci : trigger CI for drafts, but not most PR actions ( #1125 )
2023-04-22 16:12:29 +03:00
Stephan Walter
c50b628810
Fix CI: ARM NEON, quantization unit tests, editorconfig ( #1122 )
2023-04-22 10:54:13 +00:00
unbounded
5f939498d5
ggml : unit test for quantization functions ( #953 )
...
* Unit test for quantization functions
Use the ggml_internal_get_quantize_fn function to loop through all
quantization formats and run a sanity check on the result.
Also add a microbenchmark that times these functions directly without
running the rest of the GGML graph.
* test-quantize-fns: CI fixes
Fix issues uncovered in CI
- need to use sizes divisible by 32*8 for loop unrolling
- use intrinsic header that should work on Mac
* test-quantize: remove
Per PR comment, subsumed by test-quantize-fns
* test-quantize: fix for q8_0 intermediates
2023-04-22 12:10:39 +03:00
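A condensed sketch of the loop the test runs, assuming the quantize_fns_t layout of the time (quantize_row_q / dequantize_row_q pointers returned by ggml_internal_get_quantize_fn); the exact field names may differ slightly:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include "ggml.h"

    #define N 1024   // divisible by 32*8 so unrolled SIMD paths are covered

    int main(void) {
        float src[N], dst[N];
        uint8_t q[2*N];   // generous scratch for any block format
        for (int i = 0; i < N; i++) {
            src[i] = 0.1f + 2.0f*cosf(i);   // synthetic, non-trivial data
        }
        for (int t = 0; t < GGML_TYPE_COUNT; t++) {
            quantize_fns_t fns = ggml_internal_get_quantize_fn(t);
            if (!fns.quantize_row_q || !fns.dequantize_row_q) continue;
            fns.quantize_row_q(src, q, N);     // round-trip through the format
            fns.dequantize_row_q(q, dst, N);
            double err = 0.0;
            for (int i = 0; i < N; i++) {
                err += (dst[i] - src[i])*(dst[i] - src[i]);
            }
            printf("type %d: rmse %g\n", t, sqrt(err/N));
        }
        return 0;
    }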
wbpxre150
36b4f7e064
llama : print timings on ctrl+c exit ( #1021 )
...
* print timings on ctrl+c exit
* remove redundant free memory call.
* add global pointer to ctx.
2023-04-22 11:56:35 +03:00
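A minimal sketch of the mechanism those three bullets describe: a file-scope pointer to the context plus a SIGINT handler that prints timings before exiting (illustrative; the real example also cooperates with interactive mode):

    #include <signal.h>
    #include <stdlib.h>
    #include "llama.h"

    static struct llama_context * g_ctx = NULL;   // global pointer to ctx

    static void sigint_handler(int signo) {
        if (signo == SIGINT && g_ctx != NULL) {
            llama_print_timings(g_ctx);   // report timings on ctrl+c
            exit(130);
        }
    }

    // in main(), once the context exists:
    //   g_ctx = ctx;
    //   signal(SIGINT, sigint_handler);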
Concedo
811989c2ad
fixed PyInstaller
2023-04-22 16:31:42 +08:00
eiery
10f19c1121
llama : have n_batch default to 512 ( #1091 )
...
* set default n_batch to 512 when using BLAS
* spacing
* alternate implementation of setting different n_batch for BLAS
* set n_batch to 512 for all cases
2023-04-22 11:27:05 +03:00
Concedo
1b7aa2b815
Merge branch 'master' into concedo
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
2023-04-22 16:22:08 +08:00