Commit graph

551 commits

Author SHA1 Message Date
Concedo
346cd68903 make linux and OSX build process equal to windows. Now it will build all applicable libraries, for a full build do make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 2023-04-20 15:53:55 +08:00
Concedo
93761e7baf slightly clarified the library replacement steps - replacing the dll is necessary in addition to replacing the library imports 2023-04-20 12:23:54 +08:00
Gustavo Rocha Dias
5ca2d774cc
doc - explanation of how to use a custom version of the windows libraries in the lib folder. (#92)
the dynamic libraries also need to be updated if you replace the import libraries
2023-04-20 12:20:11 +08:00
Concedo
be1222c36e Merged the upstream cublas feature 2023-04-19 20:45:37 +08:00
Concedo
cc407f283a messing around with memory allocation to bandaid the random ooms with various gpt2 and gptj models 2023-04-19 20:18:55 +08:00
slaren
8944a13296
Add NVIDIA cuBLAS support (#1044) 2023-04-19 11:22:45 +02:00
Concedo
f662a9a230 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	Makefile
#	README.md
2023-04-19 16:34:51 +08:00
Concedo
65bfcdb1cc Merge branch 'concedo_experimental' into concedo 2023-04-19 15:35:48 +08:00
Concedo
45ec09d31b fast forwarding for rwkv for unmodified contexts 2023-04-19 15:09:35 +08:00
AlpinDale
116488af66
Create make_pyinstaller.sh (#89) 2023-04-19 10:57:07 +08:00
slaren
6667401238
Multi-threaded ggml_cpy (#1035)
* Multi-threaded ggml_cpy

* Update ggml.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Also fix wdata offset in ggml_compute_forward_add_q_f32

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-19 00:53:24 +02:00
Georgi Gerganov
77a73403ca
ggml : add new Q4_2 quantization (ARM only) (#1046)
* ggml : Q4_2 ARM

* ggml : add ggml_is_quantized()

* llama : update llama_type_name() with Q4_2 entry

* ggml : speed-up q4_2

- 4 threads: ~100ms -> ~90ms
- 8 threads:  ~55ms -> ~50ms

* ggml : optimize q4_2 using vmlaq_n_f32 + vmulq_n_f32
2023-04-18 23:54:57 +03:00
Georgi Gerganov
50a8a2af97
ggml : scratch that - vmlaq_n_f32 is always better
Had a background process that was messing with the timings
2023-04-18 23:11:23 +03:00
Georgi Gerganov
4caebf6d40
gitignore : vdot 2023-04-18 23:00:08 +03:00
Georgi Gerganov
dcdd65e296
ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators 2023-04-18 22:59:17 +03:00
Kawrakow
5ecff35151
Adding a simple program to measure speed of dot products (#1041)
On my Mac, the direct Q4_1 product is marginally slower
(~69 vs ~55 us for Q4_0). The SIMD-ified ggml version
is now almost 2X slower (~121 us).

On a Ryzen 7950X CPU, the direct product for Q4_1 quantization
is faster than the AVX2 implementation (~60 vs ~62 us).

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-04-18 19:00:14 +00:00
Georgi Gerganov
7faa7460f0
readme : update hot topics about new LoRA functionality 2023-04-18 20:10:26 +03:00
Georgi Gerganov
5af8e32238
ci : do not run on drafts 2023-04-18 19:57:06 +03:00
Concedo
f39def81d4 Update readme with more info 2023-04-18 21:44:26 +08:00
Concedo
3614956bc7 update readme 2023-04-18 21:39:05 +08:00
Concedo
ea01771dd5 rwkv is done 2023-04-18 20:55:01 +08:00
Concedo
a76b15b581 Merge branch 'concedo' into concedo_experimental
# Conflicts:
#	make_pyinstaller.bat
2023-04-18 17:42:43 +08:00
Gustavo Rocha Dias
ed5b5c45a9
doc - enhanced readme explaining how to compile on Windows. (#80) 2023-04-18 17:40:04 +08:00
Gustavo Rocha Dias
a9253cdfba
fix - on some OSs the PyInstaller command is case sensitive; in lowercase it doesn't work. (#81) 2023-04-18 17:39:06 +08:00
Concedo
ac61e34d5f Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	README.md
2023-04-18 17:38:10 +08:00
Concedo
c200b674f4 updated kobold lite, work on rwkv, added exe path to model load params, added launch parameter 2023-04-18 17:36:44 +08:00
Ivan Komarov
42747220b4
Do not close file after mmap (Windows version) (#1034) 2023-04-18 03:15:50 +02:00
Atsushi Tatsuma
e9298af389
readme : add Ruby bindings (#1029) 2023-04-17 22:34:35 +03:00
Cameron
4ad73137a1
add 4_0 to default outfile namestr dict (#1031)
this came up when trying to convert the gpt4all-lora-unfiltered-quantized.bin file
2023-04-17 20:26:23 +02:00
slaren
315a95a4d3
Add LoRA support (#820) 2023-04-17 17:28:55 +02:00
Arik Poznanski
efd05648c8
llama : well-defined static initialization of complex objects (#927)
* Replaced static initialization of complex objects with initialization on first use. This prevents undefined behavior at program run, for example a crash in a Release build that does not occur in a Debug build

* replaced use of auto with exact type to avoid using -std=c++14

* Made the accessor functions for static maps static const
2023-04-17 17:41:53 +03:00
Georgi Gerganov
eb17a026fd
quantize-stats : fix bug in --type argument 2023-04-17 17:31:06 +03:00
Concedo
8e923dc6e9 updated kobold lite 2023-04-17 21:33:57 +08:00
Georgi Gerganov
69b740289f
ggml : avoid using ggml_fp16_to_fp32() and ggml_fp32_to_fp16() in ggml.c 2023-04-17 16:16:23 +03:00
Ivan Komarov
f266259ad9
Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933) 2023-04-17 15:10:57 +02:00
Concedo
1f4a69c051 version number api 2023-04-17 19:31:15 +08:00
Concedo
364e2736c9 Merge branch 'master' into concedo 2023-04-17 17:34:50 +08:00
Concedo
763ad172c0 arranged files, updated kobold lite, modified makefile for extra link args on linux, started RWKV implementation 2023-04-17 17:31:45 +08:00
slaren
47f61aaa5f
Fix: do not close file on mmap (#1017) 2023-04-16 21:27:38 +02:00
Concedo
9581171a9f updated embedded lite again 2023-04-16 22:42:51 +08:00
Concedo
bee6a401fd slight clarity fix 2023-04-16 22:04:19 +08:00
Concedo
96fb12cfa2 Merge branch 'master' into concedo 2023-04-16 21:59:05 +08:00
Concedo
c757fbee1d fixes to stopper tokens, fixed BLAS mode for GPT2 and GPTJ, updated kobold lite 2023-04-16 21:54:18 +08:00
Concedo
6548d3b3fb Added prints for stopping sequences, made makefile 1% friendlier to arch linux users 2023-04-16 20:43:17 +08:00
Georgi Gerganov
3173a62eb9
stdout : vertical align outputs for better readability 2023-04-16 13:59:27 +03:00
Concedo
525184930d added a kobold API compatible implementation of stopping sequences 2023-04-16 18:37:49 +08:00
Pavol Rusnak
489537e6cf
examples: add missing <ctime> include for time() (#1011) 2023-04-16 10:13:00 +00:00
nanahi
2d3481c721
Fix msys2 build error and warnings (#1009) 2023-04-16 11:13:42 +02:00
Concedo
8bf2e50a11 converted the cl file to be a string literal instead 2023-04-16 15:57:30 +08:00
Concedo
5a4d1b5d15 Merge branch 'master' into concedo
# Conflicts:
#	CMakeLists.txt
#	Makefile
2023-04-16 14:08:23 +08:00