Commit graph

565 commits

Author SHA1 Message Date
Concedo
3687db7cf7 cublas is not feasible at this time. removed for now 2023-04-21 16:14:23 +08:00
Concedo
07bb31b034 wip dont use 2023-04-21 00:35:54 +08:00
Concedo
7ba36c2c6c trying to put out penguin-based fires. Sorry for the inconvenience 2023-04-20 23:15:07 +08:00
Concedo
49697d86d8 adjusted down the buf memory allocation now that realloc seems to work 2023-04-20 17:51:13 +08:00
Concedo
4605074245 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	ggml.c
2023-04-20 17:30:54 +08:00
Concedo
3e88616439 fixed WONKY CODE 2023-04-20 16:41:32 +08:00
Concedo
0b08ec7c5d forgot to remove this 2023-04-20 16:28:47 +08:00
Concedo
346cd68903 made the Linux and OS X build process match Windows. It now builds all applicable libraries; for a full build run make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 2023-04-20 15:53:55 +08:00
Stephan Walter
c8c2c52482
AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) 2023-04-20 08:45:41 +02:00
Concedo
93761e7baf slightly clarified the library replacement steps - replacing the DLL is necessary in addition to replacing the library imports 2023-04-20 12:23:54 +08:00
Gustavo Rocha Dias
5ca2d774cc
doc - explanation of how to use a custom version of the Windows libraries in the lib folder. (#92)
the dynamic libraries also need to be updated if you replace the import libraries
2023-04-20 12:20:11 +08:00
slaren
02d6988121
Improve cuBLAS performance by dequantizing on the GPU (#1065) 2023-04-20 03:14:14 +02:00
CRD716
834695fe3a
Minor: readme grammar and spelling fixes, and misc updates (#1071) 2023-04-19 19:52:14 +00:00
Kawrakow
f7d05095b4
Q4_2 quantization with rmse-optimized scale and quants (#1062)
* Q4_2 quantization with rmse-optimized scale and quants

For quantize-stats we get
q4_2: rmse 0.00159301, maxerr 0.17480469, 95pct<0.0030, median<0.0012

For 7B perplexity with BLAS enabled we get 6.2038 after 655 chunks.

Quantization is slow (~90 seconds on my Mac for 7B) as it is not
multi-threaded as in PR #896.

* ggml : satisfy the sanitizer builds

Not sure why this makes them fail

* Better follow ggml conventions for function names

* Fixed type as per reviewer comment

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-19 20:20:14 +02:00
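The RMSE-optimized scale search mentioned in the commit above can be sketched roughly as follows. This is a minimal illustration of the idea only, not the actual ggml implementation; the struct name, search grid, and quantized range are assumptions:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantize a block of floats to 4-bit values in [-8, 7] with one scale d,
// trying a small grid of candidate scales and keeping the one with the
// lowest total squared error (i.e. lowest RMSE).
struct Q4Block {
    float d;               // scale: x ≈ d * q
    std::vector<int8_t> q; // quantized values in [-8, 7]
};

Q4Block quantize_rmse(const std::vector<float>& x) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::abs(v));
    Q4Block best{amax / 7.0f, std::vector<int8_t>(x.size(), 0)};
    double best_err = 1e30;
    // search candidate scales around the naive amax/7 choice
    for (int is = -4; is <= 4; ++is) {
        float d = amax / (7.0f + 0.1f * is);
        if (d == 0.0f) continue;
        double err = 0.0;
        std::vector<int8_t> q(x.size());
        for (size_t i = 0; i < x.size(); ++i) {
            int qi = (int)std::lround(x[i] / d);
            qi = std::min(7, std::max(-8, qi));
            q[i] = (int8_t)qi;
            double e = x[i] - d * qi;
            err += e * e;
        }
        if (err < best_err) { best_err = err; best = {d, q}; }
    }
    return best;
}
```

The naive choice (scale so that the largest magnitude maps exactly to the edge of the range) is not generally the RMSE minimizer; searching nearby scales trades a little accuracy on the largest value for less rounding error on the rest.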
Georgi Gerganov
884e7d7a2b
ggml : use 8-bit precision for Q4_1 intermediate results (#1047)
* ggml : use 8-bit precision for Q4_1 intermediate results (ARM)

* ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmalq_n_f32

56 ms/token with Q4_1 !

* ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051)

* gitignore : ignore ppl-*.txt files

---------

Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
2023-04-19 20:10:08 +03:00
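The 8-bit-intermediate trick behind the Q4_1 commit above can be sketched in scalar form. With x_i ≈ d*q_i + m (q_i in [0, 15]) and y_i ≈ s*p_i (p_i an int8), the block dot product reduces to two integer accumulations scaled once at the end. Names and layout here are illustrative, not the real ggml structs:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Scalar sketch of a Q4_1 x 8-bit dot product:
//   sum_i x_i*y_i ≈ d*s * Σ q_i*p_i  +  m*s * Σ p_i
// so the inner loop is pure integer arithmetic, which is what makes the
// SIMD versions (NEON/AVX2) fast.
float dot_q4_1_q8(float d, float m, const std::vector<uint8_t>& q,
                  float s, const std::vector<int8_t>& p) {
    int32_t sum_qp = 0, sum_p = 0; // products of 8-bit values fit easily in int32
    for (size_t i = 0; i < q.size(); ++i) {
        sum_qp += (int32_t)q[i] * p[i];
        sum_p  += p[i];
    }
    return d * s * sum_qp + m * s * sum_p;
}
```

For example, with d=0.5, m=-1 the codes {0, 2, 4} decode to {-1, 0, 1}, and with s=0.25 the codes {4, -8, 12} decode to {1, -2, 3}; the dot product is -1 + 0 + 3 = 2.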
Georgi Gerganov
7cd5c4a3e9
readme : add warning about Q4_2 and Q4_3 2023-04-19 19:07:54 +03:00
Stephan Walter
f3d4edf504
ggml : Q4 cleanup - remove 4-bit dot product code (#1061)
* Q4 cleanup

* Remove unused AVX512 Q4_0 code
2023-04-19 19:06:37 +03:00
Concedo
be1222c36e Merged the upstream cublas feature 2023-04-19 20:45:37 +08:00
Concedo
cc407f283a messing around with memory allocation to band-aid the random OOMs with various gpt2 and gptj models 2023-04-19 20:18:55 +08:00
slaren
8944a13296
Add NVIDIA cuBLAS support (#1044) 2023-04-19 11:22:45 +02:00
Concedo
f662a9a230 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	Makefile
#	README.md
2023-04-19 16:34:51 +08:00
Concedo
65bfcdb1cc Merge branch 'concedo_experimental' into concedo 2023-04-19 15:35:48 +08:00
Concedo
45ec09d31b fast forwarding for rwkv for unmodified contexts 2023-04-19 15:09:35 +08:00
AlpinDale
116488af66
Create make_pyinstaller.sh (#89) 2023-04-19 10:57:07 +08:00
slaren
6667401238
Multi-threaded ggml_cpy (#1035)
* Multi-threaded ggml_cpy

* Update ggml.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Also fix wdata offset in ggml_compute_forward_add_q_f32

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-19 00:53:24 +02:00
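The multi-threading pattern used for ggml_cpy in the commit above follows the usual ggml scheme of statically partitioning rows across worker threads. A minimal sketch of that pattern on a flat buffer (an illustration only, not the actual ggml task-scheduling code):

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Copy src into dst using nth threads, each handling a contiguous slice.
// The partition [n*ith/nth, n*(ith+1)/nth) covers the range exactly with
// no overlap, which is the same index math ggml uses per thread (ith/nth).
void copy_parallel(const std::vector<float>& src, std::vector<float>& dst, int nth) {
    dst.resize(src.size());
    const size_t n = src.size();
    std::vector<std::thread> workers;
    for (int ith = 0; ith < nth; ++ith) {
        workers.emplace_back([&, ith]() {
            size_t i0 = n * ith / nth;       // first index for this thread
            size_t i1 = n * (ith + 1) / nth; // one past the last index
            for (size_t i = i0; i < i1; ++i) dst[i] = src[i];
        });
    }
    for (auto& t : workers) t.join();
}
```

Because the slices are disjoint, no synchronization is needed beyond the final join.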
Georgi Gerganov
77a73403ca
ggml : add new Q4_2 quantization (ARM only) (#1046)
* ggml : Q4_2 ARM

* ggml : add ggml_is_quantized()

* llama : update llama_type_name() with Q4_2 entry

* ggml : speed-up q4_2

- 4 threads: ~100ms -> ~90ms
- 8 threads:  ~55ms -> ~50ms

* ggml : optimize q4_2 using vmlaq_n_f32 + vmulq_n_f32
2023-04-18 23:54:57 +03:00
Georgi Gerganov
50a8a2af97
ggml : scratch that - vmlaq_n_f32 is always better
Had a background process that was messing with the timings
2023-04-18 23:11:23 +03:00
Georgi Gerganov
4caebf6d40
gitignore : vdot 2023-04-18 23:00:08 +03:00
Georgi Gerganov
dcdd65e296
ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators 2023-04-18 22:59:17 +03:00
Kawrakow
5ecff35151
Adding a simple program to measure speed of dot products (#1041)
On my Mac, the direct Q4_1 product is marginally slower
(~69 vs ~55 us for Q4_0). The SIMD-ified ggml version
is now almost 2X slower (~121 us).

On a Ryzen 7950X CPU, the direct product for Q4_1 quantization
is faster than the AVX2 implementation (~60 vs ~62 us).

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-04-18 19:00:14 +00:00
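A microbenchmark in the spirit of the dot-product timing program above can be sketched as follows: time many repetitions of the kernel and report microseconds per call. Sizes, repetition counts, and function names here are illustrative assumptions:

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Plain scalar dot product, the baseline being measured.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Average wall-clock time per call in microseconds over `reps` repetitions.
double time_dot_us(const std::vector<float>& a, const std::vector<float>& b, int reps) {
    volatile float sink = 0.0f; // keep the compiler from eliding the work
    auto t0 = std::chrono::high_resolution_clock::now();
    for (int r = 0; r < reps; ++r) sink = sink + dot(a, b);
    auto t1 = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count() / reps;
}
```

As the commit body notes, such timings are noisy (a background process skewed an earlier measurement in this very log), so averaging over many repetitions and repeating runs matters.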
Georgi Gerganov
7faa7460f0
readme : update hot topics about new LoRA functionality 2023-04-18 20:10:26 +03:00
Georgi Gerganov
5af8e32238
ci : do not run on drafts 2023-04-18 19:57:06 +03:00
Concedo
f39def81d4 Update readme with more info 2023-04-18 21:44:26 +08:00
Concedo
3614956bc7 update readme 2023-04-18 21:39:05 +08:00
Concedo
ea01771dd5 rwkv is done 2023-04-18 20:55:01 +08:00
Concedo
a76b15b581 Merge branch 'concedo' into concedo_experimental
# Conflicts:
#	make_pyinstaller.bat
2023-04-18 17:42:43 +08:00
Gustavo Rocha Dias
ed5b5c45a9
doc - enhanced readme explaining how to compile on Windows. (#80) 2023-04-18 17:40:04 +08:00
Gustavo Rocha Dias
a9253cdfba
fix - on some OSes the PyInstaller command is case-sensitive; in lowercase it doesn't work. (#81) 2023-04-18 17:39:06 +08:00
Concedo
ac61e34d5f Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	README.md
2023-04-18 17:38:10 +08:00
Concedo
c200b674f4 updated kobold lite, work on rwkv, added exe path to model load params, added launch parameter 2023-04-18 17:36:44 +08:00
Ivan Komarov
42747220b4
Do not close file after mmap (Windows version) (#1034) 2023-04-18 03:15:50 +02:00
Atsushi Tatsuma
e9298af389
readme : add Ruby bindings (#1029) 2023-04-17 22:34:35 +03:00
Cameron
4ad73137a1
add 4_0 to default outfile namestr dict (#1031)
this came up when trying to convert the gpt4all-lora-unfiltered-quantized.bin file
2023-04-17 20:26:23 +02:00
slaren
315a95a4d3
Add LoRA support (#820) 2023-04-17 17:28:55 +02:00
Arik Poznanski
efd05648c8
llama : well-defined static initialization of complex objects (#927)
* Replaced static initialization of complex objects with initialization on first use. This prevents undefined behavior at program start (for example, a crash in Release builds that works in Debug builds)

* replaced use of auto with exact type to avoid using -std=c++14

* Made the accessor functions for static maps static const
2023-04-17 17:41:53 +03:00
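The initialization-on-first-use pattern the commit above switched to is the classic function-local static: the object is constructed the first time the accessor runs (thread-safe since C++11), avoiding the unordered construction of globals across translation units. A minimal sketch, with illustrative map contents rather than the real llama.cpp tables:

```cpp
#include <cassert>
#include <map>
#include <string>

// Accessor with a function-local static: construction happens on first call,
// not during static initialization, so no other global can observe it
// half-built. Returning a const reference keeps callers from mutating it.
static const std::map<int, std::string>& type_names() {
    static const std::map<int, std::string> names = {
        {0, "f32"}, {1, "f16"}, {2, "q4_0"}, {3, "q4_1"},
    };
    return names;
}
```

A file-scope `static const std::map<...>` global, by contrast, is built during static initialization in an order the standard leaves unspecified across translation units, which is the undefined behavior the commit removed.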
Georgi Gerganov
eb17a026fd
quantize-stats : fix bug in --type argument 2023-04-17 17:31:06 +03:00
Concedo
8e923dc6e9 updated kobold lite 2023-04-17 21:33:57 +08:00
Georgi Gerganov
69b740289f
ggml : avoid using ggml_fp16_to_fp32() and ggml_fp32_to_fp16() in ggml.c 2023-04-17 16:16:23 +03:00
Ivan Komarov
f266259ad9
Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933) 2023-04-17 15:10:57 +02:00
Concedo
1f4a69c051 version number api 2023-04-17 19:31:15 +08:00