llama.cpp

Author	SHA1	Message	Date
Concedo	b6594ab91e	do not show tokenizer warning	2023-05-13 15:48:17 +08:00
Concedo	cee8042793	integrated new version of clblast kernels as a separate file	2023-05-13 12:53:28 +08:00
Concedo	017023e477	updated kobold lite	2023-05-13 12:12:20 +08:00
Concedo	53e7256a25	should be good to merge, only thing missing is clblast new quants	2023-05-13 12:07:29 +08:00
Concedo	05cf5f7d6e	partially working, but the blas matmul is broken	2023-05-13 11:35:38 +08:00
Concedo	b335f73a60	BACKWARDS COMPAT QUANT SHIM is ready, but upstream model converter is BORKED. BORK BORK.	2023-05-13 01:30:11 +08:00
Concedo	08810d5fee	interim merge. do not use	2023-05-13 00:33:55 +08:00
Concedo	e9caff1cda	Interim merge. Do not use. Merge branch 'master' into concedo_experimental # Conflicts: # README.md # SHA256SUMS # examples/quantize/quantize.cpp # ggml-opencl.c # ggml.c # ggml.h # llama.cpp # llama.h	2023-05-12 23:20:27 +08:00
Georgi Gerganov	b9fd7eee57	ggml : remove bit shuffling (#1405) * ggml : remove Q4_0 bit shufling (ARM NEON) * ggml : remove Q4_1 bit shuffling (ARM NEON + reference) * ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON) * ggml : remove Q4_2 bit shuffling (WIP, BROKEN) * ggml : remove Q5_0 bit shuffling (ARM NEON) * ggml : 2x faster scalar implementations * ggml : remove Q5_1 bit shuffling (ARM NEON + scalar) * ggml : simplify scalar dot * ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit * ggml : fix Q4_1 quantization * ggml : update cuBLAS + normalize variable names * ggml : remove Q4_2 mode * ggml : minor formatting * ggml : fix Q5_0 quantization * scripts : add script for measuring the time per token * AVX implementations (#1370) * ggml : uniform 5th bit extraction * llama : produce error upon loading old model files * llama : fix model magic/version write * ggml : speed-up Q5_0 + Q5_1 at 4 threads * ggml : preserve old Q4 and Q5 formats * ggml : simplify Q8_1 - no need for low / high sums anymore * ggml : fix Q8_0 and Q8_1 rounding * Revert "AVX implementations (#1370)" This reverts commit `948d124837`. * ggml : fix AVX2 implementation * sha : update hashes for 7B and 13B * readme : update timings + remove warning banner * llama : update v2 PR number to 1405 * ggml : fix WASM comments * ggml : back to original bit order * readme : add note that Q4 and Q5 have been changed * llama : fix return for unknown version --------- Co-authored-by: Stephan Walter <stephan@walter.name>	2023-05-12 00:23:08 +03:00
CRD716	b608b55a3e	prompts : model agnostic DAN (#1304) * add model-agnostic dan prompt * quick readme update * save a token * Revert "quick readme update" This reverts commit `8dc342c069`.	2023-05-11 18:10:19 +03:00
Evan Jones	cf348a60e0	main : add option to save full output to session (#1338) * main : add option to save full output to session * split behavior into --session and --prompt-cache * restore original implementation with new names * PR comments * move the check for incompatible parameters to gpt_params_parse * Fix whitespace Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com> --------- Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>	2023-05-10 11:37:14 -04:00
Concedo	19dbb3b2a5	Merge branch 'master' into concedo_experimental	2023-05-10 18:35:53 +08:00
DannyDaemonic	e6a46b0ed1	Locale fix for Windows (#1379)	2023-05-09 19:53:28 +02:00
Sami Farin	9f8dbc4787	use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler (#1314) * use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler Tested with a 13B model. * use _mm_pause() in busyloop * use _mm_pause() in busyloop on x86_64 to reduce power consumption	2023-05-09 14:29:20 +02:00
Concedo	e47f7ade05	updated kobold lite, patch oom errors	2023-05-09 19:16:45 +08:00
Concedo	6d87f67572	up ver	2023-05-09 17:25:46 +08:00
Concedo	54194911ac	Merge branch 'master' into concedo_experimental # Conflicts: # README.md	2023-05-09 16:50:43 +08:00
Concedo	e4c6a1e3ed	update readme	2023-05-09 16:17:52 +08:00
DannyDaemonic	41654efea8	Interface improvements and `--multiline-input` (previously `--author-mode`) (#1040) * Interface improvements * Multiline input * Track character width * Works with all characters and control codes + Windows console fixes	2023-05-08 19:45:48 -07:00
Georgi Gerganov	56551bc11f	readme : add notice about upcoming breaking change	2023-05-08 22:52:18 +03:00
AlpinDale	fe60904eef	readme : add TOC and Pygmalion instructions (#1359)	2023-05-08 19:33:30 +03:00
Pavol Rusnak	003ba2fb43	llama : fix hparams shadow (#1367) fixes #1363	2023-05-08 17:48:21 +03:00
Georgi Gerganov	f9a6364912	llama : require first token to be BOS (#1303) * llama : require first token to be BOS * scripts : add ppl-run-all.sh * perplexity : add BOS for each chunk * readme : update perplexity values after BOS fix * perplexity : add clarifying comments	2023-05-08 17:41:54 +03:00
Concedo	2f2eff6e13	the dark gods have been sated, and redpajama is integrated... but at what cost?	2023-05-08 20:58:00 +08:00
ubik2	95078cc554	convert: add ability to convert safetensors files (#1276) * when loading a safetensors file, ignore the metadata header * check for safetensors files first, and only use PyTorch versions when safetensors aren't available	2023-05-08 13:54:26 +02:00
Concedo	b9904c3093	up ver	2023-05-08 11:13:16 +08:00
Concedo	1083876a1b	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md	2023-05-08 11:12:42 +08:00
Concedo	89d70886a4	added support for setting custom context size at load time (memory allocation)	2023-05-08 11:11:25 +08:00
Johannes Gäßler	1f48b0abcf	Documented CUDA reproducibility, added warning (#1346)	2023-05-08 02:42:01 +02:00
Henri Vasserman	e1295513a4	CI: add Windows CLBlast and OpenBLAS builds (#1277) * Add OpenCL and CLBlast support * Add OpenBLAS support * Remove testing from matrix * change build name to 'clblast'	2023-05-07 13:20:09 +02:00
Concedo	62beded0e7	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # Makefile # README.md	2023-05-07 19:10:01 +08:00
swittk	1b0fd45465	ggml : Allow usage of CLBlast alongside Accelerate.framework (#1336) Minor edit in ggml.c which originally would prevent OpenCL from loading completely if GGML_USE_ACCELERATE was defined. Minor speedup in prompt eval time.	2023-05-06 23:03:23 -04:00
Jed Fox	3924088512	Remove default arguments from sampling functions (#1343)	2023-05-06 17:01:47 -04:00
Concedo	ff93b394da	fixed a typo	2023-05-06 12:37:34 +08:00
Concedo	a48dddab86	slightly bump the RAM up to support chinese alpaca	2023-05-06 11:48:22 +08:00
DaniAndTheWeb	173d0e6419	makefile: automatic Arch Linux detection (#1332) This commit is a port of a detection method used in koboldcpp's Makefile in order to automatically set the -lcblas option on Arch Linux	2023-05-05 23:57:14 +02:00
Erik Scholz	a3b85b28da	ci : add cublas to windows release (#1271)	2023-05-05 22:56:09 +02:00
Concedo	8a964e76c8	integrated mirostat as a launch parameter, works on all models	2023-05-06 00:47:17 +08:00
Concedo	851f55325a	Merge remote-tracking branch 'temp/concedo' into concedo_experimental	2023-05-05 23:55:53 +08:00
Pavol Rusnak	921dcee00a	readme: add missing info (#1324)	2023-05-05 16:43:36 +02:00
Concedo	2edbcebe27	added optional force versioning flag	2023-05-05 22:02:00 +08:00
Concedo	39f3d1cf48	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # README.md # examples/quantize/quantize.cpp	2023-05-05 21:34:33 +08:00
Ionoclast Laboratories	2d13786e91	Fix for OpenCL / clbast builds on macOS. (#1329)	2023-05-05 14:18:21 +02:00
Hendrik Langer	8131bc8b56	add new sampling algorithm mirostat	2023-05-05 13:23:47 +02:00
Benjamin Lecaillon	a90e96b266	Convert.py @staticmethod (#1327) * Line 698 has one #staticmethod and should not otherwise throw error at unpickle.load() as not callable * Update convert.py --------- Co-authored-by: Ivan Stepanov <ivanstepanovftw@gmail.com>	2023-05-05 03:17:07 +03:00
slaren	94c5652fc0	quantize: make output filename optional, default to ggml-model-<ftype>.bin (#1301)	2023-05-05 00:58:56 +02:00
Ivan Stepanov	34d9f22f44	Wrap exceptions in std::exception to verbose output on exception. (#1316)	2023-05-04 18:56:27 +02:00
Ivan Stepanov	d3e8093e9b	convert: support DT_BF16 tensors (#1309) Co-authored-by: Pavol Rusnak <pavol@rusnak.io>	2023-05-04 18:54:37 +02:00
44670	360cfe5bec	readme : add OpenBuddy link (#1321)	2023-05-04 19:33:31 +03:00
44670	2edbdb0f99	main : add --in-suffix option (#1318) * adding --in-suffix option * print input suffix before generation	2023-05-04 18:41:12 +03:00

819 commits