Georgi Gerganov
dae6ba2abe
baby-llama : couple of clang-tidy warnings
2023-05-13 15:38:50 +03:00
Georgi Gerganov
ef3d42a3aa
ggml : fix clang-tidy warnings
2023-05-13 15:34:56 +03:00
Georgi Gerganov
95a487a17e
ggml : remove Q4_2 remnants
2023-05-13 15:22:24 +03:00
Georgi Gerganov
092913ecea
Merge remote-tracking branch 'origin/master' into HEAD
2023-05-13 15:20:22 +03:00
Georgi Gerganov
f048af0230
ggml : sync alibi fix from ggml repo
2023-05-13 11:54:33 +03:00
3ooabkhxtn
ac0cd259d5
Adding SSE instructions to ggml_vec_dot_q4_0_q8_0 ( #1413 )
2023-05-13 08:43:33 +00:00
Georgi Gerganov
0cd22e190a
llama : fix various warnings
2023-05-13 11:23:15 +03:00
Rinne
6456a4eb9f
embedding : remove unused code ( #1426 )
2023-05-13 10:24:20 +03:00
Georgi Gerganov
33034cfede
ggml : fix null ptr deref in backward pass
2023-05-13 10:08:01 +03:00
Georgi Gerganov
f977243ded
minor : fix compiler warnings + indentation style
2023-05-13 09:55:17 +03:00
Georgi Gerganov
cdd5350892
readme : update Q4_0 perplexities
...
I think these were affected by the removal of the `round` during quantization
2023-05-13 09:12:44 +03:00
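For context on the rounding remark above: whether a value is truncated or rounded to the nearest 4-bit level determines which quant it lands on, which is enough to shift measured perplexity. A minimal, generic illustration (not the actual ggml quantization code; quant_trunc and quant_round are hypothetical helpers):

```cpp
#include <cmath>
#include <cstdint>

// Truncation vs round-to-nearest can map the same float onto neighboring
// quant levels; across a whole model that nudges perplexity slightly.
int8_t quant_trunc(float x, float inv_d) { return (int8_t)(x * inv_d);        }
int8_t quant_round(float x, float inv_d) { return (int8_t)roundf(x * inv_d); }
```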
Georgi Gerganov
738ace394a
llama : free ggml context in set / copy state data ( close #1425 )
2023-05-13 09:08:52 +03:00
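The fix pattern here is simple but easy to miss: a scratch ggml context created while serializing state must be freed on every return path. A hedged sketch (copy_state_data_sketch is illustrative, not the actual llama.cpp code):

```cpp
#include "ggml.h"

// Sketch of the leak this commit closes: build views over state in a
// temporary context, copy the data out, then free the context.
size_t copy_state_data_sketch(void * dst_buf, size_t buf_size) {
    struct ggml_init_params params = {
        /* mem_size   */ buf_size,
        /* mem_buffer */ dst_buf,
        /* no_alloc   */ true,
    };
    struct ggml_context * ctx = ggml_init(params);
    // ... create tensor views and copy state into dst_buf ...
    ggml_free(ctx); // previously missing -> memory leak (see #1425)
    return buf_size;
}
```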
Henri Vasserman
699b1ad7fe
opencl : fix kernels for the new formats ( #1422 )
...
* Fix OpenCL kernels for the new formats
* Fix Q5_0 alignment issues.
2023-05-13 09:01:15 +03:00
Georgi Gerganov
fb62f92433
llama : fix --mtest option ( close #1414 )
2023-05-12 21:44:20 +03:00
Johannes Gäßler
773ee249fb
CLI args use - instead of _, backwards compatible ( #1416 )
2023-05-12 14:34:55 +00:00
slaren
553fd4d4b5
Add clang-tidy reviews to CI ( #1407 )
2023-05-12 15:40:53 +02:00
Rinne
089b1c93ba
readme : add C#/.NET bindings repo ( #1409 )
2023-05-12 08:39:40 +03:00
Georgi Gerganov
b9fd7eee57
ggml : remove bit shuffling ( #1405 )
...
* ggml : remove Q4_0 bit shuffling (ARM NEON)
* ggml : remove Q4_1 bit shuffling (ARM NEON + reference)
* ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)
* ggml : remove Q4_2 bit shuffling (WIP, BROKEN)
* ggml : remove Q5_0 bit shuffling (ARM NEON)
* ggml : 2x faster scalar implementations
* ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)
* ggml : simplify scalar dot
* ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit
* ggml : fix Q4_1 quantization
* ggml : update cuBLAS + normalize variable names
* ggml : remove Q4_2 mode
* ggml : minor formatting
* ggml : fix Q5_0 quantization
* scripts : add script for measuring the time per token
* AVX implementations (#1370 )
* ggml : uniform 5th bit extraction
* llama : produce error upon loading old model files
* llama : fix model magic/version write
* ggml : speed-up Q5_0 + Q5_1 at 4 threads
* ggml : preserve old Q4 and Q5 formats
* ggml : simplify Q8_1 - no need for low / high sums anymore
* ggml : fix Q8_0 and Q8_1 rounding
* Revert "AVX implementations (#1370 )"
This reverts commit 948d124837.
* ggml : fix AVX2 implementation
* sha : update hashes for 7B and 13B
* readme : update timings + remove warning banner
* llama : update v2 PR number to 1405
* ggml : fix WASM comments
* ggml : back to original bit order
* readme : add note that Q4 and Q5 have been changed
* llama : fix return for unknown version
---------
Co-authored-by: Stephan Walter <stephan@walter.name>
2023-05-12 00:23:08 +03:00
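The net effect of #1405 on the block layout: nibbles now map linearly onto elements instead of being interleaved for SIMD unpacking. A hedged scalar sketch of dequantization under that layout (the block structure follows ggml's Q4_0 of the time; dequantize_row_q4_0_sketch is illustrative, not the commit's code):

```cpp
#include <cstdint>

constexpr int QK4_0 = 32;

struct block_q4_0 {
    float   d;              // per-block scale
    uint8_t qs[QK4_0 / 2];  // 32 4-bit quants packed two per byte
};

// Post-#1405 order: element j sits in the low nibble of byte j,
// element j + 16 in the high nibble of the same byte.
void dequantize_row_q4_0_sketch(const block_q4_0 * x, float * y, int nb) {
    for (int i = 0; i < nb; ++i) {
        for (int j = 0; j < QK4_0 / 2; ++j) {
            const int x0 = (x[i].qs[j] & 0x0F) - 8; // low nibble, offset by 8
            const int x1 = (x[i].qs[j] >>   4) - 8; // high nibble, offset by 8
            y[i*QK4_0 + j]           = x0 * x[i].d;
            y[i*QK4_0 + j + QK4_0/2] = x1 * x[i].d;
        }
    }
}
```

This linear mapping appears to be what the "back to original bit order" bullet refers to, and it is why old Q4/Q5 model files had to be rejected with an error.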
xaedes
b9ef08ccab
remove trailing whitespace
2023-05-11 20:03:18 +02:00
xaedes
581e5eb954
cleanup code for batched training
2023-05-11 19:49:41 +02:00
xaedes
3e3ed9560c
add parallel batched forward function for baby-llama training
2023-05-11 19:31:46 +02:00
CRD716
b608b55a3e
prompts : model agnostic DAN ( #1304 )
...
* add model-agnostic dan prompt
* quick readme update
* save a token
* Revert "quick readme update"
This reverts commit 8dc342c069.
2023-05-11 18:10:19 +03:00
Evan Jones
cf348a60e0
main : add option to save full output to session ( #1338 )
...
* main : add option to save full output to session
* split behavior into --session and --prompt-cache
* restore original implementation with new names
* PR comments
* move the check for incompatible parameters to gpt_params_parse
* Fix whitespace
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
---------
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-05-10 11:37:14 -04:00
DannyDaemonic
e6a46b0ed1
Locale fix for Windows ( #1379 )
2023-05-09 19:53:28 +02:00
Sami Farin
9f8dbc4787
use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler ( #1314 )
...
* use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler
Tested with a 13B model.
* use _mm_pause() in busyloop
* use _mm_pause() in busyloop on x86_64 to reduce power consumption
2023-05-09 14:29:20 +02:00
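The idea behind the pause commit: a raw spin loop saturates the core's speculative execution and burns power, while the x86 PAUSE instruction tells the CPU it is in a spin-wait so it can back off. A minimal sketch, assuming x86 and C++11 atomics (spin_wait is a hypothetical stand-in for ggml's thread-sync busyloop):

```cpp
#include <atomic>
#include <immintrin.h> // _mm_pause (x86/x86_64)

// Spin until the flag is raised, issuing PAUSE each iteration so the core
// de-pipelines the loop: lower power draw and heat, and friendlier to the
// sibling hyperthread, at negligible wake-up latency cost.
void spin_wait(const std::atomic<int> & flag) {
    while (flag.load(std::memory_order_acquire) == 0) {
        _mm_pause();
    }
}
```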
DannyDaemonic
41654efea8
Interface improvements and --multiline-input (previously --author-mode) ( #1040 )
...
* Interface improvements
* Multiline input
* Track character width
* Works with all characters and control codes + Windows console fixes
2023-05-08 19:45:48 -07:00
Georgi Gerganov
56551bc11f
readme : add notice about upcoming breaking change
2023-05-08 22:52:18 +03:00
Georgi Gerganov
6ca682b19d
ggml : swap vDSP_vsub args as per documentation
2023-05-08 21:16:35 +03:00
xaedes
9c3fe4eb76
swap arguments to vDSP_vdiv call
...
documentation for vDSP_vdiv states: "Note that B comes before A!"
2023-05-08 21:16:35 +03:00
xaedes
cafbb785fa
swap arguments to vDSP_vdiv call
...
documentation for vDSP_vdiv states: "Note that B comes before A!"
2023-05-08 20:13:40 +02:00
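Both vDSP commits above fix the same Accelerate quirk: the element-wise ops take the second operand of the math first. A short sketch, assuming macOS's Accelerate framework:

```cpp
#include <Accelerate/Accelerate.h>

// vDSP passes the second operand of the math FIRST:
// vDSP_vsub(B, 1, A, 1, C, 1, n) computes C[i] = A[i] - B[i]
// vDSP_vdiv(B, 1, A, 1, C, 1, n) computes C[i] = A[i] / B[i]
void sub_and_div(const float * a, const float * b,
                 float * c, float * d, vDSP_Length n) {
    vDSP_vsub(b, 1, a, 1, c, 1, n); // c = a - b (b first!)
    vDSP_vdiv(b, 1, a, 1, d, 1, n); // d = a / b (divisor b first!)
}
```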
AlpinDale
fe60904eef
readme : add TOC and Pygmalion instructions ( #1359 )
2023-05-08 19:33:30 +03:00
Georgi Gerganov
6cc42deda5
ggml : fix nullptr derefs in GGML_OP_CONT and GGML_OP_RESHAPE back
2023-05-08 18:50:04 +03:00
Georgi Gerganov
78af3e92c9
ggml : fix compiler warnings + cosmetic changes
2023-05-08 18:37:17 +03:00
xaedes
0d72207ac3
c++ in baby-llama example
...
use c++ includes instead of c includes
use std::min, std::max instead of MIN, MAX macros
2023-05-08 16:56:55 +02:00
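A small before/after sketch of the style change described above (clamp_lr is a hypothetical example function, not from the baby-llama diff):

```cpp
#include <algorithm> // std::min, std::max, instead of MIN/MAX macros
#include <cstdio>    // instead of <stdio.h>

// Unlike MIN(a, b)/MAX(a, b) macros, std::min/std::max are type-checked
// and evaluate each argument exactly once.
float clamp_lr(float lr) {
    return std::max(0.0f, std::min(1.0f, lr));
}
```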
Pavol Rusnak
003ba2fb43
llama : fix hparams shadow ( #1367 )
...
fixes #1363
2023-05-08 17:48:21 +03:00
Georgi Gerganov
f9a6364912
llama : require first token to be BOS ( #1303 )
...
* llama : require first token to be BOS
* scripts : add ppl-run-all.sh
* perplexity : add BOS for each chunk
* readme : update perplexity values after BOS fix
* perplexity : add clarifying comments
2023-05-08 17:41:54 +03:00
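A hedged sketch of the new invariant, using llama.cpp C API names of that era (ensure_bos is illustrative; the exact enforcement point in the commit may differ):

```cpp
#include <vector>
#include "llama.h"

// Make sure an evaluation batch starts with the BOS token; per the bullet
// list above, perplexity chunks in particular need it prepended.
void ensure_bos(std::vector<llama_token> & tokens) {
    if (tokens.empty() || tokens[0] != llama_token_bos()) {
        tokens.insert(tokens.begin(), llama_token_bos());
    }
}
```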
xaedes
dea9c9359a
c++ in baby-llama example
...
use c++ includes instead of c includes
use std::min, std::max instead of MIN, MAX macros
2023-05-08 16:40:31 +02:00
ubik2
95078cc554
convert: add ability to convert safetensors files ( #1276 )
...
* when loading a safetensors file, ignore the metadata header
* check for safetensors files first, and only use PyTorch versions when safetensors aren't available
2023-05-08 13:54:26 +02:00
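For reference, the safetensors container the converter now probes is simple: an 8-byte little-endian length, a JSON header (which may include an "__metadata__" entry that loaders ignore), then raw tensor bytes. A minimal C++ sketch of reading that header (the actual converter is a Python script; read_safetensors_header is illustrative):

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Layout: [u64 LE header length][JSON header][tensor data]. The JSON maps
// tensor names to {dtype, shape, data_offsets}; "__metadata__" is skipped.
std::string read_safetensors_header(const std::string & path) {
    std::ifstream f(path, std::ios::binary);
    uint64_t n = 0;
    f.read(reinterpret_cast<char *>(&n), sizeof n); // assumes little-endian host
    std::string header(n, '\0');
    f.read(&header[0], (std::streamsize) n);        // JSON text to parse
    return header;
}
```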
Johannes Gäßler
1f48b0abcf
Documented CUDA reproducibility, added warning ( #1346 )
2023-05-08 02:42:01 +02:00
xaedes
1ecbece752
disable slow tests grad0 and opt to avoid exceeding timeouts
2023-05-08 02:29:36 +02:00
xaedes
f5301061b6
remove busy loop that was used as sleep for slower sine wave generation
2023-05-08 01:12:37 +02:00
xaedes
4997bc5819
reduce number of test-grad0 iterations
...
avoid exceeding timeout of automated tests
2023-05-08 00:57:41 +02:00
xaedes
2936dd60a4
remove trailing whitespace
2023-05-08 00:04:54 +02:00
xaedes
7c8768f819
add missing include for strcmp, etc
2023-05-07 23:43:43 +02:00
xaedes
660836f0ff
fix call to ggml_set_name
2023-05-07 23:39:57 +02:00
xaedes
9dd8e405fb
rename print functions in baby-llama example
2023-05-07 22:43:23 +02:00
xaedes
47ad186628
revert disabling of threading for rms_norm and norm
2023-05-07 21:56:10 +02:00
xaedes
5d9fed7e7f
remove shape annotations in llama_eval_internal
2023-05-07 21:45:29 +02:00
xaedes
d20ba6f6e6
update static assert of GGML_OP_COUNT
2023-05-07 21:42:42 +02:00
xaedes
e643fa1619
smaller default values for baby llama model parameters
2023-05-07 21:38:00 +02:00