llama.cpp

Author	SHA1	Message	Date
Julia Longtin	ea858eee03	first fixes.	2024-05-13 22:12:54 +00:00
Julia Longtin	feed51c3f4	attempt to speed up float clearing.	2024-05-13 22:12:54 +00:00
Julia Longtin	2ed306623c	allow using code from ggml-phi-knc-dot_q5_K_q8_K.c	2024-05-13 22:12:50 +00:00
Julia Longtin	d5f39c3caa	force to compile.	2024-05-13 22:11:16 +00:00
Julia Longtin	b794e48ff8	tell ggml-common.h to export what we want.	2024-05-13 22:11:16 +00:00
Julia Longtin	2c5daab90f	pull in ggml specific types.	2024-05-13 22:11:16 +00:00
Julia Longtin	7080280c5b	import stdio.h for size_t.	2024-05-13 22:11:16 +00:00
Julia Longtin	96dce97091	import stdint.h for sizeSt.	2024-05-13 22:11:16 +00:00
Julia Longtin	0e6c910db9	begin work on targeting dot_q5_K_q8_K.	2024-05-13 22:11:16 +00:00
Julia Longtin	16cbe5dd81	be more specific about the length of our list of run amounts.	2024-05-13 22:11:16 +00:00
Julia Longtin	c605e951dc	spacing changes.	2024-05-13 22:11:16 +00:00
Julia Longtin	56be29fc58	formatting changes.	2024-05-13 22:11:16 +00:00
Julia Longtin	97c69835dc	use the same header as ggml.c, and remove some warnings.	2024-05-13 22:11:16 +00:00
Julia Longtin	580a347e59	remove intrinsics import, and use upConv to save 12 bytes of memory transit.	2024-05-13 22:11:15 +00:00
Julia Longtin	9ba28eaed3	Update ggml-phi-knc.c	2024-05-13 22:11:15 +00:00
Julia Longtin	72e2b13185	add a benchmark / test binary.	2024-05-13 22:11:15 +00:00
Julia Longtin	6f699fc98d	merge from upstream	2024-05-13 22:11:15 +00:00
Julia Longtin	926b0e8076	Update ggml.c	2024-05-13 22:11:15 +00:00
Julia Longtin	6e1b77ad58	Update ggml.c	2024-05-13 22:11:15 +00:00
Julia Longtin	f940c96aac	Update ggml.c	2024-05-13 22:11:15 +00:00
Julia Longtin	2458643dac	implement F32 dot products.	2024-05-13 22:11:15 +00:00
Julia Longtin	59ce785f61	import intrinsics.	2024-05-13 22:11:15 +00:00
Julia Longtin	c08ddb831f	use right type, and define GGML_F32_VEC_ZERO.	2024-05-13 22:11:15 +00:00
Julia Longtin	25095cac23	try to implement one intrinsic	2024-05-13 22:11:15 +00:00
Julia Longtin	8f6e535edc	try to detect the PHI cross compiler in make.	2024-05-13 22:11:15 +00:00
Julia Longtin	f7f174ecc9	try to detect the PHI cross compiler in make.	2024-05-13 22:11:15 +00:00
Julia Longtin	b9e2f2a332	instead of checking on glibc, check on SYS_getcpu	2024-05-13 22:11:10 +00:00
Julia Longtin	78291d93b9	handle the case that we have no glibc on the PHI.	2024-05-13 22:05:33 +00:00
Julia Longtin	757f952046	add detection of Xeon PHI: Knights Corner.	2024-05-13 22:03:26 +00:00
compilade	ee52225067	convert-hf : support direct Q8_0 conversion (#7234) * convert-hf : support q8_0 conversion * convert-hf : add missing ftype This was messing with the checksums otherwise. * convert-hf : add missing ftype to Baichuan and Xverse I didn't notice these on my first pass.	2024-05-13 14:10:51 -04:00
Georgi Gerganov	614d3b914e	llama : less KV padding when FA is off (#7257) ggml-ci	2024-05-13 17:15:15 +03:00
k.h.lai	30e70334f7	llava-cli: fix base64 prompt (#7248)	2024-05-14 00:02:36 +10:00
Johannes Gäßler	1c570d8bee	perplexity: add BF16 vs. FP16 results (#7150)	2024-05-13 13:03:27 +02:00
Neo Zhang	948f4ec7c5	[SYCL] rm wait() (#7233)	2024-05-13 18:11:26 +08:00
Joan Fontanals	9aa672490c	llama : rename jina tokenizers to v2 (#7249) * refactor: rename jina tokenizers to v2 * refactor: keep refactoring non-breaking	2024-05-13 11:35:14 +03:00
Brian	b1f8af1886	convert.py: Outfile default name change and additional metadata support (#4858) * convert.py: Outfile default name change and additional metadata support * convert.py: don't stringify Metadata load method output * convert.py: typo fix * convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp	2024-05-13 12:56:47 +10:00
Benjamin Findley	e586ee4259	change default temperature of OAI compat API from 0 to 1 (#7226) * change default temperature of OAI compat API from 0 to 1 * make tests explicitly send temperature to OAI API	2024-05-13 12:40:08 +10:00
Neo Zhang	cbf75894d2	[SYCL] Add oneapi runtime dll files to win release package (#7241) * add oneapi running time dlls to release package * fix path * fix path * fix path * fix path * fix path --------- Co-authored-by: Zhang <jianyu.zhang@intel.com>	2024-05-13 08:04:29 +08:00
Neo Zhang	0d5cef78ae	[SYCL] update CI with oneapi 2024.1 (#7235) Co-authored-by: Zhang <jianyu.zhang@intel.com>	2024-05-13 08:02:55 +08:00
Johannes Gäßler	dc685be466	CUDA: add FP32 FlashAttention vector kernel (#7188) * CUDA: add FP32 FlashAttention vector kernel * fixup! CUDA: add FP32 FlashAttention vector kernel * fixup! fixup! CUDA: add FP32 FlashAttention vector kernel * fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel	2024-05-12 19:40:45 +02:00
Georgi Gerganov	6f1b63606f	cmake : fix version cmp (#7227)	2024-05-12 18:30:23 +03:00
slaren	b228aba91a	remove convert-lora-to-ggml.py (#7204)	2024-05-12 02:29:33 +02:00
Georgi Gerganov	7bd4ffb780	metal : fix warnings (skipme) (#0)	2024-05-11 21:38:13 +03:00
Georgi Gerganov	1622ac023f	sync : ggml	2024-05-11 21:35:05 +03:00
Georgi Gerganov	6aeff24f8b	metal : fix indent (ggml/0)	2024-05-11 21:34:21 +03:00
Georgi Gerganov	325756d28d	ggml : resolve merge (ggml/0) ggml-ci	2024-05-11 21:33:08 +03:00
Josh Ramer	fed0108491	Scripting & documenting debugging one test without anything else in the loop. (#7096) * A little documentation that shares my quick tips for working in the repository. * Update startup-testing-debugging.md * script that shows a menu of tests to pick from & run the debugger on * debug-test.sh: Refactor CLI help message * debug-test.sh: documentation update * debug-test.sh: CLI Help output corrections * debug-test.sh: minor doc fix --------- authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal> Assisted-by: brian khuu <mofosyne@gmail.com>	2024-05-12 03:26:35 +10:00
Xuan Son Nguyen	72c177c1f6	fix system prompt handling (#7153)	2024-05-11 17:28:10 +02:00
compilade	5a419926b0	convert-hf : support bfloat16 conversion (#7158) * convert-hf : support bfloat16 conversion * gguf-py : flake8 fixes * convert-hf : add missing space after comma * convert-hf : get bit-exact same output as ./quantize The quantization version was missing. * convert-hf : don't round bf16 NANs * convert-hf : save some memory with np.int16 intermediate bf16 weights * convert-hf : more closely match llama.cpp with which weights to keep in f32 * convert-hf : add --outtype auto-f16 A reason for this to exist is for model quantizers who want an initial GGUF with the most fidelity to the original model while still using a 16-bit float type instead of 32-bit floats. * convert-hf : remove a semicolon because flake8 doesn't like it It's a reflex from when programming in C/C++, I guess. * convert-hf : support outtype templating in outfile name * convert-hf : rename --outtype auto-f16 to --outtype auto	2024-05-11 11:06:26 -04:00
Georgi Gerganov	fae9d234b6	sync : ggml ggml-ci	2024-05-11 15:38:34 +03:00

2901 commits