Julia Longtin
588a0b19cc
expand mask, and align memory.
2024-05-13 22:12:54 +00:00
Julia Longtin
3994d81bf0
try to use vectorized zeroing function.
2024-05-13 22:12:54 +00:00
Julia Longtin
e227717136
add missing variable.
2024-05-13 22:12:54 +00:00
Julia Longtin
d5a27eb507
copy right block.
2024-05-13 22:12:54 +00:00
Julia Longtin
9f92f9730e
fix typo.
2024-05-13 22:12:54 +00:00
Julia Longtin
484c4abf8d
promote aux16 into a vector. (part three)
2024-05-13 22:12:54 +00:00
Julia Longtin
fb0fb9ff1b
promote aux16 into a vector.
2024-05-13 22:12:54 +00:00
Julia Longtin
405b5fa731
promote aux16 into a vector.
2024-05-13 22:12:54 +00:00
Julia Longtin
b92e06456c
formatting improvement.
2024-05-13 22:12:54 +00:00
Julia Longtin
ea858eee03
first fixes.
2024-05-13 22:12:54 +00:00
Julia Longtin
feed51c3f4
attempt to speed up float clearing.
2024-05-13 22:12:54 +00:00
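The float-clearing commits above replace a scalar store loop with a vectorized one. As a point of reference, a minimal portable sketch of the operation being sped up (the function name here is illustrative, not the actual symbol in the tree):

```c
#include <stddef.h>
#include <string.h>

/* Zero a buffer of n floats. memset is valid here because an all-zero
 * byte pattern is 0.0f in IEEE 754; the commits above attempt to beat
 * this baseline with vector stores on Knights Corner. */
static void vec_zero_f32(size_t n, float *x) {
    memset(x, 0, n * sizeof(float));
}
```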
Julia Longtin
2ed306623c
allow using code from ggml-phi-knc-dot_q5_K_q8_K.c
2024-05-13 22:12:50 +00:00
Julia Longtin
d5f39c3caa
force to compile.
2024-05-13 22:11:16 +00:00
Julia Longtin
b794e48ff8
tell ggml-common.h to export what we want.
2024-05-13 22:11:16 +00:00
Julia Longtin
2c5daab90f
pull in ggml specific types.
2024-05-13 22:11:16 +00:00
Julia Longtin
7080280c5b
import stdio.h for size_t.
2024-05-13 22:11:16 +00:00
Julia Longtin
96dce97091
import stdint.h for size_t.
2024-05-13 22:11:16 +00:00
Julia Longtin
0e6c910db9
begin work on targeting dot_q5_K_q8_K.
2024-05-13 22:11:16 +00:00
Julia Longtin
16cbe5dd81
be more specific about the length of our list of run amounts.
2024-05-13 22:11:16 +00:00
Julia Longtin
c605e951dc
spacing changes.
2024-05-13 22:11:16 +00:00
Julia Longtin
56be29fc58
formatting changes.
2024-05-13 22:11:16 +00:00
Julia Longtin
97c69835dc
use the same header as ggml.c, and remove some warnings.
2024-05-13 22:11:16 +00:00
Julia Longtin
580a347e59
remove intrinsics import, and use upConv to save 12 bytes of memory transit.
2024-05-13 22:11:15 +00:00
Julia Longtin
9ba28eaed3
Update ggml-phi-knc.c
2024-05-13 22:11:15 +00:00
Julia Longtin
72e2b13185
add a benchmark / test binary.
2024-05-13 22:11:15 +00:00
Julia Longtin
6f699fc98d
merge from upstream
2024-05-13 22:11:15 +00:00
Julia Longtin
926b0e8076
Update ggml.c
2024-05-13 22:11:15 +00:00
Julia Longtin
6e1b77ad58
Update ggml.c
2024-05-13 22:11:15 +00:00
Julia Longtin
f940c96aac
Update ggml.c
2024-05-13 22:11:15 +00:00
Julia Longtin
2458643dac
implement F32 dot products.
2024-05-13 22:11:15 +00:00
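The "implement F32 dot products" commit targets the single-precision dot product that the surrounding commits then vectorize with Knights Corner intrinsics. A scalar reference sketch of that operation (the name and signature are illustrative, not the actual ggml symbol):

```c
#include <stddef.h>

/* Scalar F32 dot product: sum of elementwise products of x and y.
 * This is the loop the port replaces with 512-bit vector operations. */
static float vec_dot_f32(size_t n, const float *x, const float *y) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```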
Julia Longtin
59ce785f61
import intrinsics.
2024-05-13 22:11:15 +00:00
Julia Longtin
c08ddb831f
use right type, and define GGML_F32_VEC_ZERO.
2024-05-13 22:11:15 +00:00
Julia Longtin
25095cac23
try to implement one intrinsic
2024-05-13 22:11:15 +00:00
Julia Longtin
8f6e535edc
try to detect the PHI cross compiler in make.
2024-05-13 22:11:15 +00:00
Julia Longtin
f7f174ecc9
try to detect the PHI cross compiler in make.
2024-05-13 22:11:15 +00:00
Julia Longtin
b9e2f2a332
instead of checking on glibc, check on SYS_getcpu
2024-05-13 22:11:10 +00:00
Julia Longtin
78291d93b9
handle the case that we have no glibc on the PHI.
2024-05-13 22:05:33 +00:00
Julia Longtin
757f952046
add detection of Xeon PHI: Knights Corner.
2024-05-13 22:03:26 +00:00
compilade
ee52225067
convert-hf : support direct Q8_0 conversion (#7234)
* convert-hf : support q8_0 conversion
* convert-hf : add missing ftype
This was messing with the checksums otherwise.
* convert-hf : add missing ftype to Baichuan and Xverse
I didn't notice these on my first pass.
2024-05-13 14:10:51 -04:00
Georgi Gerganov
614d3b914e
llama : less KV padding when FA is off (#7257)
ggml-ci
2024-05-13 17:15:15 +03:00
k.h.lai
30e70334f7
llava-cli: fix base64 prompt (#7248)
2024-05-14 00:02:36 +10:00
Johannes Gäßler
1c570d8bee
perplexity: add BF16 vs. FP16 results (#7150)
2024-05-13 13:03:27 +02:00
Neo Zhang
948f4ec7c5
[SYCL] rm wait() (#7233)
2024-05-13 18:11:26 +08:00
Joan Fontanals
9aa672490c
llama : rename jina tokenizers to v2 (#7249)
* refactor: rename jina tokenizers to v2
* refactor: keep refactoring non-breaking
2024-05-13 11:35:14 +03:00
Brian
b1f8af1886
convert.py: Outfile default name change and additional metadata support (#4858)
* convert.py: Outfile default name change and additional metadata support
* convert.py: don't stringify Metadata load method output
* convert.py: typo fix
* convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp
2024-05-13 12:56:47 +10:00
Benjamin Findley
e586ee4259
change default temperature of OAI compat API from 0 to 1 (#7226)
* change default temperature of OAI compat API from 0 to 1
* make tests explicitly send temperature to OAI API
2024-05-13 12:40:08 +10:00
Neo Zhang
cbf75894d2
[SYCL] Add oneapi runtime dll files to win release package (#7241)
* add oneapi runtime dlls to release package
* fix path
* fix path
* fix path
* fix path
* fix path
---------
Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:04:29 +08:00
Neo Zhang
0d5cef78ae
[SYCL] update CI with oneapi 2024.1 (#7235)
Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:02:55 +08:00
Johannes Gäßler
dc685be466
CUDA: add FP32 FlashAttention vector kernel (#7188)
* CUDA: add FP32 FlashAttention vector kernel
* fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
2024-05-12 19:40:45 +02:00
Georgi Gerganov
6f1b63606f
cmake : fix version cmp (#7227)
2024-05-12 18:30:23 +03:00