llama.cpp

Author	SHA1	Message	Date
Julia Longtin	f09b3ed79e	use quotes properly.	2024-03-23 20:53:16 +00:00
Julia Longtin	bb5eb95816	use better memory save operator.	2024-03-23 20:49:11 +00:00
Julia Longtin	9d7ca41703	expand mask, and align memory.	2024-03-23 20:48:43 +00:00
Julia Longtin	bd6d7e6238	try to use vectorized zeroing function.	2024-03-23 19:55:12 +00:00
Julia Longtin	f985372e3a	add missing variable.	2024-03-23 19:49:16 +00:00
Julia Longtin	31d4f9312b	copy right block.	2024-03-23 19:47:21 +00:00
Julia Longtin	e43a63e7c6	fix typo.	2024-03-23 16:29:30 +00:00
Julia Longtin	f092a10dc9	promote aux16 into a vector. (part three)	2024-03-23 16:27:11 +00:00
Julia Longtin	c72157a5a6	promote aux16 into a vector.	2024-03-23 16:24:11 +00:00
Julia Longtin	e3503c924a	promote aux16 into a vector.	2024-03-23 16:21:20 +00:00
Julia Longtin	edb76ffddb	formatting improvement.	2024-03-23 16:19:17 +00:00
Julia Longtin	6face8a0be	first fixes.	2024-03-23 15:56:47 +00:00
Julia Longtin	0a2051aa88	attempt to speed up float clearing.	2024-03-23 15:55:00 +00:00
Julia Longtin	0b012c03ef	allow using code from ggml-phi-knc-dot_q5_K_q8_K.c	2024-03-23 15:02:56 +00:00
Julia Longtin	0b3f17127f	force to compile.	2024-03-23 14:58:33 +00:00
Julia Longtin	18f353987c	tell ggml-common.h to export what we want.	2024-03-23 14:49:35 +00:00
Julia Longtin	cd20404250	pull in ggml specific types.	2024-03-23 14:38:15 +00:00
Julia Longtin	8f57803f58	import stdio.h for size_t.	2024-03-23 14:29:59 +00:00
Julia Longtin	9bcb8350d5	import stdint.h for sizeSt.	2024-03-23 14:28:29 +00:00
Julia Longtin	a7bd64c130	begin work on targeting dot_q5_K_q8_K.	2024-03-23 14:19:47 +00:00
Julia Longtin	9185e14922	be more specific about the length of our list of run amounts.	2024-03-21 20:38:49 +00:00
Julia Longtin	0979522fbe	spacing changes.	2024-03-21 18:36:25 +00:00
Julia Longtin	ac3637142d	formatting changes.	2024-03-20 21:34:12 +00:00
Julia Longtin	76e66e77c2	use the same header as ggml.c, and remove some warnings.	2024-03-20 21:12:22 +00:00
Julia Longtin	ee27148629	remove intrinsics import, and use upConv to save 12 bytes of memory transit.	2024-03-20 20:15:30 +00:00
Julia Longtin	ab6f3a8a8d	Update ggml-phi-knc.c	2024-03-17 21:36:14 +00:00
Julia Longtin	f882673ba6	add a benchmark / test binary.	2024-03-17 21:20:14 +00:00
Julia Longtin	fe663c1b63	merge from upstream	2024-03-17 21:15:32 +00:00
Julia Longtin	eac00a72d5	Update ggml.c	2024-03-16 14:17:21 +00:00
Julia Longtin	e216a2f133	Update ggml.c	2024-03-16 14:15:51 +00:00
Julia Longtin	257ffd9955	Update ggml.c	2024-03-16 14:13:22 +00:00
Julia Longtin	717e164dd7	implement F32 dot products.	2024-03-16 14:05:03 +00:00
Julia Longtin	7a57feba0c	import intrinsics.	2024-03-13 19:26:54 +00:00
Julia Longtin	a1ae649662	use right type, and define GGML_F32_VEC_ZERO.	2024-03-13 19:23:53 +00:00
Julia Longtin	f346a41deb	try to implement one intrinsic	2024-03-13 19:18:10 +00:00
Julia Longtin	aec982eefd	try to detect the PHI cross compiler in make.	2024-03-12 21:54:38 +00:00
Julia Longtin	a31c936c5a	try to detect the PHI cross compiler in make.	2024-03-12 21:40:46 +00:00
Julia Longtin	5a2973af25	instead of checking on glibc, check on SYS_getcpu	2024-03-12 21:07:10 +00:00
Julia Longtin	7f3722beb6	handle the case that we have no glibc on the PHI.	2024-03-12 21:02:14 +00:00
Julia Longtin	868a2016ac	add detection of Xeon PHI: Knights Corner.	2024-03-12 20:57:43 +00:00
slaren	306d34be7a	ci : remove tidy-review (#6021)	2024-03-12 17:55:19 +02:00
Georgi Gerganov	8030da7afe	ggml : reuse quantum structs across backends (#5943) * ggml : reuse quant blocks across backends ggml-ci * ggml : define helper constants only for CUDA and SYCL ggml-ci * ggml : define helper quantum constants for SYCL ggml-ci	2024-03-12 14:27:20 +02:00
Georgi Gerganov	184215e783	ggml : fix UB in IQ2_S and IQ3_S (#6012)	2024-03-12 13:49:55 +02:00
Georgi Gerganov	48358b2e5b	sycl : update IQ1_S kernels (WIP - not working!) (#5995) * sycl : try to fix after IQ1_S changes * sycl : iq1s_grid -> iq1s_grid_gpu * sycl : fix grid type	2024-03-12 11:15:05 +02:00
gliptic	5cdb371731	grammar : fix unnecessarily retained pointer to rules (#6003)	2024-03-11 21:59:03 +02:00
Kawrakow	44ca159faf	1.5 bit: we can do even better (#5999) * iq1_s: we can do even better Spent one of the 4 scale bits on a signs of a 0.125 shift. I.e., quants are now -1 + delta, delta, 1 + delta, where delta is +/- 0.125. CUDA works, same performance as before. PPL(LLaMA-v2-7B) is now 11.85! * iq1_s: make scalar and AVX2 work with the new version * iq1_s: make Neon work with new version. ~10% drop in performance, so will need some more work. * iq1_s: make Metal work with new version * iq1_s: very slightly faster dequantize on Metal * iq1_s: fix dequantize on the CPU --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-03-11 17:53:15 +02:00
Georgi Gerganov	05b06210c9	llama : more consistent names of count variables (#5994) * llama : more consistent names of count variables ggml-ci * llama : n_parallel -> n_seq_max * common : fix param name * examples : fix param name	2024-03-11 17:49:47 +02:00
Georgi Gerganov	83796e62bc	llama : refactor unicode stuff (#5992) * llama : refactor unicode stuff ggml-ci * unicode : names * make : fix c++ compiler * unicode : names * unicode : straighten tables * zig : fix build * unicode : put nfd normalization behind API ggml-ci * swift : fix build * unicode : add BOM * unicode : add <cstdint> ggml-ci * unicode : pass as cpts as const ref	2024-03-11 17:47:47 +02:00
Jakub N	828defefb6	Update server docker image URLs (#5997)	2024-03-11 14:40:42 +01:00
Xuan Son Nguyen	caa106d4e0	Server: format error to json (#5961) * server: format error to json * server: do not crash on grammar error * fix api key test case * revert limit max n_predict * small fix * correct coding style * update completion.js * launch_slot_with_task * update docs * update_slots * update webui * update readme	2024-03-11 10:56:41 +01:00

2449 commits