llama.cpp

Author	SHA1	Message	Date
Zhiyuan Li	623db3b06f	update lint	2024-11-05 13:31:41 +11:00
Zhiyuan Li	e264c35fc9	remove some codes	2024-11-05 03:03:38 +11:00
Zhiyuan Li	acb1b9d22c	Merge branch 'ggerganov:master' into master	2024-11-05 02:58:54 +11:00
Zhiyuan Li	4574795cd5	use recommended way GGML_TENSOR_LOCALS	2024-11-05 02:57:08 +11:00
Zhiyuan Li	4693b4611f	rewrite to be more inline with the common pattern for distributing threads	2024-11-05 02:49:22 +11:00
Zhiyuan Li	a749ba7701	put the declaration outside the loop	2024-11-05 02:45:40 +11:00
Zhiyuan Li	6a1e977e34	Update ggml/src/ggml-sycl/concat.cpp Co-authored-by: Meng, Hengyu <airdldl@163.com>	2024-11-05 02:41:55 +11:00
Zhiyuan Li	35a1a2dfa9	move element-wise functions outside	2024-11-05 02:40:11 +11:00
Xuan Son Nguyen	9e0ecfb697	server : clarify /slots endpoint, add is_processing (#10162) * server : clarify /slots endpoint, add is_processing * fix tests	2024-11-04 16:33:29 +01:00
snadampal	6a066b9978	fix build break on arm64 linux (#10166) This fixes the build break from the recent changes to move the CPU backend to separate files https://github.com/ggerganov/llama.cpp/pull/10144	2024-11-04 16:08:33 +01:00
Zhiyuan Li	72e4432577	add appropriate asserts	2024-11-05 01:20:52 +11:00
Zhiyuan Li	b81602477b	Update ggml/src/ggml-cpu.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-05 01:14:27 +11:00
Zhiyuan Li	a878502f43	fix define error	2024-11-05 01:07:33 +11:00
Zhiyuan Li	81cb301224	update the function to use appropriate types	2024-11-05 00:55:59 +11:00
Zhiyuan Li	bb0685fad5	Update ggml/src/ggml-sycl/wkv6.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-05 00:42:37 +11:00
Zhiyuan Li	8c7b4ec22a	Update ggml/src/ggml-sycl/outprod.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-05 00:42:31 +11:00
Zhiyuan Li	9ea34a78cb	fix: add defualt	2024-11-04 23:28:26 +11:00
Diego Devesa	ea02c753eb	cuda : clear error after changing peer access (#10153)	2024-11-04 13:10:23 +01:00
Georgi Gerganov	05697f670b	metal : simplify f16 and f32 dequant kernels (#0)	2024-11-04 13:49:34 +02:00
Georgi Gerganov	f8e58135cf	metal : move dequantize templates to beginning of MSL source (#0)	2024-11-04 13:44:06 +02:00
Zhiyuan Li	61c665b7f1	fix: update changes to upstream	2024-11-04 22:17:12 +11:00
Zhiyuan Li	5f792141c5	Merge branch 'ggerganov:master' into master	2024-11-04 22:12:31 +11:00
Georgi Gerganov	153251f761	sync : ggml	2024-11-04 22:10:53 +11:00
Yuri Khrustalev	eb5711c496	cmake : make it possible linking ggml as external lib (ggml/1003)	2024-11-04 22:10:53 +11:00
Plamen Minev	8050d021ab	metal : fix minor string leaks (ggml/1004)	2024-11-04 22:10:53 +11:00
Diego Devesa	89812b157a	ggml : move CPU backend to a separate file (#10144)	2024-11-04 22:10:53 +11:00
Georgi Gerganov	b18963085b	metal : minor fixup in FA kernel (#10143) * metal : minor fixup in FA kernel ggml-ci * metal : use the unrolled loop variable * metal : remove unused var	2024-11-04 22:09:57 +11:00
Georgi Gerganov	4d266310f5	flake.lock: Update (#10146)	2024-11-04 22:09:57 +11:00
leo-pony	329ed914c9	CANN: adjust backend registry refactor. (#10158) remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR.	2024-11-04 19:08:22 +08:00
Georgi Gerganov	ce027adfb3	sync : ggml	2024-11-04 10:33:37 +02:00
Yuri Khrustalev	284e5b0275	cmake : make it possible linking ggml as external lib (ggml/1003)	2024-11-04 10:33:11 +02:00
Plamen Minev	e2292aaa17	metal : fix minor string leaks (ggml/1004)	2024-11-04 10:33:10 +02:00
Diego Devesa	9f40989351	ggml : move CPU backend to a separate file (#10144)	2024-11-03 19:34:08 +01:00
Georgi Gerganov	08828a6d7d	metal : minor fixup in FA kernel (#10143) * metal : minor fixup in FA kernel ggml-ci * metal : use the unrolled loop variable * metal : remove unused var	2024-11-03 15:18:40 +02:00
Georgi Gerganov	1839f69130	flake.lock: Update (#10146)	2024-11-03 05:14:15 -08:00
Zhiyuan Li	811aa872d6	wkv6: drop armv9 and tranfer to GGML style	2024-11-03 23:54:57 +11:00
Zhiyuan Li	042c3e0fd3	Merge branch 'ggerganov:master' into master	2024-11-03 17:30:25 +11:00
Zhiyuan Li	1c58096f6f	sycl: Enhance OP support judgment	2024-11-03 16:43:17 +11:00
Zhiyuan Li	bee1cec7d2	sycl: add some ops	2024-11-03 16:43:17 +11:00
Zhiyuan Li	2fc42b6a82	wkv on sycl	2024-11-03 16:43:17 +11:00
Zhiyuan Li	3f75f12114	rwkv6: rename params	2024-11-03 16:43:17 +11:00
Zhiyuan Li	e198f7b9df	rwkv6: update cuda file name	2024-11-03 16:43:17 +11:00
Zhiyuan Li	b4254c5550	rwkv6: support avx2 avx512 armv8 armv9	2024-11-03 16:43:17 +11:00
Zhiyuan Li	f66c75a495	rwkv6: rename to wkv6	2024-11-03 16:43:17 +11:00
Christian Köhnenkamp	9830b6923b	Add apple arm to presets (#10134) * Add apple arm to presets * Add final new line	2024-11-02 15:35:31 -07:00
sasha0552	42cadc74bd	server : fix slot selection by lru (#10126) * server : fix slot selection by lru, migrate lcs to `size_t` * minor debug log fix	2024-11-02 18:34:56 +02:00
Georgi Gerganov	45950415ed	server : fix endpoint checks (#10135) ggml-ci	2024-11-02 18:34:00 +02:00
Georgi Gerganov	1926d6e39d	llama : adjust default context size + print warnings (#10136) * llama : adjust default context size + print warnings ggml-ci * ggml-ci : add missing gpu-layers + adjust context sizes	2024-11-02 15:18:56 +02:00
Diego Devesa	b634f8a26f	simple-chat : only add bos on first prompt (#10129)	2024-11-02 13:08:53 +01:00
Xuan Son Nguyen	7554aa4655	convert-lora : make `--base` optional (#10110) * convert-lora : make `--base` optional * lint * handle case where base_model_name_or_path is invalid * do not include metadata from base model * clarify unspecified --base * add small comment [no ci] * trigger ci	2024-11-02 12:53:17 +01:00

4061 commits