llama.cpp

Author	SHA1	Message	Date
jianyuzh	533c647d0e	check for sycl blas, better performance	2024-01-23 13:34:05 +08:00
Meng, Hengyu	67e6b3cb7d	align pr4766	2024-01-23 03:32:09 +00:00
luoyu-intel	f008cc7b68	enable SYCL_F16 support	2024-01-23 02:38:44 +00:00
jianyuzh	f396a3b65e	add know issue for pvc hang issue	2024-01-23 02:38:44 +00:00
luoyu-intel	623d8031cb	fix code err	2024-01-23 02:38:44 +00:00
jianyuzh	e3481faa2f	rm original sycl code before refactor	2024-01-23 02:38:44 +00:00
jianyuzh	ae941b1b57	add syc and link for sycl readme	2024-01-23 02:38:44 +00:00
jianyuzh	35a0daaaa1	restore rm code to fix hang issue	2024-01-23 02:38:44 +00:00
luoyu-intel	d5f7d364f6	remove sycl version from include path	2024-01-23 02:38:44 +00:00
luoyu-intel	57e9fbadb2	fix return type	2024-01-23 02:38:44 +00:00
Neo Zhang Jianyu	593ce001e2	Update README_sycl.md	2024-01-23 02:38:44 +00:00
jianyuzh	d80dd65f42	dos2unix	2024-01-23 02:38:44 +00:00
jianyuzh	09b5619df4	rm rear space	2024-01-23 02:38:44 +00:00
jianyuzh	7350fd48ef	add ls-sycl-device, rm unused files	2024-01-23 02:38:44 +00:00
jianyuzh	0d6e7219b6	add ls-sycl-device tool	2024-01-23 02:38:44 +00:00
jianyuzh	79d30d7713	add run script, comment debug code	2024-01-23 02:38:44 +00:00
jianyuzh	a8936f4902	set nthread=1 when sycl, increase performance	2024-01-23 02:38:44 +00:00
jianyuzh	95daece908	fix build with sycl	2024-01-23 02:38:44 +00:00
jianyuzh	ca2cb6982a	update readme, refactor build script	2024-01-23 02:38:44 +00:00
jianyuzh	c3c5b20ac5	mv dpct definition from folder dpct to ggml-sycl.h	2024-01-23 02:38:44 +00:00
jianyuzh	c67c2ab228	refactor device log	2024-01-23 02:38:44 +00:00
jianyuzh	a47f5ec42e	summary dpct definition in one header file to replace folder:dpct	2024-01-23 02:38:44 +00:00
jianyuzh	5b5389941e	fix error: wrong result in 658746bb26702e50f2c59c0e4ada8e9da6010481	2024-01-23 02:38:44 +00:00
jianyuzh	bd38129aeb	add print tensor function to debug	2024-01-23 02:38:44 +00:00
jianyuzh	3645f25d74	correct queue: rm dtct:get_queue	2024-01-23 02:38:44 +00:00
jianyuzh	fa3a58605b	clear CMAKE to rm unused lib and options	2024-01-23 02:38:44 +00:00
jianyuzh	c709c3cb37	ren ggml-sycl.hpp -> ggml-sycl.h	2024-01-23 02:38:44 +00:00
jianyuzh	69d76c8b58	fix error of select non-zero device, format device list	2024-01-23 02:38:44 +00:00
jianyuzh	c2ef7a9cb9	step 8, rename all macro & func from cuda by sycl	2024-01-23 02:38:42 +00:00
jianyuzh	3b1a743e82	step7 add debug for code path, rm log	2024-01-23 02:15:32 +00:00
jianyuzh	65f895d41b	support main device is non-zero	2024-01-23 02:15:32 +00:00
jianyuzh	3a9d2c54ba	step6, enhance error check, remove CUDA macro, enhance device id to fix none-zero id issue	2024-01-23 02:15:32 +00:00
jianyuzh	6dd32789b4	step 5 format device and print	2024-01-23 02:15:32 +00:00
jianyuzh	da752edaf5	add GGML_LIST_DEVICE function	2024-01-23 02:15:32 +00:00
jianyuzh	43f2c35859	step3 add fp16, slower 31->28	2024-01-23 02:15:32 +00:00
jianyuzh	02dffb68b8	step 2	2024-01-23 02:15:32 +00:00
jianyuzh	ff83711055	step 1	2024-01-23 02:15:32 +00:00
jianyuzh	0c00b4f654	add debug functio, commit all help code	2024-01-23 02:15:32 +00:00
jianyuzh	233876936b	update init_cublas	2024-01-23 02:15:32 +00:00
jianyuzh	7a4343df61	first update for migration	2024-01-23 02:15:32 +00:00
slaren	011e8ec577	llama : fix not enough space in buffer with Qwen (#5086)	2024-01-22 23:42:41 +01:00
Kawrakow	6f9939d119	KL-divergence (#5076) * kl-divergence: be able to save all logits to a file * Add ability to compute KL-divergence --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-22 16:10:14 +02:00
Reinforce-II	780e24a22e	ggml : parallelize FP32 conversion when using BLAS (#5045) * make GGML_TASK_INIT phase can be run in multithread * multithreaded dequantize in mul_mat when using blas library * minor fixes * update outdated comment * fix coding style * simplify code Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-22 15:15:08 +02:00
XiaotaoChen	3ce7e8f8e7	llava : MobileVLM support (#4954) * MobileVLM native implementation * delete depthwise_conv_2d and permute_cpy relative code, replace the two by the existed functions, and opt ldp definition, support LLAMA_PERF option for CMake * move android script to example/llava directory * Fix the editor config checks --------- Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>	2024-01-22 15:09:35 +02:00
Someone Serge	b2d80e105a	flake.nix: add a comment about flakes vs nix	2024-01-22 12:19:30 +00:00
Someone Serge	28603cd283	nix: add a comment on the many nixpkgs-with-cuda instances	2024-01-22 12:19:30 +00:00
Someone Serge	5e97ec91ae	nix: add a comment about makeScope	2024-01-22 12:19:30 +00:00
Someone Serge	7251870780	nix: refactor the cleanSource rules	2024-01-22 12:19:30 +00:00
Someone Serge	fe8b3c0d4b	workflows: nix-ci: drop the redundant "paths" filter	2024-01-22 12:19:30 +00:00
Someone Serge	f4dd059259	workflows: nix-build-aarch64: rate limit	2024-01-22 12:19:30 +00:00

1994 commits