Commit graph

1986 commits

Author SHA1 Message Date
luoyu-intel
d5f7d364f6 remove sycl version from include path 2024-01-23 02:38:44 +00:00
luoyu-intel
57e9fbadb2 fix return type 2024-01-23 02:38:44 +00:00
Neo Zhang Jianyu
593ce001e2 Update README_sycl.md 2024-01-23 02:38:44 +00:00
jianyuzh
d80dd65f42 dos2unix 2024-01-23 02:38:44 +00:00
jianyuzh
09b5619df4 rm rear space 2024-01-23 02:38:44 +00:00
jianyuzh
7350fd48ef add ls-sycl-device, rm unused files 2024-01-23 02:38:44 +00:00
jianyuzh
0d6e7219b6 add ls-sycl-device tool 2024-01-23 02:38:44 +00:00
jianyuzh
79d30d7713 add run script, comment debug code 2024-01-23 02:38:44 +00:00
jianyuzh
a8936f4902 set nthread=1 when sycl, increase performance 2024-01-23 02:38:44 +00:00
jianyuzh
95daece908 fix build with sycl 2024-01-23 02:38:44 +00:00
jianyuzh
ca2cb6982a update readme, refactor build script 2024-01-23 02:38:44 +00:00
jianyuzh
c3c5b20ac5 mv dpct definition from folder dpct to ggml-sycl.h 2024-01-23 02:38:44 +00:00
jianyuzh
c67c2ab228 refactor device log 2024-01-23 02:38:44 +00:00
jianyuzh
a47f5ec42e summarize dpct definitions in one header file to replace folder dpct 2024-01-23 02:38:44 +00:00
jianyuzh
5b5389941e fix error: wrong result in 658746bb26702e50f2c59c0e4ada8e9da6010481 2024-01-23 02:38:44 +00:00
jianyuzh
bd38129aeb add print tensor function to debug 2024-01-23 02:38:44 +00:00
jianyuzh
3645f25d74 correct queue: rm dpct:get_queue 2024-01-23 02:38:44 +00:00
jianyuzh
fa3a58605b clear CMAKE to rm unused lib and options 2024-01-23 02:38:44 +00:00
jianyuzh
c709c3cb37 ren ggml-sycl.hpp -> ggml-sycl.h 2024-01-23 02:38:44 +00:00
jianyuzh
69d76c8b58 fix error of select non-zero device, format device list 2024-01-23 02:38:44 +00:00
jianyuzh
c2ef7a9cb9 step 8, rename all macros & funcs from cuda to sycl 2024-01-23 02:38:42 +00:00
jianyuzh
3b1a743e82 step7 add debug for code path, rm log 2024-01-23 02:15:32 +00:00
jianyuzh
65f895d41b support main device is non-zero 2024-01-23 02:15:32 +00:00
jianyuzh
3a9d2c54ba step6, enhance error check, remove CUDA macro, enhance device id to fix non-zero id issue 2024-01-23 02:15:32 +00:00
jianyuzh
6dd32789b4 step 5 format device and print 2024-01-23 02:15:32 +00:00
jianyuzh
da752edaf5 add GGML_LIST_DEVICE function 2024-01-23 02:15:32 +00:00
jianyuzh
43f2c35859 step3 add fp16, slower 31->28 2024-01-23 02:15:32 +00:00
jianyuzh
02dffb68b8 step 2 2024-01-23 02:15:32 +00:00
jianyuzh
ff83711055 step 1 2024-01-23 02:15:32 +00:00
jianyuzh
0c00b4f654 add debug function, commit all helper code 2024-01-23 02:15:32 +00:00
jianyuzh
233876936b update init_cublas 2024-01-23 02:15:32 +00:00
jianyuzh
7a4343df61 first update for migration 2024-01-23 02:15:32 +00:00
slaren
011e8ec577 llama : fix not enough space in buffer with Qwen (#5086) 2024-01-22 23:42:41 +01:00
Kawrakow
6f9939d119 KL-divergence (#5076)
* kl-divergence: be able to save all logits to a file

* Add ability to compute KL-divergence

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-22 16:10:14 +02:00
Reinforce-II
780e24a22e ggml : parallelize FP32 conversion when using BLAS (#5045)
* make the GGML_TASK_INIT phase runnable in multiple threads

* multithreaded dequantize in mul_mat when using a BLAS library

* minor fixes

* update outdated comment
* fix coding style

* simplify code

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-22 15:15:08 +02:00
XiaotaoChen
3ce7e8f8e7 llava : MobileVLM support (#4954)
* MobileVLM native implementation

* delete code related to depthwise_conv_2d and permute_cpy, replace the two with existing functions, optimize the ldp definition, and support the LLAMA_PERF option for CMake

* move android script to example/llava directory

* Fix the editor config checks

---------

Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-22 15:09:35 +02:00
Someone Serge
b2d80e105a flake.nix: add a comment about flakes vs nix 2024-01-22 12:19:30 +00:00
Someone Serge
28603cd283 nix: add a comment on the many nixpkgs-with-cuda instances 2024-01-22 12:19:30 +00:00
Someone Serge
5e97ec91ae nix: add a comment about makeScope 2024-01-22 12:19:30 +00:00
Someone Serge
7251870780 nix: refactor the cleanSource rules 2024-01-22 12:19:30 +00:00
Someone Serge
fe8b3c0d4b workflows: nix-ci: drop the redundant "paths" filter 2024-01-22 12:19:30 +00:00
Someone Serge
f4dd059259 workflows: nix-build-aarch64: rate limit 2024-01-22 12:19:30 +00:00
Someone Serge
f7276f7500 workflows: nix-ci: rebuild on flake.lock updates 2024-01-22 12:19:30 +00:00
Kawrakow
15bceec2d7 imatrix : keep intermediate imatrix results (#5077)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-22 14:18:43 +02:00
compilade
d6bd4d46dd llama : support StableLM 2 1.6B (#5052)
* llama : support StableLM 2 1.6B

* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]

* convert : refactor Qwen's set_vocab to use it for StableLM 2 too

* nix : add tiktoken to llama-python-extra

* convert : use presence of tokenizer.json to determine StableLM tokenizer loader

It's a less arbitrary heuristic than the vocab size.
2024-01-22 13:21:52 +02:00
Daniel Bevenius
152d9d05e0 finetune : print sample-start/include-sample-start (#5072)
This commit adds `--sample-start` and `--include-sample-start` to the
output from the main function in finetune.cpp.

The motivation for this is that even though these are set explicitly by
the user via the command line, if one forgets to set them then it is
useful to have their values printed out. Otherwise it is possible to go
through the whole training process before realizing that the values are
not what one expected.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-22 13:11:01 +02:00
Kawrakow
66d575c45c llama : add Q3_K_XS (#5060)
* Add Q3_K_XS - intermediate size between Q2_K and Q3_K_S

* Q3_K_XS: quantize first 1/8 of ffn_down layers with Q4_K

Together with an importance matrix, this brings the perplexity
of LLaMA-v2-70B below that of the former Q2_K
with an 800 MB smaller quantized model size.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-22 12:43:33 +02:00
bobqianic
57744932c6 ci : fix Windows CI by updating Intel SDE version (#5053) 2024-01-22 10:55:05 +02:00
Shijie
3466c6ebcf llama : add more qwen2 models (#5071) 2024-01-22 09:33:19 +02:00
iSma
504dc37be8 Revert LLAMA_NATIVE to OFF in flake.nix (#5066) 2024-01-21 21:37:13 +00:00