Commit graph

2978 commits

Author SHA1 Message Date
Georgi Gerganov
7528c705b0
llama : fix uninitialized tensors 2024-05-21 22:02:00 +03:00
Georgi Gerganov
92711138f9
convert : read/write n_head_kv 2024-05-21 19:40:01 +03:00
Georgi Gerganov
e9acbce624
cuda : fix compile warning 2024-05-21 19:08:12 +03:00
Georgi Gerganov
23b72b871c
llama : remove tmp assert 2024-05-21 18:29:12 +03:00
Georgi Gerganov
600896b882
llama : move rope factors from KV header to tensors 2024-05-21 18:26:55 +03:00
Georgi Gerganov
d93b5cad0a
minor : cleanup 2024-05-21 17:51:17 +03:00
Georgi Gerganov
4f787ead14
backends : fix pragma semicolons 2024-05-21 17:51:17 +03:00
Georgi Gerganov
e7c7d8ca42
tests : update to use new rope API 2024-05-21 17:51:17 +03:00
Georgi Gerganov
f4cb482c62
minor : style 2024-05-21 17:51:16 +03:00
Georgi Gerganov
352c3859a7
backends : add dev messages to support rope freq. factors 2024-05-21 17:51:16 +03:00
Georgi Gerganov
471d8170bc
ggml : update ggml_rope_ext API to support freq. factors 2024-05-21 17:51:15 +03:00
Georgi Gerganov
2d473a4a9a
metal : support rope freq_factors 2024-05-21 17:51:01 +03:00
liuwei
8a9c897fd0
add one line of comments 2024-05-21 17:51:01 +03:00
liuwei
d05ae12e93
set to the short freq factor when the context size is smaller than the trained context size 2024-05-21 17:51:00 +03:00
liuwei
b1f491a297
fix lint warnings on convert-hf-to-gguf.py 2024-05-21 17:51:00 +03:00
liuwei
5683db3bf7
remove unused rope scaling type 'su' from gguf converter 2024-05-21 17:51:00 +03:00
liuwei
6333ed1a30
make freq factors only depend on ctx size 2024-05-21 17:51:00 +03:00
liuwei
c5569311a4
add long rope support in ggml cpu backend 2024-05-21 17:51:00 +03:00
liuwei
9f871298b6
adjust index value in cuda long rope freq factors 2024-05-21 17:51:00 +03:00
liuwei
cc19780a55
address build warnings on llama.cpp 2024-05-21 17:51:00 +03:00
Wei Liu
56d9fa72de
add phi3 128k support in cuda 2024-05-21 17:50:58 +03:00
Wei Liu
8fa413d8b5
add phi3 128k support in convert-hf-to-gguf 2024-05-21 17:49:56 +03:00
Amir
11474e756d
examples: cache hf model when --model not provided (#7353) 2024-05-21 17:13:12 +03:00
Johannes Gäßler
d8ee902227
CUDA: deduplicate mmq code (#7397) 2024-05-21 16:02:12 +02:00
jaime-m-p
d7e852c1bc
Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425)
* Update brute force test: add_special
* Update brute force test: default values for add_bos_token and add_eos_token
* Enable rtrim when pre-inserting BOS

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Revert "server : fix test regexes"
2024-05-21 14:39:48 +02:00
jaime-m-p
917dc8cfa6
Tokenizer SPM fixes for phi-3 and llama-spm (#7375)
* Update brute force test: special tokens
* Fix added tokens
  - Try to read 'added_tokens.json'.
  - Try to read 'tokenizer_config.json'.
  - Try to read 'tokenizer.json'.
* Fix special tokens rtrim

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : fix test regexes
2024-05-20 20:15:57 +02:00
Georgi Gerganov
fabf30b4c4
llama : remove Persimmon (#7408)
* llama : remove Persimmon

* requirements : remove
2024-05-21 02:35:28 +10:00
Johannes Gäßler
20385cebcc
perplexity: update README FP16 results [no ci] (#7413) 2024-05-20 18:15:38 +02:00
Radoslav Gerganov
db10f01310
rpc : track allocated buffers (#7411)
* rpc : track allocated buffers

ref: #7407

* rpc : pack rpc_tensor tightly
2024-05-20 16:36:55 +03:00
Georgi Gerganov
3bc10cb485
server : fix temperature + disable some tests (#7409)
* server : fix temperature

* server : disable tests relying on parallel determinism

* ci : change server Debug -> RelWithDebInfo
2024-05-20 22:10:03 +10:00
AidanBeltonS
6bf9b66fa3
[SYCL] Update SYCL upscale operation (#7321)
* Update SYCL upscale operation

* Formatting

* Remove messages
2024-05-20 16:38:23 +05:30
Bingan
26cd4237bc
Update README.md (#7410) 2024-05-20 11:55:34 +02:00
Herman Semenov
213e90ed73
ggml-opencl, llama: using reserve() if count already known (#7272) 2024-05-20 10:33:21 +03:00
junchao-loongson
65c58207ec
ggml : add loongarch lsx and lasx support (#6454)
* add loongarch lsx and lasx optimize code

* Add loongarch compilation support to makefile

* revert stb_image.h

* opt bytes_from_nibbles_32 and sum_i16_pairs_float

* fix undeclared

* format code

* update

* update 2

---------

Co-authored-by: Jinyang He <hejinyang@loongson.cn>
2024-05-20 10:19:21 +03:00
Georgi Gerganov
1cc0155d04
server : tuning tests (#7388)
* server : don't pass temperature as string

* server : increase timeout

* tests : fix the fix 0.8f -> 0.8

ggml-ci

* tests : set explicit temperature
2024-05-20 10:16:41 +03:00
Georgi Gerganov
e932094d58
server : return error on too large embedding input (#7389) 2024-05-20 08:56:05 +03:00
Georgi Gerganov
2789baf480
tests : fix --keep_split -> --keep-split (#7374) 2024-05-20 08:55:09 +03:00
Srihari-mcw
33c8d50acc
Add provisions for Windows support for BF16 code, including a CMake provision for enabling AVX512_BF16 (#7258) 2024-05-20 12:18:39 +10:00
slaren
d359f30921
llama : remove MPI backend (#7395) 2024-05-20 01:17:03 +02:00
Fred Douglas
1ea2a0036e
quantize : fix --keep-split check (#7374) 2024-05-19 19:37:04 +03:00
0cc4m
f030ec1f7a
Vulkan Embedding Fix (#7360)
* Fix empty Vulkan host buffers

Add fp32 fp16 matmul shader

Fix matmul shader alignment

* Remove deprecated tensor->backend uses

* Fix Vulkan validation errors on embedding models with no offloaded layers

* Fix Vulkan llava segfault when not offloading layers
2024-05-19 17:19:53 +02:00
slaren
e4e6f67be6
ggml : fix another case of quants nans (#7387) 2024-05-19 17:08:46 +02:00
Johannes Gäßler
5ca49cbecd
ggml: implement quantized KV cache for FA (#7372) 2024-05-19 16:46:13 +02:00
Johannes Gäßler
1b01f06db0
server: add test for token probs (#7347) 2024-05-19 16:26:02 +02:00
Johannes Gäßler
41858392e1
server: fix seed being reported back (#7382) 2024-05-19 17:06:33 +03:00
Anas Ahouzi
6aade19ee7
Add StableLM2 pre-tokenizer (#7349)
* Add StableLM pre-tokenizer

* Fix space

* Fix trailing whitespace
2024-05-19 22:46:46 +10:00
slaren
ab33f7a338
cuda : clear error after buffer allocation failure (#7376) 2024-05-19 14:19:37 +02:00
Brian
e23b974f4c
labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363)
https://github.com/actions/labeler#using-configuration-path-input-together-with-the-actionscheckout-action
recommends using the checkout action so the correct repo context is used
when applying settings for PR labels

e.g.

    steps:
    - uses: actions/checkout@v4 # Uploads repository content to the runner
        repository: "owner/repositoryName" # The one of the available inputs, visit https://github.com/actions/checkout#readme to find more
    - uses: actions/labeler@v5
      with:
        configuration-path: 'path/to/the/uploaded/configuration/file'
2024-05-19 20:51:03 +10:00
Georgi Gerganov
854d365aba
cmake : update android comments (#7341) 2024-05-19 11:01:01 +03:00
fraxy-v
f5bf761747
Capture CUDA logging output (#7298)
* logging: output capture in cuda module

* fix compile error

* fix: vsnprintf terminates with 0, string use not correct

* post review

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-05-19 00:44:42 +02:00