Commit graph

3068 commits

Each entry lists the author, commit SHA1, message, and date.
k.h.lai
fcda1128bc
vulkan: add workaround for iterator boundary check to fix clang-cl debug build (#7426) 2024-05-22 14:53:21 +02:00
Justine Tunney
03d8900ebe
llama : add missing model type names (#7445) 2024-05-22 14:08:18 +03:00
Georgi Gerganov
9b3d833189
cuda : fix compile warning (#7454) 2024-05-22 12:36:37 +03:00
Johannes Gäßler
95fb0aefab
CUDA: remove incorrect precision check (#7454) 2024-05-22 10:24:29 +02:00
Georgi Gerganov
3e5faa8503
cuda : fix rope + add tests (#7452)
* cuda : fix rope pos data

ggml-ci

* ggml : drop mode & 1 == 1 support for ggml_rope

ggml-ci

* ggml : support freq_factors for f16 rope (CPU)

ggml-ci

* tests : add rope tests using frequency factors

ggml-ci
2024-05-22 11:01:35 +03:00
teleprint-me
12285b5325
chore: Map model file and vocab types 2024-05-22 02:58:12 -04:00
teleprint-me
0b43e14030
refactor: Add experimental mapping for BPE pre-tokenizers 2024-05-21 22:45:45 -04:00
teleprint-me
34e14ae96d
refactor: Add experimental model mappings 2024-05-21 19:11:51 -04:00
liuwei-git
201cc11afa
llama : add phi3 128K model support (#7225)
* add phi3 128k support in convert-hf-to-gguf

* add phi3 128k support in cuda

* address build warnings on llama.cpp

* adjust index value in cuda long rope freq factors

* add long rope support in ggml cpu backend

* make freq factors only depend on ctx size

* remove unused rope scaling type 'su' from gguf converter

* fix lint warnings in convert-hf-to-gguf.py

* set to the short freq factor when context size is smaller than the trained context size

* add one line of comments

* metal : support rope freq_factors

* ggml : update ggml_rope_ext API to support freq. factors

* backends : add dev messages to support rope freq. factors

* minor : style

* tests : update to use new rope API

* backends : fix pragma semicolons

* minor : cleanup

* llama : move rope factors from KV header to tensors

* llama : remove tmp assert

* cuda : fix compile warning

* convert : read/write n_head_kv

* llama : fix uninitialized tensors

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-21 23:28:32 +03:00
teleprint-me
b2aac685d5
docs: Fix comment 2024-05-21 16:07:12 -04:00
teleprint-me
83b9fcd3e4
refactor: Rename constants to reduce confusion between references 2024-05-21 16:06:39 -04:00
Georgi Gerganov
6369bf0433
metal : handle F16 inf values, fix FA partial offload (#7434)
ggml-ci
2024-05-21 23:03:42 +03:00
Olivier Chafik
e402de364b
grammars: fix resampling logic regression (#7424) 2024-05-21 20:40:00 +01:00
Johannes Gäßler
fcf6538ba6
CUDA: fix unused warning in mmq.cu (#7442) 2024-05-21 20:27:12 +03:00
Georgi Gerganov
c3f8d58356
tests : test-tokenizer-0.sh print more info (#7402) 2024-05-21 19:53:48 +03:00
Amir
11474e756d
examples: cache hf model when --model not provided (#7353)
2024-05-21 17:13:12 +03:00
Johannes Gäßler
d8ee902227
CUDA: deduplicate mmq code (#7397) 2024-05-21 16:02:12 +02:00
jaime-m-p
d7e852c1bc
Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425)
* Update brute force test: add_special
* Update brute force test: default values for add_bos_token and add_eos_token
* Enable rtrim when pre-inserting BOS

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Revert "server : fix test regexes"
2024-05-21 14:39:48 +02:00
teleprint-me
2fe28ad4d3
chore: Rename from repo to model repo and reorder for improved readability 2024-05-21 01:41:35 -04:00
teleprint-me
4768650aff
chore: Add formatting, set common vocab files, apply pattern to model map 2024-05-21 01:38:29 -04:00
teleprint-me
fb32f50834
feat: Add hf model mapping descriptors for each repo 2024-05-21 01:07:13 -04:00
teleprint-me
a3bdac091c
chore: Remove unused enum import reference 2024-05-21 00:46:31 -04:00
teleprint-me
6296206392
chore: Apply deduped token type references 2024-05-21 00:45:06 -04:00
teleprint-me
a35b76755f
Merge branch 'master' into auto-model-support 2024-05-21 00:16:34 -04:00
teleprint-me
aed0573f68
proto: Add experimental vocab pre-tokenizer regular expressions 2024-05-21 00:14:26 -04:00
teleprint-me
12537fdabc
chore: Add tokenizer constants for model metadata 2024-05-21 00:13:49 -04:00
teleprint-me
5978bb007d
chore: Fix and update comments 2024-05-20 14:59:40 -04:00
teleprint-me
2fa2c7a86c
chore: Move enums and model map to constants 2024-05-20 14:51:03 -04:00
teleprint-me
d9ba963cd4
refactor: Restructure tokenizer model metadata 2024-05-20 14:42:59 -04:00
jaime-m-p
917dc8cfa6
Tokenizer SPM fixes for phi-3 and llama-spm (#7375)
* Update brute force test: special tokens
* Fix added tokens
  - Try to read 'added_tokens.json'.
  - Try to read 'tokenizer_config.json'.
  - Try to read 'tokenizer.json'.
* Fix special tokens rtrim

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : fix test regexes
2024-05-20 20:15:57 +02:00
teleprint-me
18bb36e496
chore: Allow the user to config the logger 2024-05-20 14:06:21 -04:00
Georgi Gerganov
fabf30b4c4
llama : remove Persimmon (#7408)
* llama : remove Persimmon

* requirements : remove
2024-05-21 02:35:28 +10:00
Johannes Gäßler
20385cebcc
perplexity: update README FP16 results [no ci] (#7413) 2024-05-20 18:15:38 +02:00
Radoslav Gerganov
db10f01310
rpc : track allocated buffers (#7411)
* rpc : track allocated buffers

ref: #7407

* rpc : pack rpc_tensor tightly
2024-05-20 16:36:55 +03:00
Georgi Gerganov
3bc10cb485
server : fix temperature + disable some tests (#7409)
* server : fix temperature

* server : disable tests relying on parallel determinism

* ci : change server Debug -> RelWithDebInfo
2024-05-20 22:10:03 +10:00
AidanBeltonS
6bf9b66fa3
[SYCL] Update SYCL upscale operation (#7321)
* Update SYCL upscale operation

* Formatting

* Remove messages
2024-05-20 16:38:23 +05:30
Bingan
26cd4237bc
Update README.md (#7410) 2024-05-20 11:55:34 +02:00
Herman Semenov
213e90ed73
ggml-opencl, llama: use reserve() when the count is already known (#7272) 2024-05-20 10:33:21 +03:00
junchao-loongson
65c58207ec
ggml : add loongarch lsx and lasx support (#6454)
* add loongarch lsx and lasx optimization code

* Add loongarch compilation support to makefile

* revert stb_image.h

* opt bytes_from_nibbles_32 and sum_i16_pairs_float

* fix undeclared

* format code

* update

* update 2

---------

Co-authored-by: Jinyang He <hejinyang@loongson.cn>
2024-05-20 10:19:21 +03:00
Georgi Gerganov
1cc0155d04
server : tuning tests (#7388)
* server : don't pass temperature as string

* server : increase timeout

* tests : fix the fix 0.8f -> 0.8

ggml-ci

* tests : set explicit temperature
2024-05-20 10:16:41 +03:00
Georgi Gerganov
e932094d58
server : return error on too large embedding input (#7389) 2024-05-20 08:56:05 +03:00
Georgi Gerganov
2789baf480
tests : fix --keep_split -> --keep-split (#7374) 2024-05-20 08:55:09 +03:00
teleprint-me
bdd0286bd0
refactor: Use proper names for referenced member variables 2024-05-20 01:39:09 -04:00
teleprint-me
a1951e27dc
refactor: Add proper names for remote model references 2024-05-20 01:36:44 -04:00
teleprint-me
6fc4492b3f
chore: Add english pangram to vocab tests 2024-05-20 00:51:35 -04:00
teleprint-me
381dad5eb3
fix: Add missing model architectures 2024-05-20 00:50:42 -04:00
teleprint-me
9a2834e24e
fix: Use __name__ as logger name 2024-05-19 22:39:30 -04:00
teleprint-me
a0362ea475
patch: Fix nested quotes for dict refs 2024-05-19 22:39:05 -04:00
teleprint-me
89a46fe818
feat: Attempt to mirror the llama.cpp API for compatibility 2024-05-19 22:31:05 -04:00
teleprint-me
c6f2a48af7
feat: Add prototype for identifying the vocab type 2024-05-19 22:30:37 -04:00