llama.cpp

Author	SHA1	Message	Date
Georgi Gerganov	5a1c98e8d2	fft	2024-12-18 14:02:23 +02:00
Georgi Gerganov	e728cfd297	compute hann window	2024-12-18 14:02:23 +02:00
Georgi Gerganov	a1f08ad338	fix n_embd + remove llama.cpp hacks	2024-12-18 14:02:23 +02:00
Georgi Gerganov	eb1b70f42a	hann window	2024-12-18 14:02:23 +02:00
Georgi Gerganov	839035d1bb	head	2024-12-18 14:02:22 +02:00
Georgi Gerganov	fe6dd5aa61	convnext	2024-12-18 14:02:22 +02:00
Georgi Gerganov	b3ba05e5bc	layer norm	2024-12-18 14:02:22 +02:00
Georgi Gerganov	435cfd788b	pos net	2024-12-18 14:02:22 +02:00
Georgi Gerganov	3046fde420	attn	2024-12-18 14:02:22 +02:00
Georgi Gerganov	13dd8941a4	resnet	2024-12-18 14:02:22 +02:00
Georgi Gerganov	3d08d62b6c	resnet conv	2024-12-18 14:02:21 +02:00
Georgi Gerganov	5296c96ca8	group norm	2024-12-18 14:02:21 +02:00
Georgi Gerganov	6ef14091c0	first conv	2024-12-18 14:02:21 +02:00
Georgi Gerganov	aac7e04953	extract features	2024-12-18 14:02:21 +02:00
Georgi Gerganov	ff2ea75fb4	wip	2024-12-18 14:02:21 +02:00
Georgi Gerganov	f169965158	llama : add OuteTTS support (wip)	2024-12-18 14:02:20 +02:00
Georgi Gerganov	e65556f174	server : do not normalize embeddings when there is no pooling ggml-ci	2024-12-18 14:02:05 +02:00
Georgi Gerganov	1b18b2d7b0	server : be explicit about the pooling type in the tests ggml-ci	2024-12-18 14:01:22 +02:00
Georgi Gerganov	06e85401b0	server : output embeddings for all tokens when pooling = none ggml-ci	2024-12-18 14:00:50 +02:00
Georgi Gerganov	89eaf5036a	server : add "tokens" output ggml-ci	2024-12-18 13:59:47 +02:00
Georgi Gerganov	152610eda9	server : output embeddings for all tokens when pooling = none (#10861) * server : add "tokens" output ggml-ci * server : output embeddings for all tokens when pooling = none ggml-ci * server : update readme [no ci] * server : fix spacing [no ci] Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * server : be explicit about the pooling type in the tests ggml-ci * server : update /embeddings and /v1/embeddings endpoints ggml-ci * server : do not normalize embeddings when there is no pooling ggml-ci * server : update readme ggml-ci * server : fixes * tests : update server tests ggml-ci * server : update readme [no ci] * server : remove rebase artifact --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-12-18 13:01:41 +02:00
Georgi Gerganov	0e70ba686e	server : add "tokens" output (#10853) * server : add "tokens" output ggml-ci * server : update readme ggml-ci * server : return tokens ids only if requested ggml-ci * tests : improve "tokens" type check Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * server : remove "tokens" from the OAI endpoint ggml-ci --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-12-18 11:05:29 +02:00
Xuan Son Nguyen	46828872c3	server : (embeddings) using same format for "input" and "content" (#10872) * server : (embeddings) using same format for "input" and "content" * fix test case * handle empty input case * fix test	2024-12-18 10:55:09 +02:00
redbeard	6b064c92b4	docs: Fix HIP (née hipBLAS) in README (#10880) Related to #10524 / `be0e350c` references to hipBLAS have been removed across the repository. This fixes the link from the repositories `README.md`. Signed-off-by: Brian 'redbeard' Harrington <redbeard@dead-city.org>	2024-12-18 10:35:00 +02:00
Diego Devesa	4da69d1abd	Revert "llama : add Falcon3 support (#10864)" (#10876) This reverts commit `382bc7f2e8`.	2024-12-18 01:36:46 +01:00
DAN™	d62b532c52	Use model->gguf_kv for loading the template instead of using the C API. (#10868) * Bump model_template to 16384 bytes to support larger chat templates. * Use `model->gguf_kv` for efficiency.	2024-12-17 23:24:22 +01:00
Johannes Gäßler	081b29bd2a	tests: add tests for GGUF (#10830)	2024-12-17 19:09:35 +01:00
Georgi Gerganov	5437d4aaf5	sync : ggml	2024-12-17 18:36:02 +02:00
Georgi Gerganov	78f766768d	cmake : fix "amd64" processor string (whisper/2638)	2024-12-17 18:35:49 +02:00
gn64	8dd19a4812	vulkan : fix soft_max.comp division by zero (whisper/2633) This change prevents a division by zero error when p.KY is 0.	2024-12-17 18:35:49 +02:00
Daniel Bevenius	130d0c90bd	ggml : remove return from ggml_gallocr_allocate_node (ggml/1048) This commit removes the return statement from ggml_gallocr_allocate_node function. The motivation behind this change is to make the code more readable and consistent.	2024-12-17 18:35:49 +02:00
Daniel Bevenius	3919da8e33	ggml : add check for grad_accs (ggml/1046) * ggml : add check for grad_accs This commit adds a check for grad_accs in ggml_graph_get_grad and ggml_graph_get_grad_acc functions. This is necessary to avoid segfaults when grad_accs is not initialized. The motivation for this change is that I find it nice to be able to print out a computation graph using ggml_graph_print but this function segfaults when grad_accs is not initialized: ```console (gdb) p g1 $2 = (ggml_cgraph *) 0x7ffff66004b0 (gdb) p *g1 $3 = {size = 2048, n_nodes = 1, n_leafs = 2, nodes = 0x7ffff6600500, grads = 0x0, grad_accs = 0x0, leafs = 0x7ffff6604500, visited_hash_set = {size = 4099, used = 0x7ffff6610518, keys = 0x7ffff6608500}, order = GGML_CGRAPH_EVAL_ORDER_LEFT_TO_RIGHT} (gdb) p ggml_graph_print(g1) === GRAPH === n_nodes = 1 Program received signal SIGSEGV, Segmentation fault. 0x0000555555579775 in ggml_graph_get_grad (cgraph=0x7ffff66004b0,node=0x7ffff6600340) at /ggml/ggml/src/ggml.c:5990 5990 return igrad != GGML_HASHSET_FULL && ggml_bitset_get(cgraph->visited_hash_set.used, igrad) ? cgraph->grads[igrad] : NULL; ``` * squash! ggml : add check for grad_accs Fix the check in ggml_graph_get_grad. The check was incorrectly using cgraph->grad_accs instead of cgraph->grads.	2024-12-17 18:35:48 +02:00
Georgi Gerganov	0006f5a74a	ggml : update ggml_backend_cpu_device_supports_op (#10867) * ggml : fix cpy op for IQ-quants to use reference impl ggml-ci * ggml : disable tests involving i-matrix quantization * ggml : update ggml_backend_cpu_device_supports_op ggml-ci	2024-12-17 18:35:42 +02:00
krystiancha	05c3a444b8	server : fill usage info in embeddings and rerank responses (#10852) * server : fill usage info in embeddings response * server : fill usage info in reranking response	2024-12-17 18:00:24 +02:00
Billel Mokeddem	382bc7f2e8	llama : add Falcon3 support (#10864)	2024-12-17 17:24:56 +02:00
Ruan	4f51968aca	readme : update typos (#10863)	2024-12-17 11:47:20 +02:00
Xuan Son Nguyen	227d7c5a7f	server : (UI) fix missing async generator on safari (#10857) * server : (UI) fix missing async generator on safari * fix	2024-12-17 09:52:09 +01:00
Eve	7b1ec53f56	vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10809) * ensure mul mat shaders work on systems with subgroup size less than 32 more fixes add test * only s_warptile_mmq needs to be run with 32 threads or more	2024-12-17 06:52:55 +01:00
Zhiyuan Li	160bc039c8	rwkv6: add wkv6 support for Vulkan backend (#10829) * rwkv_wkv6 vulkan shader * RWKV_WKV6 Vulkan op tests passed Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Apply code format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * add [[unroll]] and remove unnecessary conditions * add uma support * fix erros in EditorConfig Checker --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: Molly Sophia <mollysophia379@gmail.com>	2024-12-16 22:00:46 +01:00
Georgi Gerganov	08ea539df2	unicode : improve naming style (#10838) * unicode : improve naming style ggml-ci * cont [no ci]	2024-12-16 12:31:45 +02:00
Georgi Gerganov	644fd71b44	sampling : refactor + optimize penalties sampler (#10803) * sampling : refactor + optimize penalties sampler ggml-ci * common : apply ignore_eos as logit bias ggml-ci * batched : remove penalties sampler * params : allow penalty_last_n == -1 to be equal to context size ggml-ci * common : by default, move the penalties at the end of the sampling chain ggml-ci * common : ignore all EOG tokens Co-authored-by: Diego Devesa <slarengh@gmail.com> * common : move back the penalties at the front of the sampling chain ggml-ci * readme : restore hint about --ignore-eos flag [no ci] * llama : minor ggml-ci * webui : update --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2024-12-16 12:31:14 +02:00
Bartowski	4ddd199f6f	llava : Allow locally downloaded models for QwenVL (#10833) * Allow locally downloaded models for QwenVL * Define model_path * rm trailing space --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-12-15 21:43:25 +01:00
Valentin Mamedov	a0974156f3	llama : add Deepseek MoE v1 & GigaChat models (#10827) * Add deepseek v1 arch & gigachat template * improve template code * add readme * delete comments * remove comment * fix format * lint llama.cpp * fix order of deepseek and deepseek2, move gigachat temlate to the end of func * fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need * remove comments * move deepseek above deepseek2 * change placement of gigachat chat template	2024-12-15 19:02:46 +02:00
Georgi Gerganov	87cf323cef	scripts : change build path to "build-bench" for compare-commits.sh (#10836)	2024-12-15 18:44:47 +02:00
Vinesh Janarthanan	5478bbcd17	server: (UI) add syntax highlighting and latex math rendering (#10808) * add code highlighting and math formatting * code cleanup * build public/index.html * rebuild public/index.html * fixed coding style * fixed coding style * style fixes * highlight: smaller bundle size, fix light & dark theme * remove katex * add bundle size check * add more languages * add php * reuse some langs * use gzip * Revert "remove katex" This reverts commit `c0e5046acc`. * use better maintained @vscode/markdown-it-katex * fix gzip non deterministic * ability to add a demo conversation for dev * fix latex rendering * add comment * latex codeblock as code --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-12-15 12:55:54 +01:00
Georgi Gerganov	b5ae1ddff9	gguf-py : bump to v0.13.0	2024-12-15 13:16:42 +02:00
Michelle Tan	89d604f2c8	server: Fix `has_next_line` in JSON response (#10818) * Update server JSON response. * Add unit test to check `has_new_line` JSON response * Remove `has_new_line` unit test changes. * Address code review comment: type check for `has_new_line` in unit test	2024-12-14 23:29:45 +01:00
Evgeny Kurnevsky	e52aba537a	nix: allow to override rocm gpu targets (#10794) This allows to reduce compile time when you are building for a single GPU.	2024-12-14 10:17:36 -08:00
HimariO	ba1cb19cdd	llama : add Qwen2VL support + multimodal RoPE (#10361) * Barebone Qwen2VL LLM convertor * Add Qwen2VL cli entrypoint * [WIP] add qwen2vl arch * Verify m-rope output * Add vl-rope/2d-rope support for qwen2vl ViT * update qwen2vl cli tool * update 5D tensor op workaround * [WIP] qwen2vl vision model * make batch and clip utils compatible with qwen2vl * [WIP] create inference workflow, gguf convert script but fix * correcting vision-rope behavior, add the missing last layer back to ViT * add arg parser to qwen2vl_surgery * replace variable size array with vector * cuda-gdb cmake preset * add fp32 mrope, vision rope kernel * add fp16 support for qwen2vl and m-rope * add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION` * fix rope op mode switching, out dated func args * update `llama_hparams` * update to keep up stream changes * resolve linter, test errors * add makefile entry, update speical image padding token * add mrope unit test, fix few compiler warnings * rename `mrope` related function, params * minor updates on debug util, bug fixs * add `m-rope` testcase to `test-backend-ops` * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix traililng whitespce * store `llama_hparams.rope_sections` with fixed size array * update position id tensor size check in GGML_OP_ROPE * minor updates * update `ggml_backend__supports_op` of unsupported backends remote old `rope_section` compare operator --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-14 14:43:46 +02:00
cduk	56eea0781c	Removes spurious \r in output that causes logging in journalctl to treat lines as binary and therefore hidden by default (#10771) Signed-off-by: Charles Darke <s.cduk@toodevious.com> Co-authored-by: Charles Darke <s.cduk@toodevious.com>	2024-12-13 23:21:49 +01:00


4375 commits