llama.cpp

Author	SHA1	Message	Date
Molly Sophia	9cad1ca194	rwkv: skip computing output for unused tokens for hybrid models Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	cffd099aad	rwkv7: Add some model type variants Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	1fdc00b255	Add `_set_vocab_rwkv_world` as a common function Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	922ebbe93d	rwkv7: converter script simplification Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	2175aebdb1	Apply code-format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	f6be4dc661	Add support for ARWKV7 Hybrid models Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	e9ba411d3e	WKV7 Vulkan bugfix Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	2187607471	WKV7 Metal Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	3a2a97af28	ggml: metal unary exp & neg There isn't much performance gain though. Just for more op coverage Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	d564c4b534	Fix metal wkv6 inference Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
zhiyuan li	65307d279f	update tests for 1b6 3b 7b	2025-02-10 13:02:19 +08:00
zhiyuan li	84b4f81ef1	initial support for apple	2025-02-10 13:02:19 +08:00
Molly Sophia	e7794cb274	WKV7 Vulkan & sycl Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	9cd24dd3eb	wkv7 CUDA impl Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	6dcc21e7f5	WIP: Add support for rwkv v7 Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Molly Sophia	5445300758	ggml: Add op l2_norm Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-02-10 13:02:19 +08:00
Eric Curtin	19d3c8293b	There's a better way of clearing lines (#11756 ) Use the ANSI escape code for clearing a line. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-02-09 10:34:49 +00:00
Jeff Bolz	98f6b0fd1e	vulkan: account for lookup tables when checking shared memory size (#11502 )	2025-02-09 08:43:51 +01:00
Xuan-Son Nguyen	55ac8c7791	server : (webui) revamp Settings dialog, add Pyodide interpreter (#11759 ) * redo Settings modal UI * add python code interpreter * fix auto scroll * build * fix overflow for long output lines * bring back sticky copy button * adapt layout on mobile view * fix multiple lines output and color scheme * handle python exception * better state management * add webworker * add headers * format code * speed up by loading pyodide on page load * (small tweak) add small animation to make it feels like claude	2025-02-08 21:54:50 +01:00
Woof Dog	e6e6583199	server : (webui) increase edit textarea size (#11763 )	2025-02-08 20:09:55 +01:00
Georgi Gerganov	aaa5505307	server : minor log updates (#11760 ) ggml-ci	2025-02-08 18:08:43 +02:00
Georgi Gerganov	bdcf8b6a56	cont : fix mmap flag print (#11699 )	2025-02-08 16:49:38 +02:00
Karol Kontny	4d3465c5ae	ggml: Fix data race in ggml threadpool (#11736 ) After the barrier in last iteration is executed, still the loop termination condition will be executed. However main thread can destroy the cgraph object and its nodes already, then another thread will access it, but the thing is already gone. Also trouble can happen when n_nodes == 0 or abort is called, but I'm not sure if the prior situation is possible. Last synchronization should be done after the loop to ensure the cgraph/cplan won't be accessed after the main thread exits from the function.	2025-02-08 15:30:53 +01:00
Johannes Gäßler	d80be897ac	CUDA: fix min. version for movmatrix (#11751 )	2025-02-08 10:46:07 +01:00
Nikolaos Pothitos	3ab410f55f	readme : update front-end framework (#11753 ) After the migration to React with #11688	2025-02-08 10:43:04 +01:00
Xuan-Son Nguyen	0cf867160c	server : (webui) fix numeric settings being saved as string (#11739 ) * server : (webui) fix numeric settings being saved as string * add some more comments	2025-02-08 10:42:34 +01:00
Eric Curtin	d2fe216fb2	Make logging more verbose (#11714 ) Debugged an issue with a user who was on a read-only filesystem. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-02-07 14:42:46 +00:00
Georgi Gerganov	ed926d8833	llama : fix defrag logic (#11707 ) * llama : fix defrag logic ggml-ci * cont : better logic ggml-ci * cont : clamp fragmentation to 0.0 ggml-ci	2025-02-07 16:05:34 +02:00
Christian Fillion	2d219b389e	vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729 ) Silently insert U+FFFD(s) (Unicode replacement character) instead until the next valid codepoint can be found. This fixes `llama_tokenize` throwing an exception across the C API boundary or libllama's module boundary (the caller's runtime might be incompatible!) Returning a proper error code might be desirable, however the signature of `llama_tokenize` doesn't allow it as all return values already have existing meaning.	2025-02-07 15:55:47 +02:00
magicse	333820d749	llama : fix progress dots (#11730 ) * Update llama.cpp For display progress dots in terminal. Without this it didn't display dots progress during loading model from file. * Update llama.cpp removed trailing spaces	2025-02-07 15:48:47 +02:00
Jeff Bolz	c026ba3c23	vulkan: print shared memory size (#11719 )	2025-02-07 11:26:03 +01:00
Christian Fillion	7ee953a64a	llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727 ) The C API in llama.h claims users can implement `llama_sampler_i` to create custom `llama_sampler`. The sampler chain takes ownership and calls `llama_sampler_free` on them. However, `llama_sampler_free` is hard-coded to use `delete`. This is undefined behavior if the object wasn't also allocated via `new` from libllama's C++ runtime. Callers in C and C-compatible languages do not use C++'s `new` operator. C++ callers may not be sharing the same heap as libllama.	2025-02-07 11:33:27 +02:00
Akarshan Biswas	ec3bc8270b	SYCL: remove XMX info from print devices (#11712 )	2025-02-07 09:27:53 +00:00
Daniel Bevenius	b7552cfcbc	common : add default embeddings presets (#11677 ) * common : add default embeddings presets This commit adds default embeddings presets for the following models: - bge-small-en-v1.5 - e5-small-v2 - gte-small These can be used with llama-embedding and llama-server. For example, with llama-embedding: ```console ./build/bin/llama-embedding --embd-gte-small-default -p "Hello, how are you?" ``` And with llama-server: ```console ./build/bin/llama-server --embd-gte-small-default ``` And the embeddings endpoint can then be called with a POST request: ```console curl --request POST \ --url http://localhost:8080/embeddings \ --header "Content-Type: application/json" \ --data '{"input": "Hello, how are you?"}' ``` I'm not sure if these are the most common embedding models but hopefully this can be a good starting point for discussion and further improvements. Refs: https://github.com/ggerganov/llama.cpp/issues/10932	2025-02-07 09:15:22 +01:00
Jinyang He	225bbbfa39	ggml : optimize and build warning fix for LoongArch (#11709 ) * ggml : optimize convert f32<->f16 for loongarch_asx * ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16 * ggml : Fix warnings when run cpu CI locally on LoongArch	2025-02-07 09:38:31 +02:00
tv1wnd	855cd0734a	llama : fix old glm4 models (#11670 )	2025-02-06 22:48:51 +01:00
Georgi Gerganov	8a59053f63	sync : ggml	2025-02-06 21:23:03 +02:00
Patrick Peng	1d20e53c40	rpc: fix known RCE in rpc-server (ggml/1103) Add bounds checking in `rpc_server::copy_tensor` to prevent out-of-bounds writes + Check if `(uint8_t *)dst->data + ggml_nbytes(src)` remains within the destination buffer’s allocated region.	2025-02-06 21:22:54 +02:00
Xuan-Son Nguyen	2fb3c32a16	server : (webui) migrate project to ReactJS with typescript (#11688 ) * init version * fix auto scroll * bring back copy btn * bring back thought process * add lint and format check on CI * remove lang from html tag * allow multiple generations at the same time * lint and format combined * fix unused var * improve MarkdownDisplay * fix more latex * fix code block cannot be selected while generating	2025-02-06 17:32:29 +01:00
Tei Home	9ab42dc722	docs: update fedora cuda guide for 12.8 release (#11393 ) * docs: update fedora cuda guide for 12.8 release * docs: build cuda update	2025-02-06 12:16:15 +00:00
Akarshan Biswas	194b2e69f8	SYCL: Adjust support condition for norm operators (#11674 ) SYCL does not support non contiguous tensors for norm operations	2025-02-06 11:42:35 +00:00
Georgi Gerganov	9dd7a0390f	llama : add log about loading model tensors (#11699 )	2025-02-06 13:41:37 +02:00
Adrien Gallouët	c0d4843225	build : fix llama.pc (#11658 ) Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>	2025-02-06 13:08:13 +02:00
junchao-zhao	8d4d2be143	ggml : fix LoongArch compile error with 128-bit SIMD (#11701 )	2025-02-06 11:20:00 +02:00
Jeff Bolz	2c6c8df56d	vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521 ) * vulkan: optimize coopmat2 iq2/iq3 callbacks * build: trigger CI on GLSL compute shader changes	2025-02-06 07:15:30 +01:00
Rémy O	8a7e3bf17a	vulkan: initial support for IQ4_XS quantization (#11501 )	2025-02-06 07:09:59 +01:00
Jeff Bolz	1b598b3058	vulkan: use smaller combined allocations to avoid fragmentation (#11551 )	2025-02-06 07:02:18 +01:00
Charles Duffy	902368a06b	metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690 ) Avoids breakage in nix flake build introduced by `b0569130c5`	2025-02-06 09:52:31 +08:00
Matvey Soloviev	c3db0480bb	readme : add link to Autopen under UIs (#11684 ) Autopen (https://github.com/blackhole89/autopen) is a graphical text editor that uses llama.cpp to tokenize the buffer on the fly, score the buffer, visualise token logits and allow you to switch back and forth between different possible completions at any point. It hopefully meets the criteria for inclusion, as the dependency on llama.cpp is stated prominently.	2025-02-06 01:55:25 +01:00
Georgi Gerganov	d774ab3acc	metal : adjust support conditions for norm operators (#11671 ) cont #11659 ggml-ci	2025-02-05 10:57:42 +02:00
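
Commit 2d219b389e above describes replacing invalid UTF-8 input with U+FFFD instead of throwing. The following is a minimal Python sketch of that general idea, not the llama.cpp implementation: `decode_lossy` is a hypothetical helper name, and it relies on Python's built-in `errors="replace"` handler, which may emit a different number of replacement characters than the tokenizer does for the same malformed input.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, the Unicode replacement character


def decode_lossy(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for invalid sequences.

    Hypothetical helper for illustration only. Unlike strict decoding,
    this never raises on malformed input, which mirrors the motivation
    in the commit message (no exception crossing an API boundary).
    """
    return raw.decode("utf-8", errors="replace")


def decode_strict_or_none(raw: bytes):
    """Strict decoding for contrast: returns None when the input is invalid."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    bad = b"abc\xffdef"  # 0xFF can never appear in well-formed UTF-8
    print(repr(decode_strict_or_none(bad)))
    print(repr(decode_lossy(bad)))
```

The trade-off noted in the commit message applies here as well: the caller gets usable text in every case but receives no error signal, so it has to look for U+FFFD itself if it needs to detect malformed input.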

4693 commits