Commit graph

4690 commits

Author SHA1 Message Date
Xuan Son Nguyen
7791845e2c Merge branch 'master' into xsn/webui_pyodide 2025-02-08 20:15:07 +01:00
Woof Dog
e6e6583199
server : (webui) increase edit textarea size (#11763) 2025-02-08 20:09:55 +01:00
Xuan Son Nguyen
85da9172b6 (small tweak) add small animation to make it feel like claude 2025-02-08 20:01:57 +01:00
Xuan Son Nguyen
475b2906ba speed up by loading pyodide on page load 2025-02-08 19:39:26 +01:00
Xuan Son Nguyen
84fe6c4e93 format code 2025-02-08 18:02:32 +01:00
Xuan Son Nguyen
69fa94af58 add headers 2025-02-08 17:56:30 +01:00
Xuan Son Nguyen
8e092c4a15 add webworker 2025-02-08 17:54:54 +01:00
Georgi Gerganov
aaa5505307
server : minor log updates (#11760)
ggml-ci
2025-02-08 18:08:43 +02:00
Georgi Gerganov
bdcf8b6a56
cont : fix mmap flag print (#11699) 2025-02-08 16:49:38 +02:00
Karol Kontny
4d3465c5ae
ggml: Fix data race in ggml threadpool (#11736)
After the barrier in the last iteration, each thread still evaluates the loop termination
condition. By that point the main thread may already have destroyed the cgraph object
and its nodes, so another thread can end up accessing memory that is already gone.
Similar trouble can happen when n_nodes == 0 or abort is called, though I'm not sure
whether that situation can actually occur.

The last synchronization should therefore be done after the loop, to ensure the cgraph/cplan
cannot be accessed after the main thread exits the function (see the sketch after this entry).
2025-02-08 15:30:53 +01:00
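
A minimal, self-contained sketch of the synchronization pattern the commit above describes (illustrative C++20 code, not the actual ggml threadpool; `Graph`, `worker`, and the thread count are made up for the example): each thread passes one final barrier after the node loop, so the owner frees the graph only once no thread can re-read `g->n_nodes`.

```cpp
#include <barrier>
#include <thread>
#include <vector>

// Illustrative only: not the ggml threadpool. Requires C++20 for std::barrier.
struct Graph { int n_nodes = 8; };

constexpr int  n_threads = 4;
Graph *        g = new Graph();
std::barrier<> sync_point(n_threads);

void worker() {
    for (int i = 0; i < g->n_nodes; ++i) {  // the loop condition dereferences g
        // ... compute node i ...
        sync_point.arrive_and_wait();       // per-node barrier
    }
    // Final barrier *after* the loop: once every thread has arrived here, no
    // thread will read g->n_nodes again, so the graph may be destroyed.
    sync_point.arrive_and_wait();
}

int main() {
    std::vector<std::thread> secondaries;
    for (int t = 1; t < n_threads; ++t) {
        secondaries.emplace_back(worker);
    }
    worker();   // the "main" thread participates in the compute as well
    delete g;   // safe only because of the final barrier inside worker()
    for (auto & t : secondaries) {
        t.join();
    }
}
```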
Xuan Son Nguyen
84919d2fbf better state management 2025-02-08 15:25:39 +01:00
Xuan Son Nguyen
e1f03c4009 handle python exception 2025-02-08 15:20:24 +01:00
Xuan Son Nguyen
22e826336a fix multiple lines output and color scheme 2025-02-08 15:16:26 +01:00
Xuan Son Nguyen
19a95daf78 adapt layout on mobile view 2025-02-08 14:58:30 +01:00
Xuan Son Nguyen
6f1fcbcc0f bring back sticky copy button 2025-02-08 14:53:51 +01:00
Xuan Son Nguyen
fbf2853f54 fix overflow for long output lines 2025-02-08 14:41:20 +01:00
Xuan Son Nguyen
be22b41fe3 build 2025-02-08 13:12:38 +01:00
Xuan Son Nguyen
115f75c5b1 fix auto scroll 2025-02-08 13:06:46 +01:00
Xuan Son Nguyen
483a3bc2ad add python code interpreter 2025-02-08 13:06:46 +01:00
Xuan Son Nguyen
422e53e607 redo Settings modal UI 2025-02-08 13:06:46 +01:00
Johannes Gäßler
d80be897ac
CUDA: fix min. version for movmatrix (#11751) 2025-02-08 10:46:07 +01:00
Nikolaos Pothitos
3ab410f55f
readme : update front-end framework (#11753)
After the migration to React with #11688
2025-02-08 10:43:04 +01:00
Xuan-Son Nguyen
0cf867160c
server : (webui) fix numeric settings being saved as string (#11739)
* server : (webui) fix numeric settings being saved as string

* add some more comments
2025-02-08 10:42:34 +01:00
Eric Curtin
d2fe216fb2
Make logging more verbose (#11714)
Debugged an issue with a user who was on a read-only filesystem.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-02-07 14:42:46 +00:00
Georgi Gerganov
ed926d8833
llama : fix defrag logic (#11707)
* llama : fix defrag logic

ggml-ci

* cont : better logic

ggml-ci

* cont : clamp fragmentation to 0.0

ggml-ci
2025-02-07 16:05:34 +02:00
Christian Fillion
2d219b389e
vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729)
Silently insert U+FFFD (the Unicode replacement character) instead, until the next
valid codepoint is found (a sketch of this replacement strategy follows this entry).

This fixes `llama_tokenize` throwing an exception across the C API boundary
or libllama's module boundary (the caller's runtime might be incompatible!).

Returning a proper error code might be desirable; however, the signature
of `llama_tokenize` doesn't allow it, as all return values already have
an existing meaning.
2025-02-07 15:55:47 +02:00
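
A rough sketch of the replacement strategy described above (illustrative only, not the tokenizer's actual validation; it does not reject overlong encodings or surrogate codepoints): invalid bytes are mapped to U+FFFD and scanning resumes at the next byte instead of throwing.

```cpp
#include <cstdint>
#include <string>

// Map invalid UTF-8 byte sequences to U+FFFD (illustrative sketch only).
static std::string sanitize_utf8(const std::string & in) {
    static const char REPL[] = "\xEF\xBF\xBD"; // UTF-8 encoding of U+FFFD
    std::string out;
    size_t i = 0;
    while (i < in.size()) {
        const uint8_t c = (uint8_t) in[i];
        const size_t len =
            (c < 0x80)           ? 1 :
            ((c & 0xE0) == 0xC0) ? 2 :
            ((c & 0xF0) == 0xE0) ? 3 :
            ((c & 0xF8) == 0xF0) ? 4 : 0;
        bool ok = (len != 0) && (i + len <= in.size());
        for (size_t k = 1; ok && k < len; ++k) {
            ok = ((uint8_t) in[i + k] & 0xC0) == 0x80; // continuation byte?
        }
        if (ok) {
            out.append(in, i, len); // copy the valid sequence verbatim
            i += len;
        } else {
            out += REPL;            // insert U+FFFD and resync at the next byte
            i += 1;
        }
    }
    return out;
}
```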
magicse
333820d749
llama : fix progress dots (#11730)
* Update llama.cpp

Display progress dots in the terminal. Without this, no progress dots were shown
while loading the model from a file.

* Update llama.cpp

removed trailing spaces
2025-02-07 15:48:47 +02:00
Jeff Bolz
c026ba3c23
vulkan: print shared memory size (#11719) 2025-02-07 11:26:03 +01:00
Christian Fillion
7ee953a64a
llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727)
The C API in llama.h claims users can implement `llama_sampler_i` to
create custom `llama_sampler`. The sampler chain takes ownership and
calls `llama_sampler_free` on them. However, `llama_sampler_free` is
hard-coded to use `delete`, which is undefined behavior if the object
wasn't also allocated via `new` from libllama's C++ runtime. Callers
in C and C-compatible languages do not use C++'s `new` operator, and C++
callers may not share the same heap as libllama (a usage sketch follows this entry).
2025-02-07 11:33:27 +02:00
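
A hedged sketch of what using the new constructor might look like, assuming the `llama_sampler_i` interface and the `llama_sampler_init(iface, ctx)` declaration in llama.h (the sampler policy, names, and context struct below are made up for illustration): the custom `free` callback releases only its own context, while the `llama_sampler` object itself is both allocated and released inside libllama, so `llama_sampler_free` stays well-defined.

```cpp
#include "llama.h"

// Hypothetical per-sampler state (illustration only).
struct my_sampler_ctx {
    int64_t n_applied = 0;
};

static const char * my_name(const struct llama_sampler * /*smpl*/) {
    return "my-sampler";
}

static void my_apply(struct llama_sampler * smpl, llama_token_data_array * cur_p) {
    auto * ctx = (my_sampler_ctx *) smpl->ctx;
    ctx->n_applied++;
    if (cur_p->size > 0) {
        cur_p->size     = 1; // toy policy: keep only the first candidate
        cur_p->selected = 0;
    }
}

static void my_free(struct llama_sampler * smpl) {
    // Release only our context; libllama allocated the llama_sampler wrapper
    // and releases it itself.
    delete (my_sampler_ctx *) smpl->ctx;
}

static const llama_sampler_i my_iface = {
    /*.name   =*/ my_name,
    /*.accept =*/ nullptr,
    /*.apply  =*/ my_apply,
    /*.reset  =*/ nullptr,
    /*.clone  =*/ nullptr,
    /*.free   =*/ my_free,
};

// Constructed via llama_sampler_init, so llama_sampler_free remains safe to call.
llama_sampler * make_my_sampler() {
    return llama_sampler_init(&my_iface, new my_sampler_ctx());
}
```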
Akarshan Biswas
ec3bc8270b
SYCL: remove XMX info from print devices (#11712) 2025-02-07 09:27:53 +00:00
Daniel Bevenius
b7552cfcbc
common : add default embeddings presets (#11677)
* common : add default embeddings presets

This commit adds default embeddings presets for the following models:
- bge-small-en-v1.5
- e5-small-v2
- gte-small

These can be used with llama-embedding and llama-server.

For example, with llama-embedding:
```console
./build/bin/llama-embedding --embd-gte-small-default -p "Hello, how are you?"
```

And with llama-server:
```console
./build/bin/llama-server --embd-gte-small-default
```
And the embeddings endpoint can then be called with a POST request:
```console
curl --request POST \
    --url http://localhost:8080/embeddings \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello, how are you?"}'
```

I'm not sure if these are the most common embedding models, but hopefully
this can be a good starting point for discussion and further improvements.

Refs: https://github.com/ggerganov/llama.cpp/issues/10932
2025-02-07 09:15:22 +01:00
Jinyang He
225bbbfa39
ggml : optimize and build warning fix for LoongArch (#11709)
* ggml : optimize convert f32<->f16 for loongarch_asx

* ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16

* ggml : Fix warnings when running the CPU CI locally on LoongArch
2025-02-07 09:38:31 +02:00
tv1wnd
855cd0734a
llama : fix old glm4 models (#11670) 2025-02-06 22:48:51 +01:00
Georgi Gerganov
8a59053f63
sync : ggml 2025-02-06 21:23:03 +02:00
Patrick Peng
1d20e53c40
rpc: fix known RCE in rpc-server (ggml/1103)
Add bounds checking in `rpc_server::copy_tensor` to prevent out-of-bounds writes:
check that `(uint8_t *)dst->data + ggml_nbytes(src)` remains within the destination
buffer's allocated region (see the sketch after this entry).
2025-02-06 21:22:54 +02:00
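
A hedged sketch of the kind of check the commit above describes (simplified; the function name and structure are not the actual rpc-server code): the destination range implied by the copy must stay inside the destination tensor's buffer.

```cpp
#include <cstdint>

#include "ggml.h"
#include "ggml-backend.h"

// Illustrative bounds check: is [dst->data, dst->data + nbytes(src)) inside dst's buffer?
static bool copy_is_in_bounds(const ggml_tensor * src, const ggml_tensor * dst) {
    ggml_backend_buffer_t buf = dst->buffer;

    uint8_t * base = (uint8_t *) ggml_backend_buffer_get_base(buf);
    size_t    size = ggml_backend_buffer_get_size(buf);

    uint8_t * dst_begin = (uint8_t *) dst->data;
    uint8_t * dst_end   = dst_begin + ggml_nbytes(src);

    return dst_begin >= base && dst_end <= base + size;
}
```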
Xuan-Son Nguyen
2fb3c32a16
server : (webui) migrate project to ReactJS with typescript (#11688)
* init version

* fix auto scroll

* bring back copy btn

* bring back thought process

* add lint and format check on CI

* remove lang from html tag

* allow multiple generations at the same time

* lint and format combined

* fix unused var

* improve MarkdownDisplay

* fix more latex

* fix code block cannot be selected while generating
2025-02-06 17:32:29 +01:00
Tei Home
9ab42dc722
docs: update fedora cuda guide for 12.8 release (#11393)
* docs: update fedora cuda guide for 12.8 release

* docs: build cuda update
2025-02-06 12:16:15 +00:00
Akarshan Biswas
194b2e69f8
SYCL: Adjust support condition for norm operators (#11674)
SYCL does not support non-contiguous tensors for norm operations (see the sketch after this entry).
2025-02-06 11:42:35 +00:00
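
A hedged sketch of what such a support condition typically looks like in a ggml backend (the exact set of ops and the function name here are assumptions, not the SYCL backend's actual code):

```cpp
#include "ggml.h"

// Only claim support for norm-style ops when the input tensor is contiguous.
static bool supports_norm_op(const ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_NORM:
        case GGML_OP_RMS_NORM:
        case GGML_OP_GROUP_NORM:
            return ggml_is_contiguous(op->src[0]);
        default:
            return true;
    }
}
```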
Georgi Gerganov
9dd7a0390f
llama : add log about loading model tensors (#11699) 2025-02-06 13:41:37 +02:00
Adrien Gallouët
c0d4843225
build : fix llama.pc (#11658)
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
2025-02-06 13:08:13 +02:00
junchao-zhao
8d4d2be143
ggml : fix LoongArch compile error with 128-bit SIMD (#11701) 2025-02-06 11:20:00 +02:00
Jeff Bolz
2c6c8df56d
vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521)
* vulkan: optimize coopmat2 iq2/iq3 callbacks

* build: trigger CI on GLSL compute shader changes
2025-02-06 07:15:30 +01:00
Rémy O
8a7e3bf17a
vulkan: initial support for IQ4_XS quantization (#11501) 2025-02-06 07:09:59 +01:00
Jeff Bolz
1b598b3058
vulkan: use smaller combined allocations to avoid fragmentation (#11551) 2025-02-06 07:02:18 +01:00
Charles Duffy
902368a06b
metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690)
Avoids breakage in the nix flake build introduced by b0569130c5 (see the sketch after this entry).
2025-02-06 09:52:31 +08:00
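
A hedged sketch of the guard pattern this kind of fix usually takes (an assumption about the shape of the change, not the literal diff): only test `TARGET_OS_VISION` when the SDK actually defines it, so toolchains whose headers predate visionOS keep compiling.

```cpp
#include <TargetConditionals.h>

#if defined(TARGET_OS_VISION) && TARGET_OS_VISION
    // visionOS-specific path
#else
    // fallback for SDKs that predate visionOS
#endif
```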
Matvey Soloviev
c3db0480bb
readme : add link to Autopen under UIs (#11684)
Autopen (https://github.com/blackhole89/autopen) is a graphical text editor that uses llama.cpp to tokenize the buffer on the fly, score the buffer, visualise token logits, and let you switch back and forth between different possible completions at any point. It hopefully meets the criteria for inclusion, as the dependency on llama.cpp is stated prominently.
2025-02-06 01:55:25 +01:00
Georgi Gerganov
d774ab3acc
metal : adjust support conditions for norm operators (#11671)
cont #11659

ggml-ci
2025-02-05 10:57:42 +02:00
Johannes Gäßler
fa62da9b2d
CUDA: support for mat. mul. with ne03 != ne13 (#11656) 2025-02-05 08:58:31 +01:00
SAMI
1ec208083c
llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644)
* Added quantization for visual projector
* Added README
* Fixed the clip quantize implementation in the file

* Fixed the gcc warning regarding minor linting

* Removed trailing whitespace
2025-02-05 10:45:40 +03:00
Olivier Chafik
9f4cc8f8d3
sync: minja (#11641)
* `sync`: minja

182de30cda

https://github.com/google/minja/pull/46

https://github.com/google/minja/pull/45
2025-02-05 01:00:12 +00:00