Commit graph

4703 commits

Author SHA1 Message Date
Alex Brooks
72ce3e5833
Merge 262000fa4d into d7b31a9d84 2025-02-10 08:49:44 -07:00
Alex-Brooks
262000fa4d Add v prefix to vision feature layer log
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
06703820dc Fix notes rendering
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
17bf6ad304 Update notes for alternative to legacy llm conversion script
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
78f765e8a5 Update comment for vision feature layer init
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
2327897175 Cleanup logs
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
188a068a04 Standardize vision feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
3a191f8edb Use 10 for max number of patches
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
d85580c41c Avoid dropping last image encoder layer in llava models
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
65935431b4 fix num gridpoints and use all layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
ab71c9e9c4 Pull vision feature layers out of gguf keys
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
ae291e5405 Fix hardcoded concat for multiple feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
e1ec851121 Increase max flattened gridpoints to 64
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
987f76840a Fix linear 2 substitution index
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
7905f9dd40 Fix projector linear substitution
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
61d4ae4699 Make siglip / openclip mutually exclusive
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
50504063b2 Add transformers llava next tensor name mapping
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
cc1c135367 Clean up llava surgery and remove name substitution hacks
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
92046a103d Add vision feature layer to gguf params
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
bc66d1931b remove hardcoded path
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
fd0111c043 Add example for converting mmgranite to gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
6ccf234031 Add super wip scripts for multimodal granite gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Olivier Chafik
d7b31a9d84
sync: minja (a72057e519) (#11774) 2025-02-10 09:34:09 +00:00
pascal-lc
9ac3457b39
Update README.md [no ci] (#11781)
typo: `\` -> `/`
Change `\` to the UNIX path separator `/`.
2025-02-10 09:05:57 +01:00
Danny Milosavljevic
c2a67efe38
vulkan: Make Vulkan optional at runtime (#11493). (#11494)
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-02-10 07:17:21 +01:00
Wagner Bruna
b044a0fe3c
vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (#11592) 2025-02-10 07:08:22 +01:00
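Since the new behavior is opt-in via an environment variable, the backend can branch on it at startup. A minimal sketch of that pattern, with hypothetical allocation helpers standing in for the real Vulkan paths (this is not the actual ggml-vulkan code):

```cpp
#include <cstddef>   // size_t
#include <cstdlib>   // std::getenv

// Hypothetical stand-ins for the two Vulkan allocation paths.
void * alloc_device_local(size_t size);  // VRAM
void * alloc_host_visible(size_t size);  // system RAM, GPU-visible

// Sketch of the opt-in: treat any non-empty, non-"0" value as enabled.
static bool prefer_host_memory() {
    const char * s = std::getenv("GGML_VK_PREFER_HOST_MEMORY");
    return s != nullptr && s[0] != '\0' && s[0] != '0';
}

void * alloc_buffer(size_t size) {
    return prefer_host_memory() ? alloc_host_visible(size)
                                : alloc_device_local(size);
}
```

A run would then opt in by exporting the variable, e.g. setting GGML_VK_PREFER_HOST_MEMORY=1 before launching the binary.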
Eric Curtin
19d3c8293b
There's a better way of clearing lines (#11756)
Use the ANSI escape code for clearing a line.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-02-09 10:34:49 +00:00
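The escape sequence in question erases the whole current line and returns the cursor to column 0, rather than overwriting old output with spaces. A small self-contained C++ illustration of the technique (not the actual patch):

```cpp
#include <iostream>

// Clear the current terminal line with ANSI escape codes:
// ESC[2K erases the entire line, \r moves the cursor back to column 0.
void clear_line() {
    std::cout << "\033[2K\r" << std::flush;
}

int main() {
    std::cout << "downloading... 42%" << std::flush;
    clear_line();                                   // wipe the old status
    std::cout << "downloading... 43%" << std::endl; // redraw in place
}
```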
Jeff Bolz
98f6b0fd1e
vulkan: account for lookup tables when checking shared memory size (#11502) 2025-02-09 08:43:51 +01:00
Xuan-Son Nguyen
55ac8c7791
server : (webui) revamp Settings dialog, add Pyodide interpreter (#11759)
* redo Settings modal UI

* add python code interpreter

* fix auto scroll

* build

* fix overflow for long output lines

* bring back sticky copy button

* adapt layout on mobile view

* fix multiple lines output and color scheme

* handle python exception

* better state management

* add webworker

* add headers

* format code

* speed up by loading pyodide on page load

* (small tweak) add small animation to make it feel like Claude
2025-02-08 21:54:50 +01:00
Woof Dog
e6e6583199
server : (webui) increase edit textarea size (#11763) 2025-02-08 20:09:55 +01:00
Georgi Gerganov
aaa5505307
server : minor log updates (#11760)
ggml-ci
2025-02-08 18:08:43 +02:00
Georgi Gerganov
bdcf8b6a56
cont : fix mmap flag print (#11699) 2025-02-08 16:49:38 +02:00
Karol Kontny
4d3465c5ae
ggml: Fix data race in ggml threadpool (#11736)
After the barrier in the last iteration is executed, the loop termination
condition is still evaluated. By that point the main thread may already have
destroyed the cgraph object and its nodes, so another thread can access memory
that is already gone. Trouble can also arise when n_nodes == 0 or abort is
called, though I'm not sure whether the former situation is possible.

The last synchronization should therefore be done after the loop, to ensure the
cgraph/cplan won't be accessed after the main thread exits from the function.
2025-02-08 15:30:53 +01:00
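The fix describes a general pattern: a worker must never touch shared state after the owner is allowed to free it, so the final synchronization point belongs after the loop, not inside it. A self-contained C++20 sketch of that pattern (illustrative only, not ggml's threadpool code):

```cpp
#include <barrier>
#include <thread>
#include <vector>

// Every worker passes one final barrier *after* its loop, so by the time
// the owner frees the shared graph, no worker can still be reading it.
int main() {
    constexpr int n_threads = 4;
    constexpr int n_iters   = 8;

    std::vector<int> graph(1024, 1);   // stands in for the cgraph/cplan
    std::barrier sync(n_threads);

    {
        std::vector<std::jthread> workers;
        for (int t = 0; t < n_threads; ++t) {
            workers.emplace_back([&, t] {
                for (int i = 0; i < n_iters; ++i) {
                    for (size_t j = t; j < graph.size(); j += n_threads) {
                        graph[j] += 1;          // work on the shared graph
                    }
                    sync.arrive_and_wait();     // end-of-iteration barrier
                }
                // Final sync after the loop: past this point this worker
                // never touches `graph` again, so evaluating the loop
                // condition cannot race with the owner destroying it.
                sync.arrive_and_wait();
            });
        }
    }   // jthreads join here

    graph.clear();   // safe: all workers are past their final barrier
}
```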
Johannes Gäßler
d80be897ac
CUDA: fix min. version for movmatrix (#11751) 2025-02-08 10:46:07 +01:00
Nikolaos Pothitos
3ab410f55f
readme : update front-end framework (#11753)
After the migration to React with #11688
2025-02-08 10:43:04 +01:00
Xuan-Son Nguyen
0cf867160c
server : (webui) fix numeric settings being saved as string (#11739)
* server : (webui) fix numeric settings being saved as string

* add some more comments
2025-02-08 10:42:34 +01:00
Eric Curtin
d2fe216fb2
Make logging more verbose (#11714)
Debugged an issue with a user who was on a read-only filesystem.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-02-07 14:42:46 +00:00
Georgi Gerganov
ed926d8833
llama : fix defrag logic (#11707)
* llama : fix defrag logic

ggml-ci

* cont : better logic

ggml-ci

* cont : clamp fragmentation to 0.0

ggml-ci
2025-02-07 16:05:34 +02:00
Christian Fillion
2d219b389e
vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729)
Silently insert U+FFFD (the Unicode replacement character) instead, until the
next valid codepoint can be found.

This fixes `llama_tokenize` throwing an exception across the C API boundary
or libllama's module boundary (the caller's runtime might be incompatible!).

Returning a proper error code might be desirable; however, the signature
of `llama_tokenize` doesn't allow it, as all return values already have
an existing meaning.
2025-02-07 15:55:47 +02:00
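The replacement strategy is standard: scan the input, and wherever a byte sequence is not structurally valid UTF-8, emit U+FFFD (encoded as EF BF BD) and resync at the next byte. A simplified C++ sketch of the idea, not the tokenizer's actual code (it checks sequence structure only and, unlike a full validator, does not reject overlong encodings or surrogates):

```cpp
#include <string>

// Number of bytes a UTF-8 sequence should have, judged by its lead byte;
// 0 means the byte is not a valid lead byte.
static int utf8_len(unsigned char c) {
    if (c < 0x80)          return 1;   // ASCII
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;                          // continuation or invalid lead byte
}

// Replace each invalid sequence with U+FFFD (EF BF BD) instead of throwing.
std::string sanitize_utf8(const std::string & in) {
    static const std::string replacement = "\xEF\xBF\xBD";  // U+FFFD
    std::string out;
    size_t i = 0;
    while (i < in.size()) {
        const int len = utf8_len(static_cast<unsigned char>(in[i]));
        bool ok = len > 0 && i + len <= in.size();
        for (int k = 1; ok && k < len; ++k) {
            // every trailing byte must be a continuation byte 10xxxxxx
            ok = (static_cast<unsigned char>(in[i + k]) & 0xC0) == 0x80;
        }
        if (ok) {
            out.append(in, i, len);
            i += len;
        } else {
            out += replacement;
            i += 1;  // resync one byte at a time
        }
    }
    return out;
}
```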
magicse
333820d749
llama : fix progress dots (#11730)
* Update llama.cpp

Display progress dots in the terminal.
Without this, the progress dots were not shown while loading a model from file.

* Update llama.cpp

removed trailing spaces
2025-02-07 15:48:47 +02:00
Jeff Bolz
c026ba3c23
vulkan: print shared memory size (#11719) 2025-02-07 11:26:03 +01:00
Christian Fillion
7ee953a64a
llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727)
The C API in llama.h claims users can implement `llama_sampler_i` to
create custom `llama_sampler`. The sampler chain takes ownership and
calls `llama_sampler_free` on them. However, `llama_sampler_free` is
hard-coded to use `delete`. This is undefined behavior if the object
wasn't also allocated via `new` from libllama's C++ runtime. Callers
in C and C-compatible languages do not use C++'s `new` operator. C++
callers may not be sharing the same heap as libllama.
2025-02-07 11:33:27 +02:00
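The underlying rule is that memory must be freed by the same runtime that allocated it. The sketch below shows that general pattern with hypothetical names (`thing_init` / `thing_free`, not the llama.h declarations): the library exports a constructor so that `new` and `delete` both happen inside the same module.

```cpp
// General pattern, hypothetical names: the library owns both allocation and
// deallocation, so a C caller (or a C++ caller with a different runtime and
// heap) never pairs its own malloc/new with the library's delete.

struct thing {
    int state;
};

// Exported constructor: allocation happens inside the library...
extern "C" thing * thing_init(int state) {
    return new thing{state};
}

// ...and deallocation uses the matching delete from the same runtime.
extern "C" void thing_free(thing * t) {
    delete t;
}

int main() {
    thing * t = thing_init(42);  // caller never calls new/malloc itself
    thing_free(t);               // matches the library-side new
}
```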
Akarshan Biswas
ec3bc8270b
SYCL: remove XMX info from print devices (#11712) 2025-02-07 09:27:53 +00:00
Daniel Bevenius
b7552cfcbc
common : add default embeddings presets (#11677)
* common : add default embeddings presets

This commit adds default embeddings presets for the following models:
- bge-small-en-v1.5
- e5-small-v2
- gte-small

These can be used with llama-embedding and llama-server.

For example, with llama-embedding:
```console
./build/bin/llama-embedding --embd-gte-small-default -p "Hello, how are you?"
```

And with llama-server:
```console
./build/bin/llama-server --embd-gte-small-default
```
And the embeddings endpoint can then be called with a POST request:
```console
curl --request POST \
    --url http://localhost:8080/embeddings \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello, how are you?"}'
```

I'm not sure if these are the most common embedding models but hopefully
this can be a good starting point for discussion and further
improvements.

Refs: https://github.com/ggerganov/llama.cpp/issues/10932
2025-02-07 09:15:22 +01:00
Jinyang He
225bbbfa39
ggml : optimize and build warning fix for LoongArch (#11709)
* ggml : optimize convert f32<->f16 for loongarch_asx

* ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16

* ggml : Fix warnings when running the CPU CI locally on LoongArch
2025-02-07 09:38:31 +02:00
tv1wnd
855cd0734a
llama : fix old glm4 models (#11670) 2025-02-06 22:48:51 +01:00
Georgi Gerganov
8a59053f63
sync : ggml 2025-02-06 21:23:03 +02:00
Patrick Peng
1d20e53c40
rpc: fix known RCE in rpc-server (ggml/1103)
Add bounds checking in `rpc_server::copy_tensor` to prevent out-of-bounds writes:
check that `(uint8_t *)dst->data + ggml_nbytes(src)` remains within the destination buffer's allocated region.
2025-02-06 21:22:54 +02:00
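The check amounts to verifying, in overflow-safe form, that offset + size fits inside the destination allocation before any copy takes place. A hypothetical helper illustrating the idea (not the rpc-server code itself):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not the rpc-server code: reject the copy when the
// requested write would run past the end of the destination buffer.
bool copy_checked(uint8_t * dst_base, size_t dst_size, size_t dst_offset,
                  const uint8_t * src, size_t n) {
    // Written so the arithmetic cannot wrap around: this is equivalent to
    // dst_offset + n <= dst_size, but without integer overflow.
    if (dst_offset > dst_size || n > dst_size - dst_offset) {
        return false;  // out-of-bounds write requested by the client
    }
    std::memcpy(dst_base + dst_offset, src, n);
    return true;
}
```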
Xuan-Son Nguyen
2fb3c32a16
server : (webui) migrate project to ReactJS with typescript (#11688)
* init version

* fix auto scroll

* bring back copy btn

* bring back thought process

* add lint and format check on CI

* remove lang from html tag

* allow multiple generations at the same time

* lint and format combined

* fix unused var

* improve MarkdownDisplay

* fix more latex

* fix code blocks not being selectable while generating
2025-02-06 17:32:29 +01:00
Tei Home
9ab42dc722
docs: update fedora cuda guide for 12.8 release (#11393)
* docs: update fedora cuda guide for 12.8 release

* docs: build cuda update
2025-02-06 12:16:15 +00:00