Commit graph

4703 commits

Author SHA1 Message Date
Alex Brooks
72ce3e5833
Merge 262000fa4d into d7b31a9d84 2025-02-10 08:49:44 -07:00
Alex-Brooks
262000fa4d Add v prefix to vision feature layer log
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
06703820dc Fix notes rendering
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
17bf6ad304 Update notes for alternative to legacy llm conversion script
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
78f765e8a5 Update comment for vision feature layer init
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
2327897175 Cleanup logs
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
188a068a04 Standardize vision feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
3a191f8edb Use 10 for max number of patches
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
d85580c41c Avoid dropping last image encoder layer in llava models
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
65935431b4 fix num gridpoints and use all layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
ab71c9e9c4 Pull vision feature layers out of gguf keys
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
ae291e5405 Fix hardcoded concat for multiple feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
e1ec851121 Increase max flattened gridpoints to 64
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
987f76840a Fix linear 2 substitution index
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
7905f9dd40 Fix projector linear substitution
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
61d4ae4699 Make siglip / openclip mutually exclusive
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
50504063b2 Add transformers llava next tensor name mapping
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
cc1c135367 Clean up llava surgery and remove name substitution hacks
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
92046a103d Add vision feature layer to gguf params
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
bc66d1931b remove hardcoded path
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
fd0111c043 Add example for converting mmgranite to gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Alex-Brooks
6ccf234031 Add super wip scripts for multimodal granite gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-10 07:53:37 -07:00
Olivier Chafik
d7b31a9d84
sync: minja (a72057e519) (#11774) 2025-02-10 09:34:09 +00:00
pascal-lc
9ac3457b39
Update README.md [no ci] (#11781)
typo: `\` -> `/`
Change `\` to the UNIX path separator `/`.
2025-02-10 09:05:57 +01:00
Danny Milosavljevic
c2a67efe38
vulkan: Make Vulkan optional at runtime (#11493). (#11494)
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-02-10 07:17:21 +01:00
Wagner Bruna
b044a0fe3c
vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (#11592) 2025-02-10 07:08:22 +01:00
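Since the new behavior is opt-in via an environment variable, the backend can branch on it at startup. A minimal sketch of that pattern, with hypothetical allocation helpers standing in for the real Vulkan paths (this is not the actual ggml-vulkan code):

```cpp
#include <cstddef>   // size_t
#include <cstdlib>   // std::getenv

// Hypothetical stand-ins for the two Vulkan allocation paths.
void * alloc_device_local(size_t size);  // VRAM
void * alloc_host_visible(size_t size);  // system RAM, GPU-visible

// Sketch of the opt-in: treat any non-empty, non-"0" value as enabled.
static bool prefer_host_memory() {
    const char * s = std::getenv("GGML_VK_PREFER_HOST_MEMORY");
    return s != nullptr && s[0] != '\0' && s[0] != '0';
}

void * alloc_buffer(size_t size) {
    return prefer_host_memory() ? alloc_host_visible(size)
                                : alloc_device_local(size);
}
```

A run would then opt in by exporting the variable, e.g. setting GGML_VK_PREFER_HOST_MEMORY=1 before launching the binary.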
Eric Curtin
19d3c8293b
There's a better way of clearing lines (#11756)
Use the ANSI escape code for clearing a line.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-02-09 10:34:49 +00:00
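The escape sequence in question erases the whole current line and returns the cursor to column 0, rather than overwriting old output with spaces. A small self-contained C++ illustration of the technique (not the actual patch):

```cpp
#include <iostream>

// Clear the current terminal line with ANSI escape codes:
// ESC[2K erases the entire line, \r moves the cursor back to column 0.
void clear_line() {
    std::cout << "\033[2K\r" << std::flush;
}

int main() {
    std::cout << "downloading... 42%" << std::flush;
    clear_line();                                   // wipe the old status
    std::cout << "downloading... 43%" << std::endl; // redraw in place
}
```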
Jeff Bolz
98f6b0fd1e
vulkan: account for lookup tables when checking shared memory size (#11502) 2025-02-09 08:43:51 +01:00
Xuan-Son Nguyen
55ac8c7791
server : (webui) revamp Settings dialog, add Pyodide interpreter (#11759)
* redo Settings modal UI

* add python code interpreter

* fix auto scroll

* build

* fix overflow for long output lines

* bring back sticky copy button

* adapt layout on mobile view

* fix multiple lines output and color scheme

* handle python exception

* better state management

* add webworker

* add headers

* format code

* speed up by loading pyodide on page load

* (small tweak) add small animation to make it feel like Claude
2025-02-08 21:54:50 +01:00
Woof Dog
e6e6583199
server : (webui) increase edit textarea size (#11763) 2025-02-08 20:09:55 +01:00
Georgi Gerganov
aaa5505307
server : minor log updates (#11760)
ggml-ci
2025-02-08 18:08:43 +02:00
Georgi Gerganov
bdcf8b6a56
cont : fix mmap flag print (#11699) 2025-02-08 16:49:38 +02:00
Karol Kontny
4d3465c5ae
ggml: Fix data race in ggml threadpool (#11736)
After the barrier in the last iteration is executed, the loop termination
condition is still evaluated. By that point the main thread may already have
destroyed the cgraph object and its nodes, so another thread can access memory
that is already gone. Trouble can also arise when n_nodes == 0 or abort is
called, though I'm not sure whether the former situation is possible.

The last synchronization should therefore be done after the loop, to ensure the
cgraph/cplan won't be accessed after the main thread exits from the function.
2025-02-08 15:30:53 +01:00
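The fix describes a general pattern: a worker must never touch shared state after the owner is allowed to free it, so the final synchronization point belongs after the loop, not inside it. A self-contained C++20 sketch of that pattern (illustrative only, not ggml's threadpool code):

```cpp
#include <barrier>
#include <thread>
#include <vector>

// Every worker passes one final barrier *after* its loop, so by the time
// the owner frees the shared graph, no worker can still be reading it.
int main() {
    constexpr int n_threads = 4;
    constexpr int n_iters   = 8;

    std::vector<int> graph(1024, 1);   // stands in for the cgraph/cplan
    std::barrier sync(n_threads);

    {
        std::vector<std::jthread> workers;
        for (int t = 0; t < n_threads; ++t) {
            workers.emplace_back([&, t] {
                for (int i = 0; i < n_iters; ++i) {
                    for (size_t j = t; j < graph.size(); j += n_threads) {
                        graph[j] += 1;          // work on the shared graph
                    }
                    sync.arrive_and_wait();     // end-of-iteration barrier
                }
                // Final sync after the loop: past this point this worker
                // never touches `graph` again, so evaluating the loop
                // condition cannot race with the owner destroying it.
                sync.arrive_and_wait();
            });
        }
    }   // jthreads join here

    graph.clear();   // safe: all workers are past their final barrier
}
```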
Johannes Gäßler
d80be897ac
CUDA: fix min. version for movmatrix (#11751) 2025-02-08 10:46:07 +01:00
Nikolaos Pothitos
3ab410f55f
readme : update front-end framework (#11753)
After the migration to React with #11688
2025-02-08 10:43:04 +01:00
Xuan-Son Nguyen
0cf867160c
server : (webui) fix numeric settings being saved as string (#11739)
* server : (webui) fix numeric settings being saved as string

* add some more comments
2025-02-08 10:42:34 +01:00
Eric Curtin
d2fe216fb2
Make logging more verbose (#11714)
Debugged an issue with a user who was on a read-only filesystem.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-02-07 14:42:46 +00:00
Georgi Gerganov
ed926d8833
llama : fix defrag logic (#11707)
* llama : fix defrag logic

ggml-ci

* cont : better logic

ggml-ci

* cont : clamp fragmentation to 0.0

ggml-ci
2025-02-07 16:05:34 +02:00
Christian Fillion
2d219b389e
vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729)
Silently insert U+FFFD (the Unicode replacement character) instead, until the
next valid codepoint can be found.

This fixes `llama_tokenize` throwing an exception across the C API boundary
or libllama's module boundary (the caller's runtime might be incompatible!).

Returning a proper error code might be desirable; however, the signature
of `llama_tokenize` doesn't allow it, as all return values already have
an existing meaning.
2025-02-07 15:55:47 +02:00
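The replacement strategy is standard: scan the input, and wherever a byte sequence is not structurally valid UTF-8, emit U+FFFD (encoded as EF BF BD) and resync at the next byte. A simplified C++ sketch of the idea, not the tokenizer's actual code (it checks sequence structure only and, unlike a full validator, does not reject overlong encodings or surrogates):

```cpp
#include <string>

// Number of bytes a UTF-8 sequence should have, judged by its lead byte;
// 0 means the byte is not a valid lead byte.
static int utf8_len(unsigned char c) {
    if (c < 0x80)          return 1;   // ASCII
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;                          // continuation or invalid lead byte
}

// Replace each invalid sequence with U+FFFD (EF BF BD) instead of throwing.
std::string sanitize_utf8(const std::string & in) {
    static const std::string replacement = "\xEF\xBF\xBD";  // U+FFFD
    std::string out;
    size_t i = 0;
    while (i < in.size()) {
        const int len = utf8_len(static_cast<unsigned char>(in[i]));
        bool ok = len > 0 && i + len <= in.size();
        for (int k = 1; ok && k < len; ++k) {
            // every trailing byte must be a continuation byte 10xxxxxx
            ok = (static_cast<unsigned char>(in[i + k]) & 0xC0) == 0x80;
        }
        if (ok) {
            out.append(in, i, len);
            i += len;
        } else {
            out += replacement;
            i += 1;  // resync one byte at a time
        }
    }
    return out;
}
```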
magicse
333820d749
llama : fix progress dots (#11730)
* Update llama.cpp

Display progress dots in the terminal.
Without this, the progress dots were not shown while loading a model from file.

* Update llama.cpp

removed trailing spaces
2025-02-07 15:48:47 +02:00
Jeff Bolz
c026ba3c23
vulkan: print shared memory size (#11719) 2025-02-07 11:26:03 +01:00
Christian Fillion
7ee953a64a
llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727)
The C API in llama.h claims users can implement `llama_sampler_i` to
create custom `llama_sampler`. The sampler chain takes ownership and
calls `llama_sampler_free` on them. However, `llama_sampler_free` is
hard-coded to use `delete`. This is undefined behavior if the object
wasn't also allocated via `new` from libllama's C++ runtime. Callers
in C and C-compatible languages do not use C++'s `new` operator. C++
callers may not be sharing the same heap as libllama.
2025-02-07 11:33:27 +02:00
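The underlying rule is that memory must be freed by the same runtime that allocated it. The sketch below shows that general pattern with hypothetical names (`thing_init` / `thing_free`, not the llama.h declarations): the library exports a constructor so that `new` and `delete` both happen inside the same module.

```cpp
// General pattern, hypothetical names: the library owns both allocation and
// deallocation, so a C caller (or a C++ caller with a different runtime and
// heap) never pairs its own malloc/new with the library's delete.

struct thing {
    int state;
};

// Exported constructor: allocation happens inside the library...
extern "C" thing * thing_init(int state) {
    return new thing{state};
}

// ...and deallocation uses the matching delete from the same runtime.
extern "C" void thing_free(thing * t) {
    delete t;
}

int main() {
    thing * t = thing_init(42);  // caller never calls new/malloc itself
    thing_free(t);               // matches the library-side new
}
```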
Akarshan Biswas
ec3bc8270b
SYCL: remove XMX info from print devices (#11712) 2025-02-07 09:27:53 +00:00
Daniel Bevenius
b7552cfcbc
common : add default embeddings presets (#11677)
* common : add default embeddings presets

This commit adds default embeddings presets for the following models:
- bge-small-en-v1.5
- e5-small-v2
- gte-small

These can be used with llama-embedding and llama-server.

For example, with llama-embedding:
```console
./build/bin/llama-embedding --embd-gte-small-default -p "Hello, how are you?"
```

And with llama-server:
```console
./build/bin/llama-server --embd-gte-small-default
```
And the embeddings endpoint can then be called with a POST request:
```console
curl --request POST \
    --url http://localhost:8080/embeddings \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello, how are you?"}'
```

I'm not sure if these are the most common embedding models but hopefully
this can be a good starting point for discussion and further
improvements.

Refs: https://github.com/ggerganov/llama.cpp/issues/10932
2025-02-07 09:15:22 +01:00
Jinyang He
225bbbfa39
ggml : optimize and build warning fix for LoongArch (#11709)
* ggml : optimize convert f32<->f16 for loongarch_asx

* ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16

* ggml : Fix warnings when running the CPU CI locally on LoongArch
2025-02-07 09:38:31 +02:00
tv1wnd
855cd0734a
llama : fix old glm4 models (#11670) 2025-02-06 22:48:51 +01:00
Georgi Gerganov
8a59053f63
sync : ggml 2025-02-06 21:23:03 +02:00
Patrick Peng
1d20e53c40
rpc: fix known RCE in rpc-server (ggml/1103)
Add bounds checking in `rpc_server::copy_tensor` to prevent out-of-bounds writes:
check that `(uint8_t *)dst->data + ggml_nbytes(src)` remains within the destination buffer's allocated region.
2025-02-06 21:22:54 +02:00
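The check amounts to verifying, in overflow-safe form, that offset + size fits inside the destination allocation before any copy takes place. A hypothetical helper illustrating the idea (not the rpc-server code itself):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not the rpc-server code: reject the copy when the
// requested write would run past the end of the destination buffer.
bool copy_checked(uint8_t * dst_base, size_t dst_size, size_t dst_offset,
                  const uint8_t * src, size_t n) {
    // Written so the arithmetic cannot wrap around: this is equivalent to
    // dst_offset + n <= dst_size, but without integer overflow.
    if (dst_offset > dst_size || n > dst_size - dst_offset) {
        return false;  // out-of-bounds write requested by the client
    }
    std::memcpy(dst_base + dst_offset, src, n);
    return true;
}
```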
Xuan-Son Nguyen
2fb3c32a16
server : (webui) migrate project to ReactJS with typescript (#11688)
* init version

* fix auto scroll

* bring back copy btn

* bring back thought process

* add lint and format check on CI

* remove lang from html tag

* allow multiple generations at the same time

* lint and format combined

* fix unused var

* improve MarkdownDisplay

* fix more latex

* fix code blocks not being selectable while generating
2025-02-06 17:32:29 +01:00
Tei Home
9ab42dc722
docs: update fedora cuda guide for 12.8 release (#11393)
* docs: update fedora cuda guide for 12.8 release

* docs: build cuda update
2025-02-06 12:16:15 +00:00