Commit graph

1299 commits

Author SHA1 Message Date
Xuan Son Nguyen
0959cc18ee Merge branch 'master' into xsn/vision_2 2025-01-25 12:16:34 +01:00
Eric Curtin
01f37edf1a
Update llama-run README.md (#11386)
For consistency

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-24 09:39:24 +00:00
stduhpf
c07e87f38b
server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364)
* webui : put DeepSeek R1 CoT in a collapsible <details> element

* webui: refactor split

* webui: don't use regex to split cot and response

* webui: format+qol

* webui: no loading icon if the model isn't generating

* ui fix, add configs

* add jsdoc types

* only filter </think> for assistant msg

* build

* update build

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-24 09:02:38 +01:00
Xuan Son Nguyen
b72d7557f3 Merge branch 'master' into xsn/vision_2 2025-01-23 23:07:20 +01:00
Eric Curtin
05f63cc9ee
Update documentation (#11373)
To show -n, -ngl, --ngl is acceptable.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 20:04:31 +00:00
Eric Curtin
f7fb43cd0b
Add -ngl (#11372)
Most other llama.cpp cli tools accept -ngl with a single dash.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 16:16:18 +00:00
Xuan Son Nguyen
5845661640
server : add more clean up when cancel_tasks is called (#11340)
* server : add more clean up when cancel_tasks is called

* fix recv_with_timeout

* std::remove_if

* fix std::remove_if
2025-01-23 13:56:05 +01:00
Xuan Son Nguyen
25a97ce4cb correct positions for siglip 2025-01-23 13:34:13 +01:00
Xuan Son Nguyen
8586d23c8a minicpm working without uhd 2025-01-23 12:14:06 +01:00
Eric Curtin
f211d1dc10
Treat hf.co/ prefix the same as hf:// (#11350)
ollama uses hf.co/ to specify huggingface prefix, like RamaLama
uses hf://

Treat them similarly.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 10:38:20 +00:00
Xuan Son Nguyen
c0d93dd509 minicpmv works but missing uhd slices 2025-01-22 22:42:00 +01:00
Diego Devesa
6152129d05
main : update README documentation for batch size (#11353)
* main : update README documentation for batch size

* fix formatting

* minor
2025-01-22 19:22:20 +01:00
Diego Devesa
12c2bdf2de
server : fix draft context not being released (#11354) 2025-01-22 17:44:40 +01:00
Xuan Son Nguyen
9716c7bff7 temporary refactor llama_vision_graph_builder 2025-01-22 14:40:35 +01:00
Xuan Son Nguyen
32daa38333 Merge branch 'master' into xsn/vision_2 2025-01-22 13:28:31 +01:00
Jiří Podivín
96f4053934
Adding logprobs to /v1/completions (#11344)
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2025-01-22 12:51:32 +01:00
tc-mb
3e3357fd77
llava : support Minicpm-omni (#11289)
* init

* add readme

* update readme

* no use make

* update readme

* update fix code

* fix editorconfig-checker

* no change convert py

* use clip_image_u8_free
2025-01-22 09:35:48 +02:00
Xuan Son Nguyen
ad38e87329 rename everywhere 2025-01-21 15:53:39 +01:00
Olivier Chafik
6171c9d258
Add Jinja template support (#11016)
* Copy minja from 58f0ca6dd7

* Add --jinja and --chat-template-file flags

* Add missing <optional> include

* Avoid print in get_hf_chat_template.py

* No designated initializers yet

* Try and work around msvc++ non-macro max resolution quirk

* Update test_chat_completion.py

* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

* Refactor test-chat-template

* Test templates w/ minja

* Fix deprecation

* Add --jinja to llama-run

* Update common_chat_format_example to use minja template wrapper

* Test chat_template in e2e test

* Update utils.py

* Update test_chat_completion.py

* Update run.cpp

* Update arg.cpp

* Refactor common_chat_* functions to accept minja template + use_jinja option

* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

* Revert LLAMA_CHATML_TEMPLATE refactor

* Normalize newlines in test-chat-templates for windows tests

* Forward decl minja::chat_template to avoid eager json dep

* Flush stdout in chat template before potential crash

* Fix copy elision warning

* Rm unused optional include

* Add missing optional include to server.cpp

* Disable jinja test that has a cryptic windows failure

* minja: fix vigogne (https://github.com/google/minja/pull/22)

* Apply suggestions from code review

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Finish suggested renamings

* Move chat_templates inside server_context + remove mutex

* Update --chat-template-file w/ recent change to --chat-template

* Refactor chat template validation

* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)

* Warn against missing eos / bos tokens when jinja template references them

* rename: common_chat_template[s]

* reinstate assert on chat_templates.template_default

* Update minja to b8437df626

* Update minja to https://github.com/google/minja/pull/25

* Update minja from https://github.com/google/minja/pull/27

* rm unused optional header

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
Xuan Son Nguyen
e28245f35f
export-lora : fix tok_embd tensor (#11330) 2025-01-21 14:07:12 +01:00
Eric Curtin
2e2f8f093c
linenoise.cpp refactoring (#11301)
More RAII mainly

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-21 09:32:35 +00:00
Georgi Gerganov
80d0d6b4b7
common : add -hfd option for the draft model (#11318)
* common : add -hfd option for the draft model

* cont : fix env var

* cont : more fixes
2025-01-20 22:29:43 +02:00
Georgi Gerganov
9f7add1cde
examples : fix add_special conditions (#11311) 2025-01-20 16:36:08 +02:00
Georgi Gerganov
92bc493917
tests : increase timeout when sanitizers are enabled (#11300)
* tests : increase timeout when sanitizers are enabled

* tests : add DEFAULT_HTTP_TIMEOUT
2025-01-19 20:22:30 +02:00
Georgi Gerganov
b9daaffe02
simple-chat : fix BOS being added to each message (#11278) 2025-01-19 18:12:09 +02:00
Xuan Son Nguyen
6cabdda0df add back convert hf to gguf 2025-01-18 22:56:04 +01:00
Xuan Son Nguyen
0a81051ae2 llama : second attempt to refactor vision API 2025-01-18 20:56:35 +01:00
Eric Curtin
a1649cc13f
Adding linenoise.cpp to llama-run (#11252)
This is a fork of linenoise that is C++17 compatible. I intend on
adding it to llama-run so we can do things like traverse prompt
history via the up and down arrows:

https://github.com/ericcurtin/linenoise.cpp

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-18 14:42:31 +00:00
Xuan Son Nguyen
f30f099228
server : implement cancellable request (#11285)
* server : implement cancellable request

* fix typo

* httplib 0.18.5

* fix i underflow
2025-01-18 14:12:05 +01:00
LostRuins Concedo
6390a998bf
tts : add guide tokens support (#11186)
* Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences.

* applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start
2025-01-18 12:20:57 +02:00
codezjx
3edfa7d375
llama.android: add field formatChat to control whether to parse special tokens when send message (#11270) 2025-01-17 14:57:56 +02:00
Radoslav Gerganov
667d72846c
rpc : early register backend devices (#11262)
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.

ref: #10609
2025-01-17 10:57:09 +02:00
Georgi Gerganov
f11cfdfd7f
ci : use -no-cnv in gguf-split tests (#11254)
* ci : use -no-cnv in gguf-split tests

ggml-ci

* ci : use -no-cnv in requantize tests

ggml-ci

* scripts : fix [no ci]
2025-01-15 18:28:35 +02:00
Daniel Bevenius
0ccd7f3eb2
examples : add embd_to_audio to tts-outetts.py [no ci] (#11235)
This commit contains a suggestion for adding the missing embd_to_audio
function from tts.cpp to tts-outetts.py. This introduces a depencency
numpy which I was not sure if that is acceptable or not (only PyTorch
was mentioned in referened PR).

Also the README has been updated with instructions to run the example
with llama-server and the python script.

Refs: https://github.com/ggerganov/llama.cpp/pull/10784#issuecomment-2548377734
2025-01-15 05:44:38 +01:00
ebraminio
c5bf0d1bd7
server : Improve code snippets direction between RTL text (#11221) 2025-01-14 11:39:33 +01:00
ebraminio
504af20ee4
server : (UI) Improve messages bubble shape in RTL (#11220)
I simply have overlooked message bubble's tail placement for RTL
text as I use the dark mode and that isn't visible there and this
fixes it.
2025-01-13 20:23:31 +01:00
Xuan Son Nguyen
84a44815f7
cli : auto activate conversation mode if chat template is available (#11214)
* cli : auto activate conversation mode if chat template is detected

* add warn on bad template

* update readme (writing with the help of chatgpt)

* update readme (2)

* do not activate -cnv for non-instruct models
2025-01-13 20:18:12 +01:00
ebraminio
437e05f714
server : (UI) Support for RTL text as models input or output (#11208) 2025-01-13 14:46:39 +01:00
Eric Curtin
924518e2e5
Reset color before we exit (#11205)
We don't want colors to leak post termination of llama-run.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-12 18:23:10 +00:00
Georgi Gerganov
afa8a9ec9b
llama : add llama_vocab, functions -> methods, naming (#11110)
* llama : functions -> methods (#11110)

* llama : add struct llama_vocab to the API (#11156)

ggml-ci

* hparams : move vocab params to llama_vocab (#11159)

ggml-ci

* vocab : more pimpl (#11165)

ggml-ci

* vocab : minor tokenization optimizations (#11160)

ggml-ci

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* lora : update API names (#11167)

ggml-ci

* llama : update API names to use correct prefix (#11174)

* llama : update API names to use correct prefix

ggml-ci

* cont

ggml-ci

* cont

ggml-ci

* minor [no ci]

* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)

ggml-ci

* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)

ggml-ci

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-12 11:32:42 +02:00
Daniel Bevenius
ba8a1f9c5b
examples : add README.md to tts example [no ci] (#11155)
* examples : add README.md to tts example [no ci]

* squash! examples : add README.md to tts example [no ci]

Fix heading to be consistent with other examples, and add a quickstart
section to README.md.

* squash! examples : add README.md to tts example [no ci]

Fix spelling mistake.
2025-01-10 13:16:16 +01:00
Daniel Bevenius
8eceb888d7
server : add tooltips to settings and themes btn (#11154)
* server : add tooltips to settings and themes btn

This commit adds tooltips to the settings and themes buttons in the
webui. The tooltip will be displayed below the actual buttons when
hovered over.

The motivation for this change is to clarify the purpose of the themes
button.

* squash! server : add tooltips to settings and themes btn

This commit adds a tooltip to the '...' button when a chat has been
started. The tooltip is "Chat options" which think could be a good
description as the dropdown contains options to delete or download the
current chat.

* rm tooltip for 3 dots button

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-09 11:28:29 +01:00
Eric Curtin
1bf839b1e8
Enhance user input handling for llama-run (#11138)
The main motivation for this change is it was not handing
ctrl-c/ctrl-d correctly. Modify `read_user_input` to handle EOF,
"/bye" command, and empty input cases. Introduce `get_user_input`
function to manage user input loop and handle different return
cases.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-08 18:47:05 +00:00
Georgi Gerganov
a3c1232c3f
arg : option to exclude arguments from specific examples (#11136)
* arg : option to exclude arguments from specific examples

ggml-ci

* readme : remove old args [no ci]
2025-01-08 12:55:36 +02:00
Johannes Gäßler
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes (#11030)
* GGUF: C++ refactor, backend support, misc fixes

remove ggml_tensor.backend

update CODEOWNERS [no ci]

remove gguf_get_data from API

revise GGUF API data types
2025-01-07 18:01:58 +01:00
Eric Curtin
dc7cef9f37
llama-run : fix context size (#11094)
Set `n_ctx` equal to `n_batch` in `Opt` class. Now context size is
a more reasonable 2048.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-06 23:45:28 +01:00
Georgi Gerganov
e6e7c75d94
server : fix extra BOS in infill endpoint (#11106)
* server : fix extra BOS in infill endpoing

ggml-ci

* server : update infill tests
2025-01-06 15:36:08 +02:00
Georgi Gerganov
47182dd03f
llama : update llama_model API names (#11063)
* llama : deprecate llama_free_model, add llama_model_free

ggml-ci

* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`

ggml-ci
2025-01-06 10:55:18 +02:00
Georgi Gerganov
3e6e7a6bc2
tokenize : escape the prompt (#11058)
* tokenize : escape the prompt

* tokenize : update help
2025-01-06 10:54:25 +02:00
Georgi Gerganov
727368c60f
llama : use LLAMA_TOKEN_NULL (#11062)
ggml-ci
2025-01-06 10:52:15 +02:00