Eric Curtin
05f63cc9ee
Update documentation ( #11373 )
...
Show that -n, -ngl, and --ngl are all accepted.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 20:04:31 +00:00
Eric Curtin
f7fb43cd0b
Add -ngl ( #11372 )
...
Most other llama.cpp cli tools accept -ngl with a single dash.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 16:16:18 +00:00
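The two entries above concern accepting several spellings of one option. A minimal sketch of that idea follows, assuming the option takes an integer value (in llama.cpp tools -ngl conventionally sets the number of GPU layers). The helper names is_ngl_flag and parse_ngl are hypothetical and are not taken from the commits.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Hypothetical helper: true if `arg` is one of the spellings named in the
// commit messages (-n, -ngl, --ngl).
bool is_ngl_flag(const std::string & arg) {
    static const std::vector<std::string> spellings = {"-n", "-ngl", "--ngl"};
    for (const auto & s : spellings) {
        if (arg == s) {
            return true;
        }
    }
    return false;
}

// Hypothetical parser: scans the arguments and stores the integer that
// follows any accepted spelling. Returns false if the flag is present but
// has no valid integer after it. If the flag is absent, `ngl` is left
// unchanged and the function returns true.
bool parse_ngl(const std::vector<std::string> & args, int & ngl) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (!is_ngl_flag(args[i])) {
            continue;
        }
        if (i + 1 >= args.size()) {
            return false; // flag given without a value
        }
        try {
            ngl = std::stoi(args[i + 1]);
        } catch (const std::exception &) {
            return false; // value is not an integer
        }
        ++i; // skip the value we just consumed
    }
    return true;
}
```

Keeping the list of spellings in one place means the help text and the parser can be checked against the same table.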
Xuan Son Nguyen
5845661640
server : add more clean up when cancel_tasks is called ( #11340 )
...
* server : add more clean up when cancel_tasks is called
* fix recv_with_timeout
* std::remove_if
* fix std::remove_if
2025-01-23 13:56:05 +01:00
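The entry above mentions std::remove_if and a follow-up fix for it. A common pitfall is calling std::remove_if without the matching erase, which leaves stale elements at the tail of the container. The sketch below shows the erase-remove idiom on a task queue. The task struct and drop_cancelled function are hypothetical and do not reproduce the server code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical task record; the real server task type is not shown here.
struct task {
    int id;
};

// Removes every queued task whose id is in `cancelled` and returns how many
// were removed. std::remove_if only moves the kept elements to the front and
// returns the new logical end; erase() is what actually shrinks the vector.
std::size_t drop_cancelled(std::vector<task> & queue,
                           const std::unordered_set<int> & cancelled) {
    const std::size_t old_size = queue.size();
    queue.erase(
        std::remove_if(queue.begin(), queue.end(),
                       [&cancelled](const task & t) {
                           return cancelled.count(t.id) > 0;
                       }),
        queue.end());
    return old_size - queue.size();
}
```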
Eric Curtin
f211d1dc10
Treat hf.co/ prefix the same as hf:// ( #11350 )
...
ollama uses hf.co/ as its Hugging Face prefix, just as RamaLama uses
hf://. Treat the two prefixes the same way.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 10:38:20 +00:00
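Treating the two prefixes alike can be sketched as a single normalization step applied before the model reference is resolved. The function name strip_hf_prefix is hypothetical, and the sketch assumes the remainder of the string is an "org/model"-style reference.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: if `model` starts with "hf://" or "hf.co/", removes
// that prefix in place and returns true. Otherwise leaves `model` untouched
// and returns false.
bool strip_hf_prefix(std::string & model) {
    static const char * const prefixes[] = {"hf://", "hf.co/"};
    for (const char * p : prefixes) {
        const std::string prefix(p);
        // compare() clamps the length, so short inputs are handled safely.
        if (model.compare(0, prefix.size(), prefix) == 0) {
            model.erase(0, prefix.size());
            return true;
        }
    }
    return false;
}
```

After this step both spellings reach the download path as the same string, so no later code needs to know which prefix the user typed.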
amd-dwang
955a6c2d91
Vulkan-run-test: fix mmq_wg_denoms ( #11343 )
...
This appears to be a copy-and-paste error.
*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms.
2025-01-23 08:14:28 +01:00
Jeff Bolz
1971adf55e
vulkan: sort shaders for more deterministic binary ( #11315 )
...
Fixes #11306 .
2025-01-23 08:07:50 +01:00
Jeff Bolz
5245729e33
vulkan: fix diag_mask_inf ( #11323 )
...
With robustbufferaccess disabled, this shader was showing OOB stores. There
is a bounds check in the code, but the workgroup dimensions were reversed vs
CUDA and it was running the wrong number of threads. So fix the workgroup
dimensions and disable robustness for this pipeline.
2025-01-23 08:01:17 +01:00
Olivier Chafik
46415d7a51
Fix lazy trigger handling
2025-01-22 19:08:19 +00:00
Olivier Chafik
c2d836f9d0
Update real tool call tests (use fewer models)
2025-01-22 18:47:32 +00:00
Olivier Chafik
a46de6a03a
Add grammar options + rename builder to common_grammar_builder
2025-01-22 18:36:04 +00:00
Olivier Chafik
cdfa8b9d4f
Update chat-template.hpp
2025-01-22 18:35:24 +00:00
Olivier Chafik
5e358ade59
fix msg init warning
2025-01-22 18:35:20 +00:00
Diego Devesa
6152129d05
main : update README documentation for batch size ( #11353 )
...
* main : update README documentation for batch size
* fix formatting
* minor
2025-01-22 19:22:20 +01:00
Georgi Gerganov
16d3df7ab0
readme : add plugin links ( #11355 )
2025-01-22 19:44:26 +02:00
Diego Devesa
12c2bdf2de
server : fix draft context not being released ( #11354 )
2025-01-22 17:44:40 +01:00
Olivier Chafik
f0231a586e
fix common_chat_msg invocations
2025-01-22 16:25:51 +00:00
Olivier Chafik
d186721e41
Merge remote-tracking branch 'origin/master' into tool-call
2025-01-22 16:22:16 +00:00
Olivier Chafik
c64d2becb1
minja: sync at 0f5f7f2b37 ( #11352 )
2025-01-22 16:16:27 +00:00
Olivier Chafik
9ccc62b3c9
Sync minja after https://github.com/google/minja/pull/29
2025-01-22 14:32:18 +00:00
Jiří Podivín
96f4053934
Adding logprobs to /v1/completions ( #11344 )
...
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2025-01-22 12:51:32 +01:00
Olivier Chafik
30d33d9f68
Update test_chat_completion.py
2025-01-22 11:42:36 +00:00
Olivier Chafik
c6a22edc57
Greedy sampling in tool call tests
2025-01-22 11:41:43 +00:00
Olivier Chafik
cce1166b37
Update tool-call.cpp
2025-01-22 11:25:26 +00:00
Olivier Chafik
a4226365bf
nits
2025-01-22 11:23:37 +00:00
Olivier Chafik
63387c6dca
smaller diff
2025-01-22 11:14:25 +00:00
Olivier Chafik
82b6e9a5c3
merge common_tool_calls into common_chat_msg
2025-01-22 11:05:05 +00:00
Olivier Chafik
01b345be0f
Merge remote-tracking branch 'origin/master' into tool-call
2025-01-22 10:02:23 +00:00
Olivier Chafik
a94f3b2727
common: utils to split / join / repeat strings (from json converter) ( #11342 )
...
* Factor string_join, string_split, string_repeat into common
* json: refactor to surface a versatile builder
* Update common.cpp
2025-01-22 09:51:44 +00:00
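The entry above factors three string helpers into common. A rough sketch of what helpers with those names typically do is given below. The signatures and edge-case behavior (for example, how an empty delimiter is handled) are assumptions and may differ from the actual code, so the functions are placed in a hypothetical sketch namespace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Joins `parts` with `sep` between consecutive elements.
std::string string_join(const std::vector<std::string> & parts,
                        const std::string & sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            out += sep;
        }
        out += parts[i];
    }
    return out;
}

// Splits `s` on every occurrence of `delim`, keeping empty pieces.
// Assumption: an empty delimiter returns the input as a single piece.
std::vector<std::string> string_split(const std::string & s,
                                      const std::string & delim) {
    std::vector<std::string> out;
    if (delim.empty()) {
        out.push_back(s);
        return out;
    }
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = s.find(delim, start)) != std::string::npos) {
        out.push_back(s.substr(start, pos - start));
        start = pos + delim.size();
    }
    out.push_back(s.substr(start)); // trailing piece, possibly empty
    return out;
}

// Returns `s` concatenated with itself `n` times.
std::string string_repeat(const std::string & s, std::size_t n) {
    std::string out;
    out.reserve(s.size() * n);
    for (std::size_t i = 0; i < n; ++i) {
        out += s;
    }
    return out;
}

} // namespace sketch
```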
tc-mb
3e3357fd77
llava : support Minicpm-omni ( #11289 )
...
* init
* add readme
* update readme
* no use make
* update readme
* update fix code
* fix editorconfig-checker
* no change convert py
* use clip_image_u8_free
2025-01-22 09:35:48 +02:00
Olivier Chafik
2dd09c792f
more cleanups
2025-01-22 03:20:47 +00:00
Olivier Chafik
28cac497a6
drop llama_sampler_accept_str
2025-01-22 02:38:04 +00:00
Olivier Chafik
e211629b89
Merge branch 'string_utils' into tool-call
2025-01-22 02:27:10 +00:00
Olivier Chafik
5140d7a00b
Update common.cpp
2025-01-22 02:25:09 +00:00
Olivier Chafik
41a613bbd3
Merge branch 'string_utils' into tool-call
2025-01-22 02:22:20 +00:00
Olivier Chafik
03fe80f1bb
drop unused fs_list_files
2025-01-22 02:22:03 +00:00
Olivier Chafik
4de5cf8a10
json: refactor to surface a versatile builder
2025-01-22 02:19:23 +00:00
Olivier Chafik
9a5acbb4a3
Factor string_join, string_split, string_repeat into common
2025-01-22 02:17:34 +00:00
Olivier Chafik
9e8b43f993
follow enum naming style for tool call styles
2025-01-22 02:13:02 +00:00
Olivier Chafik
5268ec8947
Refactor string helpers into common
2025-01-22 02:08:18 +00:00
Olivier Chafik
d77fecc3dc
shrink diff in json conversion code
2025-01-22 01:54:17 +00:00
Olivier Chafik
3972945798
common_tool_call rename
2025-01-22 01:54:08 +00:00
Olivier Chafik
ef61a4c79e
minimize diffs
2025-01-22 01:46:51 +00:00
Olivier Chafik
dbf841b0d2
Push laziness down to grammar impl
2025-01-22 01:25:54 +00:00
Olivier Chafik
77f4098c83
Delete update_jinja_goldens.py
2025-01-21 14:41:59 +00:00
Olivier Chafik
f6e73dac43
Remove examples/agent (moved to https://gist.github.com/ochafik/9246d289b7d38d49e1ee2755698d6c79 )
2025-01-21 14:41:56 +00:00
Olivier Chafik
b49d0521e9
rm tests/test-minja from makefile
2025-01-21 14:12:38 +00:00
Olivier Chafik
fec0260366
Merge remote-tracking branch 'origin/master' into tool-call
2025-01-21 13:44:58 +00:00
Olivier Chafik
6171c9d258
Add Jinja template support ( #11016 )
...
* Copy minja from 58f0ca6dd7
* Add --jinja and --chat-template-file flags
* Add missing <optional> include
* Avoid print in get_hf_chat_template.py
* No designated initializers yet
* Try and work around msvc++ non-macro max resolution quirk
* Update test_chat_completion.py
* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
* Refactor test-chat-template
* Test templates w/ minja
* Fix deprecation
* Add --jinja to llama-run
* Update common_chat_format_example to use minja template wrapper
* Test chat_template in e2e test
* Update utils.py
* Update test_chat_completion.py
* Update run.cpp
* Update arg.cpp
* Refactor common_chat_* functions to accept minja template + use_jinja option
* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
* Revert LLAMA_CHATML_TEMPLATE refactor
* Normalize newlines in test-chat-templates for windows tests
* Forward decl minja::chat_template to avoid eager json dep
* Flush stdout in chat template before potential crash
* Fix copy elision warning
* Rm unused optional include
* Add missing optional include to server.cpp
* Disable jinja test that has a cryptic windows failure
* minja: fix vigogne (https://github.com/google/minja/pull/22 )
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626
* Update minja to https://github.com/google/minja/pull/25
* Update minja from https://github.com/google/minja/pull/27
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
Xuan Son Nguyen
e28245f35f
export-lora : fix tok_embd tensor ( #11330 )
2025-01-21 14:07:12 +01:00
Radoslav Gerganov
6da5bec81c
rpc : better caching of the base buffer pointer ( #11331 )
...
There is no need to use map, just store the base pointer in the buffer
context.
2025-01-21 15:06:41 +02:00
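One way to read the entry above: rather than looking the base pointer up in a map keyed by buffer, each buffer context carries its own cached copy. The sketch below illustrates that pattern only. The remote_buffer_context struct, its fields, and get_base are hypothetical, and fetch_base stands in for whatever round trip the real backend performs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical buffer context that caches its own base pointer.
struct remote_buffer_context {
    std::function<std::uintptr_t()> fetch_base; // stand-in for a remote query
    std::uintptr_t base = 0;                    // cached value
    bool has_base = false;                      // true once `base` is valid
};

// Returns the base pointer, querying the remote side only on first use.
std::uintptr_t get_base(remote_buffer_context & ctx) {
    if (!ctx.has_base) {
        ctx.base = ctx.fetch_base();
        ctx.has_base = true;
    }
    return ctx.base;
}
```

Storing the value in the context ties its lifetime to the buffer, so there is no separate map entry to insert, look up, or remember to erase.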