Commit graph

4158 commits

Author SHA1 Message Date
ochafik
adc673c355 agent: add --think "tool", default to local tools endpoint, support --temperature, fix --seed 2024-12-05 21:32:08 +00:00
ochafik
f9b1969097 Update README.md 2024-11-09 19:00:53 +00:00
ochafik
5789f69d2d minja: don't explode upon referencing a field on an array (fixes Hermes tool use template) 2024-11-09 18:57:09 +00:00
ochafik
c059aecd37 agent: memorize, search_memory (sqlite-vec + sqlite-lembed), fetch + docling (pdf -> markdown), sparql for dbpedia and wikidata 2024-11-09 18:25:34 +00:00
ochafik
bc52c0a4f0 agent: add missing tool name in response! 2024-10-31 15:01:17 +00:00
ochafik
479c1520b1 tool-call: fix qwen template test 2024-10-31 14:49:59 +00:00
ochafik
fe967b61a1 Update README.md 2024-10-31 14:37:55 +00:00
ochafik
f5f74751b9 nits 2024-10-31 14:28:52 +00:00
ochafik
c4a8050120 Update README.md 2024-10-31 14:27:40 +00:00
ochafik
9477c54676 tool-call: functionary-small-v3.2 test now green 2024-10-31 14:11:34 +00:00
ochafik
b35aa4ae1c tool-call: add LLAMA_UPDATE_GOLDENS env for test-chat-template 2024-10-31 13:53:33 +00:00
ochafik
c773516d57 tool-call: don't use -fa w/ Mistral-Nemo (hard crashes?) 2024-10-31 13:53:11 +00:00
ochafik
f5b7825595 tool-call: code_interpreter & system + tool call support for all jinja templates! 2024-10-31 13:52:46 +00:00
ochafik
c395d4804f tool-call: behaviour-based detection of template features 2024-10-31 13:45:10 +00:00
ochafik
e8d9d711f6 Update tool_call.feature 2024-10-31 04:50:38 +00:00
ochafik
7d9c90f46b tool-call: nemo tweak (accept raw sql again) 2024-10-31 04:39:40 +00:00
ochafik
542853b34b tool-call: greedy sampling in server tests + tweak prompt 2024-10-31 04:38:22 +00:00
ochafik
be9de3ed8a Update llama-sampling.cpp 2024-10-31 03:58:15 +00:00
ochafik
61655b9cdd Merge remote-tracking branch 'origin/master' into tool-call 2024-10-31 01:45:07 +00:00
Olivier Chafik
e4d5449638 tool-calls: test Qwen2.5-7B-Instruct-Q4_K_M.gguf 2024-10-30 21:40:15 +00:00
Sergio López
61408e7fad kompute: add backend registry / device interfaces (#10045)
Get in line with the other backends by supporting the newer
backend/device registry interfaces.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-10-30 17:01:52 +01:00
Diego Devesa
b9e02e8184 ggml : fix memory leaks when loading invalid gguf files (#10094)
* ggml : fix gguf string leak when reading kv pairs fails

* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type

* ggml : avoid crashing on failed memory allocations when loading a gguf file
2024-10-30 14:51:21 +01:00
ochafik
5227321dfd tool-call: when slow server tests fail, hint to run python scripts/fetch_server_test_models.py 2024-10-30 12:40:22 +00:00
ochafik
35ac17f3f1 tool-call: fix missing initializer errors 2024-10-30 12:38:34 +00:00
Rich Dougherty
6763f713bb readme : more lora detail in main example readme (#10064) 2024-10-30 13:22:39 +01:00
Rich Dougherty
79a2bc042d convert : more detailed convert lora usage docs (#10065) 2024-10-30 13:22:21 +01:00
ochafik
3ebdb2b805 tool-call: support tool_use variant in llama_chat_template_from_model + drop llama_get_chat_template 2024-10-30 10:07:10 +00:00
xctan
fc83a9e584 ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)
* ggml : RISC-V vector gemv for q4_0_8x8

* ggml : Added WIP rvv q4_0_8x8 gemm

* ggml : Added initial implementation of rvv gemm

* ggml : optimize gemm to avoid register spillover

* ggml : Fix GCC rvv load alignment issue

* ggml : Format gemm rvv code

* ggml : Fix a typo in RVV q4_0_8_8 GEMM
2024-10-30 09:00:40 +02:00
Diego Devesa
c5b0f4b5d9 llama : refactor model loader with backend registry (#10026) 2024-10-30 02:01:23 +01:00
Olivier Chafik
92c384a5e8 nits 2024-10-29 17:24:59 +00:00
Olivier Chafik
773ff91b7a tool-call: force printing of lazy grammar trigger tokens to regularize function call parsing 2024-10-29 15:26:51 +00:00
Olivier Chafik
fa4c1119c9 tool-call: use functionary-small-v3.2-Q8_0.gguf in test (Q4_K_M too dumb for function call) 2024-10-29 15:25:37 +00:00
Olivier Chafik
64287a328d tool-call: test Hermes-3-Llama-3.1-8B 2024-10-29 14:52:25 +00:00
Changyeon Kim
8f275a7c45 ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763)
* ggml: Add POOL2D OP for GPU ACC to the Vulkan.

- The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
- A GGML_OP_POOL_2D shader has been added. (Pooling)
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* [fix] Correct the incorrect order of the parameters.

fix casting to int.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

---------

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
2024-10-29 09:52:56 +01:00
Georgi Gerganov
8d8ff71536 llama : remove Tail-Free sampling (#10071)
ggml-ci
2024-10-29 10:42:05 +02:00
ochafik
aefac1e5cb tool-call: update scripts/fetch_server_test_models.py 2024-10-28 23:57:23 +00:00
ochafik
b825440c81 tool-call: use Q4_K_M models 2024-10-28 23:56:40 +00:00
ochafik
74d71a673e agent: simplify syntax (default tools to local w/ default port) 2024-10-28 23:54:01 +00:00
ochafik
b51c71c734 tool-call: remove duplicate script to fetch templates 2024-10-28 21:35:18 +00:00
arch-btw
61715d5cc8 llama : Add IBM granite template (#10013)
* Add granite template to llama.cpp

* Add granite template to test-chat-template.cpp

* Update src/llama.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Update tests/test-chat-template.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Added proper template and expected output

* Small change to \n

* Add code space &

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Fix spacing

* Apply suggestions from code review

* Update src/llama.cpp

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-10-28 18:45:33 +01:00
Georgi Gerganov
07028f9d74 flake.lock: Update (#10063)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/4c2fcb090b1f3e5b47eaa7bd33913b574a11e0a0?narHash=sha256-/uilDXvCIEs3C9l73JTACm4quuHUsIHcns1c+cHUJwA=' (2024-10-18)
  → 'github:NixOS/nixpkgs/2768c7d042a37de65bb1b5b3268fc987e534c49d?narHash=sha256-AlcmCXJZPIlO5dmFzV3V2XF6x/OpNWUV8Y/FMPGd8Z4=' (2024-10-23)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-10-28 08:41:24 -07:00
ochafik
ec547e4137 tool-call: add tests: tool_call=none, parallel_tool_calls=true 2024-10-28 10:04:00 +00:00
R0CKSTAR
524afeec9d musa: workaround for Guilty Lockup in cleaning src0 (#10042)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-28 10:02:48 +01:00
Georgi Gerganov
8125e6cbfc server : don't overfill the batch during infill (#10018)
ggml-ci
2024-10-28 08:49:32 +02:00
ochafik
168add7ec8 Update tool_call.feature 2024-10-28 02:06:00 +00:00
ochafik
dd6d0241a7 tool-call: script to prefetch models used in server tests 2024-10-28 02:01:00 +00:00
ochafik
7fde6d0091 tool_call: test no tool call on a real model + rename scenarios 2024-10-28 02:00:09 +00:00
ochafik
c88095e3fc space nits 2024-10-28 00:27:04 +00:00
ochafik
9a86ea79a2 tool-call: slow tool call integration tests 2024-10-28 00:26:40 +00:00
Georgi Gerganov
8841ce3f43 llama : switch KQ multiplication to F32 precision by default (#10015)
ggml-ci
2024-10-27 20:59:58 +02:00