Commit graph

  • 657b4cdb22 Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) Olivier Chafik 2025-01-31 00:44:31 +00:00
  • 8971dd8645 Fix --jinja when there's no tools or schema (typo was forcing JSON) Olivier Chafik 2025-01-30 23:15:36 +00:00
  • 37b572e196 fixed hardcoded qk=128 bug zhycheng614 2025-01-30 22:55:50 +00:00
  • 23649e5416 WIP zhycheng614 2025-01-30 22:03:13 +00:00
  • 553f1e46e9 ci: ccache for all github workflows (#11516) b4600 Olivier Chafik 2025-01-30 22:01:06 +00:00
  • 5615e2dc03 common: Add missing va_end Steve Grubb 2025-01-30 16:59:19 -05:00
  • 4aba26547a vulkan: initial support for IQ1_S and IQ1_M quantizations Rémy O 2025-01-30 04:28:49 +01:00
  • 8b576b6c55 Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) b4599 Olivier Chafik 2025-01-30 19:13:58 +00:00
  • d59d939f9d improve doc string for LLAMA_LLGUIDANCE Michal Moskal 2025-01-30 11:06:36 -08:00
  • 59da9696fd simplify #includes Michal Moskal 2025-01-30 11:01:57 -08:00
  • d0491ce8f6 on containers, install ccache after apt-get (+ dedupe existing ccache steps) Olivier Chafik 2025-01-30 18:51:47 +00:00
  • 7634ce2918 build: trigger CI on GLSL compute shader changes Jeff Bolz 2025-01-30 11:54:30 -06:00
  • 34f54dd114 Fix typo Olivier Chafik 2025-01-30 17:53:10 +00:00
  • b373f8c05b Reinstate cache keys specific to each job Olivier Chafik 2025-01-30 17:51:18 +00:00
  • 729d2d3666 Disable chat_completion tests of non-tool jinja mode Olivier Chafik 2025-01-30 17:43:57 +00:00
  • e4628c0643 revert change Olivier Chafik 2025-01-30 17:29:17 +00:00
  • 6d4762b374 Revert "docker: ccache" Olivier Chafik 2025-01-30 17:25:49 +00:00
  • eafee682ce vulkan: optimize coopmat2 iq2/iq3 callbacks Jeff Bolz 2025-01-30 11:20:42 -06:00
  • 3d415e19fc single cache to rule them all, 1d eviction Olivier Chafik 2025-01-30 17:16:36 +00:00
  • deef03d1fa bump ccache-action version to get evict-old-files support Olivier Chafik 2025-01-30 17:03:49 +00:00
  • 7cfadd7610 vulkan: Avoid using too much host-visible vidmem, which can lead to fragmentation Jeff Bolz 2025-01-29 08:44:18 -06:00
  • 8cc4937e69 use evict-old-files: job Olivier Chafik 2025-01-30 16:53:04 +00:00
  • 9779bf3ae0 Om 🍉 omroy12 2025-01-30 22:00:20 +05:30
  • 19d8922c43 docker: ccache Olivier Chafik 2025-01-30 16:24:56 +00:00
  • 425e6a138a ci: ccache Olivier Chafik 2025-01-30 16:24:46 +00:00
  • 614fd079da Merge remote-tracking branch 'origin/master' into cuda-releases Olivier Chafik 2025-01-30 16:10:54 +00:00
  • 3bd6abebd2 try and avoid weird server test failure (spillage / parallelism between completion & tool call tests?) Olivier Chafik 2025-01-30 15:40:25 +00:00
  • 27d135c970 HIP: require at least HIP 5.5 b4598 uvos 2025-01-29 19:36:00 +01:00
  • 6af1ca48cb HIP: Prepare reduction operators for wave 64 uvos 2025-01-29 19:12:42 +01:00
  • c300e68ef4 CUDA/HIP: add warp_size to cuda_device_info uvos 2025-01-29 17:46:23 +01:00
  • 5f13c244b3 Update examples/llava/clip.cpp Xuan-Son Nguyen 2025-01-30 16:20:46 +01:00
  • 6108d4c78f Apply suggestions from code review Xuan-Son Nguyen 2025-01-30 16:19:53 +01:00
  • 0536d004cb Apply suggestions from code review Xuan-Son Nguyen 2025-01-30 16:17:16 +01:00
  • 1029ff9028 force printing </tool_call> on hermes 2 model if/as it's a special token Olivier Chafik 2025-01-30 15:13:26 +00:00
  • f538bf5b15 fix bug in minicpm-v code caitianchi 2025-01-30 22:47:30 +08:00
  • a40ba49fa6 Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-01-30 16:39:58 +02:00
  • 5add261ae8 test: leave model_hf_file blank Xuan Son Nguyen 2025-01-30 15:35:38 +01:00
  • 82052466d6 log prompt + nits Olivier Chafik 2025-01-30 14:29:16 +00:00
  • f223df0271 Format test-chat.cpp Olivier Chafik 2025-01-30 14:09:54 +00:00
  • 5a64af6c70 add llama_sampler_init_grammar_lazy instead of renaming the non-lazy Olivier Chafik 2025-01-30 14:02:37 +00:00
  • 7ab6685724 server : update help metrics processing/deferred Daniel Bevenius 2025-01-30 13:48:58 +01:00
  • 7d59bf44ed deprecate llama_sampler_init_grammar -> llama_sampler_grammar_init Olivier Chafik 2025-01-30 12:49:56 +00:00
  • 2bb3fed337 nit: fix py import Olivier Chafik 2025-01-30 12:42:34 +00:00
  • 9685043274 Update scripts/fetch_server_test_models.py to new compact hf_repo syntax + switch Hermes models Olivier Chafik 2025-01-30 12:05:07 +00:00
  • 0c171f5463 Update test_chat_completion.py Olivier Chafik 2025-01-30 11:56:10 +00:00
  • 0d3ad163f0 Merge branch 'ggerganov:master' into support_glm_edge_model piDack 2025-01-30 19:53:57 +08:00
  • 119d3bf986 Add environmental variable GGML_KLEIDIAI_SME Charles Xu 2025-01-30 12:50:08 +01:00
  • 06c4ca56c7 Update test_chat_completion.py Olivier Chafik 2025-01-30 11:49:16 +00:00
  • 3dcde9ea83 Fix debug + verbose Olivier Chafik 2025-01-30 11:49:13 +00:00
  • c88f4a798d simplify handle_apply_template Xuan Son Nguyen 2025-01-30 12:00:54 +01:00
  • 2d51c459c6 code style changes on test Xuan Son Nguyen 2025-01-30 11:52:31 +01:00
  • 8ef37a3c07 Merge remote-tracking branch 'origin/master' into tool-call Olivier Chafik 2025-01-30 10:50:02 +00:00
  • 3d804dec76 sync: minja (#11499) b4595 Olivier Chafik 2025-01-30 10:30:27 +00:00
  • ffd0821c57 vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496) b4594 mgroeber9110 2025-01-30 11:10:59 +01:00
  • 4314e56c4f server : use lambda instead of std::bind (#11507) Daniel Bevenius 2025-01-30 11:05:00 +01:00
  • 496e5bf46b server : (docs) added response format for /apply-template [no ci] (#11503) Isaac McFadyen 2025-01-30 04:11:53 -05:00
  • a4ee5ca8f5 implemented compilation time q4_0 group size variants - for cpu zhycheng614 2025-01-30 07:04:12 +00:00
  • 7919256c57 readme : reference examples relative links (#11505) Guspan Tanadi 2025-01-30 12:58:02 +07:00
  • a8240b5860 server : use lambda instead of std::bind Daniel Bevenius 2025-01-30 06:14:10 +01:00
  • 9591af1fc5 increase http timeout to 12 ochafik 2025-01-30 04:50:59 +00:00
  • 7635912f73 llama 3.2 1b now fails the weather tool call? ochafik 2025-01-30 04:49:52 +00:00
  • b831a6e0d3 rm unused llama_param ochafik 2025-01-30 04:49:02 +00:00
  • e0449763a4 server : update json snippets in README.md [no ci] (#11492) Daniel Bevenius 2025-01-30 05:48:14 +01:00
  • 18450e690f debug logs are back ochafik 2025-01-30 04:34:14 +00:00
  • 81547e6f9b nits ochafik 2025-01-30 04:20:06 +00:00
  • f8e14bffc3 split chat handler vs. parser around enum again ochafik 2025-01-30 04:11:05 +00:00
  • 473374c315 docs(readme): reference examples relative links Guspan Tanadi 2025-01-30 10:18:52 +07:00
  • 743cfdfdba vulkan: initial support for IQ4_XS quantization Rémy O 2025-01-19 20:53:58 +01:00
  • 13961b3553 fix format liyuhang 2025-01-30 09:19:15 +08:00
  • 590c97931a Update tests readme + add raw output to verbose log ochafik 2025-01-30 00:43:30 +00:00
  • 774557cfb4 llama 3.1: allow {name: & {function: syntax even w/ builtin tools (70B model just likes that!) ochafik 2025-01-30 00:43:06 +00:00
  • d86a1ae80d Unify content + message in server_task_result_cmpl_final (+ avoid string copy) ochafik 2025-01-30 00:13:12 +00:00
  • 77c60e662e Avoid passing tools twice in generic handler (now that minja passes them automatically when needed) ochafik 2025-01-30 00:09:56 +00:00
  • b89286a481 server: added response format for /apply-template to README.md Isaac McFadyen 2025-01-29 18:50:40 -05:00
  • a810c37c76 Partial revert of LLAMA_CACHE=tmp (unless set explicitly in env) ochafik 2025-01-29 23:16:18 +00:00
  • cbecb35619 Add tool call to hot topics ochafik 2025-01-29 22:44:46 +00:00
  • 64545ac9d5 Somehow /* bad inside block comments, ok fine. ochafik 2025-01-29 22:38:52 +00:00
  • 2b2456978a Add cli mode to test-chat to generate template summaries markdown ochafik 2025-01-29 22:33:16 +00:00
  • 84bc083faf Remove server tests LLAMA_CACHE override (tests are serial, and the cache is easier to prefill w/ scripts/fetch_server_test_models.py) ochafik 2025-01-29 21:43:14 +00:00
  • bc8a61138f nits ochafik 2025-01-29 21:42:12 +00:00
  • 36c776f329 Finish renaming of chat inputs vs. params [skip ci] ochafik 2025-01-29 21:29:45 +00:00
  • ed7c622d78 Rename: common/chat.*, common_chat_{inputs -> params} ochafik 2025-01-29 21:18:49 +00:00
  • 7e1c85cd3e HIP: require at least HIP 5.5 uvos 2025-01-29 19:36:00 +01:00
  • af71052cdd HIP: Prepare reduction operators for wave 64 uvos 2025-01-29 19:12:42 +01:00
  • 6e676c8030 sync: minja ochafik 2025-01-29 20:31:28 +00:00
  • 563a2bd0cd sync: minja ochafik 2025-01-29 20:30:52 +00:00
  • fdf735e0ff server : update json snippets in README.md [no ci] Daniel Bevenius 2025-01-29 17:49:17 +01:00
  • ba27e98582 Unify llama 3.x chat handling again (allow {"type": "function", "name": ... prefix) ochafik 2025-01-29 18:29:18 +00:00
  • fe8d4df76b Correctly identify LF token for GPT-2 style BPE tokenizer mgroeber9110 2025-01-29 20:27:37 +01:00
  • eb7cf15a80 server : add /apply-template endpoint for additional use cases of Minja functionality (#11489) b4589 Nigel Bosch 2025-01-29 12:45:44 -06:00
  • 7b5e0803c8 Move templates/ under models/ ochafik 2025-01-29 18:16:35 +00:00
  • d06448a06a Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes Michal Moskal 2025-01-29 10:15:18 -08:00
  • 682026f84b Create meta-llama-Llama-3.1-8B-Instruct.jinja ochafik 2025-01-29 18:09:59 +00:00
  • babdefc4dd Merge remote-tracking branch 'origin/master' into tool-call ochafik 2025-01-29 17:54:57 +00:00
  • 0f8af536c9 nits ochafik 2025-01-29 17:50:44 +00:00
  • 77dd67c28c tool-calls: disable crashing tests ochafik 2025-01-29 17:36:18 +00:00
  • 66ee4f297c vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360) b4588 Rémy Oudompheng 2025-01-29 18:29:39 +01:00
  • 76f6ab19ad Update test_tool_call.py ochafik 2025-01-29 17:04:30 +00:00
  • 5475357458 fail llama_sampler_init_llg() at runtime Michal Moskal 2025-01-29 08:53:11 -08:00
  • 41eec4622b rm unused templates, rename one ochafik 2025-01-29 16:50:54 +00:00