Commit graph

  • 326e7002b3 update test_calc_result ochafik 2025-02-04 03:13:13 +00:00
  • f0154a6479 Fix / test models/templates/llama-cpp-deepseek-r1.jinja ochafik 2025-02-04 03:09:15 +00:00
  • a682d1216d fix / test parsing of r1 parser ochafik 2025-02-04 02:23:31 +00:00
  • c0a71b1330 wip: fix mllama error YiYing He 2025-01-20 15:49:34 +08:00
  • 88c513f4c9 examples: add mllama implementation YiYing He 2025-01-15 17:44:28 +08:00
  • 8bb33d3285 ggml: apply the unpad operator patch YiYing He 2025-01-15 17:08:54 +08:00
  • 45a89e0cec llama: apply the mllama support patch YiYing He 2025-01-15 17:07:09 +08:00
  • 9a6847c857 move trigger_words init inside non-llguidance branch ochafik 2025-02-04 01:13:01 +00:00
  • 18a11f43f0 tool-call: r1: fix grammar ochafik 2025-02-04 01:12:44 +00:00
  • e84ee88f50 r1: fix inadvertent newline in grammar before <|tool▁call▁end|> ochafik 2025-02-04 00:36:38 +00:00
  • ce28224de8 tool-call: r1: add one more trigger approx "<|tool calls begin|>" Olivier Chafik 2025-02-04 00:28:40 +00:00
  • bff549deb6 simplify hack to fix original template's backfill from minja Olivier Chafik 2025-02-04 00:14:48 +00:00
  • bbd45bf6a2 sync: minja Olivier Chafik 2025-02-04 00:14:15 +00:00
  • 30ea3591c9 update to minja's new api Olivier Chafik 2025-02-03 23:53:27 +00:00
  • 11c1f0c7d4 actually we want eos_token in the template to infer tool call examples, explicitly skipped in new template options Olivier Chafik 2025-02-03 23:52:28 +00:00
  • bc6d910f6d Merge branch 'master' into r1-toolcall Olivier Chafik 2025-02-03 23:51:31 +00:00
  • cde3833239 tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616) b4628 Olivier Chafik 2025-02-03 23:49:27 +00:00
  • 108da907f0 sync: minja https://github.com/google/minja/pull/46 Olivier Chafik 2025-02-03 23:31:49 +00:00
  • b3451785ac server : (webui) revert hacky solution from #11626 (#11634) Xuan-Son Nguyen 2025-02-04 00:10:52 +01:00
  • dcf3dcdd10 log_server_request: rm try catch, add reminder Xuan Son Nguyen 2025-02-03 23:27:42 +01:00
  • 722187fce8 server : (webui) revert hacky solution from #11626 xsn/revert_bad_server_ux Xuan Son Nguyen 2025-02-03 23:23:11 +01:00
  • 1d1e6a90bc server : (webui) allow typing and submitting during llm response (#11626) Woof Dog 2025-02-03 22:16:27 +00:00
  • 352f79ca9f Merge branch 'master' into llamacli-tools Mason M 2025-02-03 17:17:10 -04:00
  • 1c302e18ba simpler hacky fixes for original broken template (+ fix minja example syntax polyfill) Olivier Chafik 2025-02-03 20:34:44 +00:00
  • a2da5a2c0b cmake: include folder and common folder are private to llama library #11630 brian khuu 2025-02-04 07:13:38 +11:00
  • 5cb76bc193 fix: examples/swiftui llama_model -> llama_vocab Maksym Matviievskyi 2025-02-03 20:10:26 +00:00
  • a444c15209 ci: add bash script to check if llama-impl.h was included in example folder erroneously brian khuu 2025-02-04 07:06:08 +11:00
  • c6214ee9d6 rm unneeded vocab Olivier Chafik 2025-02-03 19:59:50 +00:00
  • 7dc271fb37 tool-calls: add deepseek r1 template + accommodate broken official template slightly better Olivier Chafik 2025-02-03 19:59:33 +00:00
  • 7639123169
    Add llm_client Rust crate to readme bindings Shelby Jenkins 2025-02-03 13:57:09 -06:00
  • 0be7f652e9 Merge branch 'jinja-chatml' into r1-toolcall Olivier Chafik 2025-02-03 19:35:54 +00:00
  • d73448de1c Simplify default chatml logic Olivier Chafik 2025-02-03 19:22:53 +00:00
  • d8b8c80cd5 server : (webui) allow typing and submitting during llm response woof-dog 2025-02-03 20:12:41 +01:00
  • 569610ee77 tool-calls: accommodate variety of wrong tool call opening tags both Qwen 32B and 7B distills like to spit out Olivier Chafik 2025-02-03 18:57:55 +00:00
  • c397bd1f5f tweak delta logic Olivier Chafik 2025-02-03 17:57:38 +00:00
  • df3474e2c2 tool-calls: r1: add missing <|tool▁calls▁end|> to grammar! Olivier Chafik 2025-02-03 17:33:14 +00:00
  • 08271b5505 Merge branch 'jinja-chatml' into r1-toolcall Olivier Chafik 2025-02-03 17:32:38 +00:00
  • b2dd490926 add missing try catch around jinja parsing to default to chatml Olivier Chafik 2025-02-03 17:32:12 +00:00
  • 4cb0e1d873 Merge branch 'jinja-chatml' into r1-toolcall Olivier Chafik 2025-02-03 17:15:14 +00:00
  • 2b3c4829a3 fix build / rm diff Olivier Chafik 2025-02-03 16:34:43 +00:00
  • b34cf89719 Merge ad622ca97e into 5598f475be Dhruv Anand 2025-02-03 21:24:44 +05:30
  • 5e602e9152 squash! server : use httplib status codes Daniel Bevenius 2025-02-03 16:48:14 +01:00
  • 5598f475be server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622) Daniel Bevenius 2025-02-03 16:45:38 +01:00
  • 802c94524f server : use httplib status codes Daniel Bevenius 2025-02-03 16:22:43 +01:00
  • 5193c202ce server : remove CPPHTTPLIB_NO_EXCEPTIONS define Daniel Bevenius 2025-02-03 15:30:26 +01:00
  • 4247f3d210 server : add try..catch to places not covered by set_exception_handler Xuan Son Nguyen 2025-02-03 15:30:08 +01:00
  • aa98e59038 fix bad merge Olivier Chafik 2025-02-03 14:01:49 +00:00
  • 5d18d76b69 fix double bos issue (drop bos/eos tokens from jinja template) Olivier Chafik 2025-02-03 13:59:16 +00:00
  • cf83623a47 fix typo Olivier Chafik 2025-02-03 13:58:46 +00:00
  • bfc342ca97 squash! server : add server_task_type field to server_task_result Daniel Bevenius 2025-02-03 14:33:44 +01:00
  • 2991954b7d De-duplicate fmt and format functions and optimize optimize_fmt Eric Curtin 2025-01-31 14:08:30 +00:00
  • 8ec05832fa sync : ggml Georgi Gerganov 2025-02-03 14:57:08 +02:00
  • 21c84b5d2d CUDA: fix Volta FlashAttention logic (#11615) b4623 Johannes Gäßler 2025-02-03 13:25:56 +01:00
  • 1eca8916b5 llama : fix rwkv inference (#11618) Molly Sophia 2025-02-03 20:17:50 +08:00
  • 5ee63ee4e4 CUDA: fix Volta FlashAttention logic Johannes Gäßler 2025-02-03 11:23:41 +01:00
  • 94bc968f7d HIP: add doc on small default launch bounds fxzjshm 2025-02-03 19:41:42 +08:00
  • 00e6bfeca9 llama : fix rwkv inference Molly Sophia 2025-02-03 17:45:45 +08:00
  • 91108c14f0 revert minor regex diff ochafik 2025-02-03 11:20:48 +00:00
  • 4033365e00 ci: add bash script to check if llama-impl.h was included in example folder erroneously brian khuu 2025-02-03 22:01:47 +11:00
  • a76073cf88 minimize diffs ochafik 2025-02-03 10:58:52 +00:00
  • 77ae97e7d6 Update test_tool_call.py ochafik 2025-02-03 10:28:30 +00:00
  • d92cb67e37 server : (webui) Fix Shift+Enter handling (#11609) mashdragon 2025-02-03 09:42:55 +00:00
  • f24cc3c2db build index.html.gz Xuan Son Nguyen 2025-02-03 10:15:31 +01:00
  • ccfdca810e Added -no-cnv flag to force instruct models to continuously generate tokens AbdulMuqeet Mohammed 2025-02-03 00:13:50 -05:00
  • 1e9acd2d31 tool-call: allow --jinja --chat-template chatml ochafik 2025-02-03 04:07:11 +00:00
  • 46c1f36c65 Fix Shift+Enter handling mashdragon 2025-02-03 04:04:05 +00:00
  • 7e44fbe908 test multiline non-tool-call responses in test-chat ochafik 2025-02-03 03:01:43 +00:00
  • 568a4f5b0e fix command r7b normal response regex + add to server test ochafik 2025-02-03 02:48:15 +00:00
  • 5e6f2a21ae add deepseek models to server tool call section in readme ochafik 2025-02-03 02:44:42 +00:00
  • 19bea4ecc3 tell DS R1 not to overthink (weather test) ochafik 2025-02-03 02:20:03 +00:00
  • ae9d5812a7 tool-calls: add DeepSeek R1 Qwen 7B to server test_hello_world ochafik 2025-02-03 02:15:25 +00:00
  • 04be723b33 tool-call: fix command-r7b parsing when response is multiline ochafik 2025-02-03 02:13:55 +00:00
  • 73d08d49cf tool-call: allow --jinja --chat-template chatml ochafik 2025-02-03 02:13:28 +00:00
  • 08716281f2 rename tests ochafik 2025-02-03 01:21:35 +00:00
  • c80cb30938 update logs ochafik 2025-02-03 01:21:09 +00:00
  • 28345877e4 server/oai: ensure content is null when there are tool calls ochafik 2025-02-03 01:20:45 +00:00
  • 04d511b5b5 Avoid double bos w/ jinja ochafik 2025-02-03 01:20:11 +00:00
  • 130ca222c9 DeepSeek R1: parse thoughts / return in separate field in API (non streamed mode) ochafik 2025-02-03 01:19:15 +00:00
  • 87de852b7f pass vocab to common_chat_params_init ochafik 2025-02-03 01:16:02 +00:00
  • 7e3e0d98a0 add --rpc-layers flag to explicitly set RPC layers Karl-Johan Alm 2025-02-03 11:24:29 +09:00
  • 106d2b3667 Only write ccache when pushing to master, and evict files after 12h of being unused ochafik 2025-02-03 02:23:45 +00:00
  • d3b60b8ad8 minja: enhance backfill of templates w/o tools description (use example tool call delta!) ochafik 2025-02-03 01:03:04 +00:00
  • 67eb5ea7f1 Merge d8eca4d82e into 6eecde3cc8 Tim Janik 2025-02-02 23:48:33 +01:00
  • 6eecde3cc8 HIP: fix flash_attn_stream_k_fixup warning (#11604) b4621 Johannes Gäßler 2025-02-02 23:48:29 +01:00
  • 2ed9941ecb Update common.cpp magicse 2025-02-03 00:18:43 +02:00
  • 396856b400 CUDA/HIP: add support for selectable warp size to mmv (#11519) b4620 uvos 2025-02-02 22:40:09 +01:00
  • 4d0598e144 HIP: add GGML_CUDA_CC_IS_* for AMD families as increasing cc architectures for AMD GPUs are not supersets of each other (#11601) b4619 uvos 2025-02-02 22:08:05 +01:00
  • a30c60824c HIP: fix flash_attn_stream_k_fixup warning Johannes Gäßler 2025-02-02 21:34:03 +01:00
  • 19d84e08a3 some optimizations (maybe) lexasub 2025-02-02 15:30:08 +04:00
  • 3a2db16467 HIP: add GGML_CUDA_CC_IS_* for AMD families as increasing cc architectures for AMD GPUs are not supersets of each other uvos 2025-02-02 20:36:48 +01:00
  • 90f9b88afb nit: more informative crash when grammar sampler fails (#11593) b4618 Olivier Chafik 2025-02-02 19:58:34 +00:00
  • 182f418ba0 Update ggml/src/ggml-cuda/common.cuh uvos 2025-02-02 20:56:43 +01:00
  • 078ee4ff65 CUDA/HIP: add support for selectable warp size to mmv uvos 2025-01-29 23:37:28 +01:00
  • d8eca4d82e examples/server/public/index.html.gz: npm run build Tim Janik 2025-01-26 22:18:57 +01:00
  • 65be99eee5 examples/server/webui/src/main.js: populate textarea from query string Tim Janik 2024-12-10 17:32:40 +01:00
  • ea489cba8c examples/server/webui/index.html: assign id="msg-send" to the "Send" button Tim Janik 2024-12-10 17:32:40 +01:00
  • 864a0b67a6 CUDA: use mma PTX instructions for FlashAttention (#11583) b4617 Johannes Gäßler 2025-02-02 19:31:09 +01:00
  • 8c2b61a6c5 Change umlaut test Alex Fanthome 2025-02-02 18:11:09 +00:00
  • 51670bd43e Update ggml/src/ggml-cuda/mma.cuh Johannes Gäßler 2025-02-02 18:30:11 +01:00
  • 37910e42ef Update ggml/src/ggml-cuda/mma.cuh Johannes Gäßler 2025-02-02 18:29:59 +01:00