Commit graph

  • 9b59c08db3 readme : more lora detail in main example readme Rich Dougherty 2024-10-29 13:33:16 +13:00
  • aefac1e5cb tool-call: update scripts/fetch_server_test_models.py ochafik 2024-10-28 23:57:23 +00:00
  • b825440c81 tool-call: use Q4_K_M models ochafik 2024-10-28 23:56:40 +00:00
  • 74d71a673e agent: simplify syntax (default tools to local w/ default port) ochafik 2024-10-28 23:54:01 +00:00
  • b51c71c734 tool-call: remove duplicate script to fetch templates ochafik 2024-10-28 21:35:18 +00:00
  • 63c47ab8c3 llama : refactor model loader with backend registry slaren 2024-10-28 22:06:13 +01:00
  • 61715d5cc8 llama : Add IBM granite template (#10013) b3987 arch-btw 2024-10-28 10:45:33 -07:00
  • 07028f9d74 flake.lock: Update (#10063) Georgi Gerganov 2024-10-28 17:41:24 +02:00
  • 31a90b3cb6 test Caleb P. Nwokocha 2024-10-28 10:15:26 -05:00
  • 337338813b Update src/llama.cpp Xuan Son Nguyen 2024-10-28 16:10:25 +01:00
  • 8b0b64bb75 Apply suggestions from code review Xuan Son Nguyen 2024-10-28 16:09:26 +01:00
  • dbfa8a7b62 Merge branch 'ggerganov:master' into master momonga 2024-10-28 21:55:11 +09:00
  • 839cf4ccab Fix spacing arch-btw 2024-10-28 04:32:20 -07:00
  • 50ef6ca3b9 Add code space & arch-btw 2024-10-28 04:20:02 -07:00
  • ec547e4137 tool-call: add tests: tool_call=none, parallel_tool_calls=true ochafik 2024-10-28 10:04:00 +00:00
  • 968b4bac5b Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-10-28 14:39:10 +05:00
  • 524afeec9d musa: workaround for Guilty Lockup in cleaning src0 (#10042) b3985 R0CKSTAR 2024-10-28 17:02:48 +08:00
  • ec2be7bf57 llama : remove Tail-Free sampling Georgi Gerganov 2024-10-28 08:56:12 +02:00
  • 8125e6cbfc server : don't overfill the batch during infill (#10018) b3984 Georgi Gerganov 2024-10-28 08:49:32 +02:00
  • c95c957d8b Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-10-28 09:29:39 +05:00
  • f1fc512752 Implement ggml_v_expf() with a fast approximation on AVX/AVX2/AVX512 The code implements a fast, vectorized approximation to exp(x). The approximation trick should work on most systems, but is implemented only for AVX/AVX2/AVX512 to start. J M 2024-10-27 12:34:32 -07:00
  • 168add7ec8 Update tool_call.feature ochafik 2024-10-28 02:06:00 +00:00
  • dd6d0241a7 tool-call: script to prefetch models used in server tests ochafik 2024-10-28 02:01:00 +00:00
  • 7fde6d0091 tool_call: test no tool call on a real model + rename scenarios ochafik 2024-10-28 02:00:09 +00:00
  • c88095e3fc space nits ochafik 2024-10-28 00:27:04 +00:00
  • 9a86ea79a2 tool-call: slow tool call integration tests ochafik 2024-10-28 00:26:40 +00:00
  • 8841ce3f43 llama : switch KQ multiplication to F32 precision by default (#10015) b3983 Georgi Gerganov 2024-10-27 20:59:58 +02:00
  • 48d5a1f8d0 server : don't overfill the batch during infill Georgi Gerganov 2024-10-23 17:15:57 +03:00
  • ec9f3b101b nits ochafik 2024-10-27 16:44:54 +00:00
  • 080982ebf3 tool-call: test MistralNemo in forced tools server tests (w/ parallel tool calls disabled) ochafik 2024-10-27 16:39:51 +00:00
  • 60ed87077d Small change to \n arch-btw 2024-10-26 18:09:23 -07:00
  • 3cca307ea6 Added proper template and expected output arch-btw 2024-10-26 18:03:17 -07:00
  • ac031c2ac4 flake.lock: Update github-actions[bot] 2024-10-27 00:22:59 +00:00
  • 439ca3bd04 fix deepseek deseret regex Daniel Hiltgen 2024-10-25 16:25:18 -07:00
  • f7da741559 Merge 64472c9e97 into cc2983d375 0xez 2024-10-26 22:19:09 +01:00
  • 554500c5ac Merge f286589a32 into cc2983d375 Olivier Chafik 2024-10-26 22:19:08 +01:00
  • 0510290e2a Merge e5ca7e9507 into cc2983d375 MONONOKE 2024-10-26 22:18:58 +01:00
  • 06014f4260 Merge 914922d27e into cc2983d375 DrewZt 2024-10-26 22:18:27 +01:00
  • bc2b15012c Merge eb357d0822 into cc2983d375 Olivier Chafik 2024-10-26 22:17:55 +01:00
  • 96901379ae Merge branch 'ggerganov:master' into master momonga 2024-10-27 03:16:52 +09:00
  • 24c46d9364 Changed "llama" in file names to "jarvis" Caleb P. Nwokocha 2024-10-26 11:45:36 -05:00
  • 52ab617954 Changed "llama" to "jarvis" Caleb P. Nwokocha 2024-10-26 11:10:29 -05:00
  • 7b59859b15 Merge branch 'ggerganov:master' into mobile_vlm Changyeon Kim 2024-10-27 01:05:16 +09:00
  • 3932fd5740 [fix] Correct the incorrect order of the parameters. Changyeon Kim 2024-10-27 00:50:22 +09:00
  • 4dfbcf9646 Merge branch 'ggerganov:master' into master Caleb Princewill Nwokocha 2024-10-26 10:27:33 -05:00
  • 9b56176c6a test Caleb P. Nwokocha 2024-10-26 10:10:52 -05:00
  • 8fe174dce0 Update tests/test-chat-template.cpp arch-btw 2024-10-26 05:21:17 -07:00
  • 26f0911aee Update src/llama.cpp arch-btw 2024-10-26 04:56:40 -07:00
  • ee952748ed Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-10-26 14:02:50 +05:00
  • cc2983d375 sync : ggml b3982 Georgi Gerganov 2024-10-26 10:34:08 +03:00
  • 8c60a8a462 increase cuda_cpy block size (ggml/996) bssrdf 2024-10-23 14:34:00 -04:00
  • 9e4a2563ea scripts : fix amx sync [no ci] Georgi Gerganov 2024-10-26 10:33:31 +03:00
  • b4ba248694 use float literals Michael Coppola 2024-10-26 03:25:53 -04:00
  • 61017976cd Merge branch 'k-shift2' of https://github.com/MaggotHATE/llama.cpp-greedy-rework into k-shift2 MaggotHATE 2024-10-26 12:21:55 +05:00
  • 48b715da28 Fixes and tests MaggotHATE 2024-10-26 12:21:46 +05:00
  • 4f80618716 sampling : add adaptive temperature sampler Michael Coppola 2024-10-26 02:58:07 -04:00
  • 5237aa4218 Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-10-26 00:58:48 +05:00
  • 006167dd65 Update server README.md Burhanuddin Mustafa Lakdawala 2024-10-25 12:52:46 -07:00
  • 668750357e metal : support permuted matrix multiplicaions (#10033) Georgi Gerganov 2024-10-25 22:26:15 +03:00
  • 070f9546f6 Fixed style MaggotHATE 2024-10-25 23:28:30 +05:00
  • 87384fb557 K-Shift commit MaggotHATE 2024-10-25 23:22:16 +05:00
  • 8c01771112 metal : minor Georgi Gerganov 2024-10-25 19:51:40 +03:00
  • 78416536c8 metal : minor refactor Georgi Gerganov 2024-10-25 19:46:36 +03:00
  • ff252ea48e llama : add DRY sampler (#9702) b3978 wwoodsTM 2024-10-25 10:07:34 -06:00
  • d80fb71f8b llama: string_split fix (#10022) b3977 Michael Podvitskiy 2024-10-25 17:57:54 +02:00
  • fcc5a22fde Fix logging from llama-llava-cli Gábor Stefanik 2024-10-25 17:05:33 +02:00
  • b96ef696c5 llama: Add static_assert in the string_split template to ensure the correct template specialization is used for std::string Michael Podvitskiy 2024-10-25 15:51:47 +02:00
  • bb9d36be78 cont : add comments [no ci] Georgi Gerganov 2024-10-25 16:36:04 +03:00
  • 18989be340 cont : use nb01 directly for row steps Georgi Gerganov 2024-10-25 16:27:03 +03:00
  • 37057a0fcf ggml : Format gemm rvv code Xiongchuan Tan 2024-10-25 16:32:13 +08:00
  • 862b9598d9 musa: workaround for Guilty Lockup in cleaning src0 Xiaodong Ye 2024-10-25 16:19:33 +08:00
  • c263ca767b remove wrong assert in norm WA for permute(0,1,3,2) mul_mat ggml-ci Meng, Hengyu 2024-10-25 07:41:48 +00:00
  • 2f8bd2b901 llamafile : extend sgemm.cpp support for Q5_0 models (#10010) b3976 Srihari-mcw 2024-10-25 12:57:41 +05:30
  • bc5ba007b2 server : check that the prompt fits in the slot's context (#10030) b3975 Georgi Gerganov 2024-10-25 10:13:46 +03:00
  • bd67115981 Merge branch 'master' into gg/server-check-ctx Georgi Gerganov 2024-10-25 10:13:10 +03:00
  • 30bd00bcf7 agent: fix tools setup Olivier Chafik 2024-10-25 02:00:47 +01:00
  • 5c414a3335 agent: simplify tools setup Olivier Chafik 2024-10-25 01:03:45 +01:00
  • 958367bf53 server : refactor slot input data, move tokenizer to HTTP thread (#10023) b3974 Xuan Son Nguyen 2024-10-24 21:51:22 +02:00
  • 40f2555797 ci : fix cmake flags for SYCL Georgi Gerganov 2024-10-24 21:23:33 +03:00
  • 0f4fc8cb28 agent: fix no-cache issue in squid for brave tool Olivier Chafik 2024-10-24 18:59:37 +01:00
  • 7f7acdbec5 use llama_tokens everywhere Xuan Son Nguyen 2024-10-24 16:53:38 +02:00
  • 13ee779313 update docs Xuan Son Nguyen 2024-10-24 16:39:03 +02:00
  • 5e67c4a576 Make Kompute error verbose about unsupported types Eric Curtin 2024-10-24 15:38:41 +01:00
  • 4a9f3e7628 rename completion to inference Xuan Son Nguyen 2024-10-24 16:29:38 +02:00
  • 575b1332ab remove redundant code Xuan Son Nguyen 2024-10-24 16:23:55 +02:00
  • c34ab08a16 fix test Xuan Son Nguyen 2024-10-24 15:59:49 +02:00
  • 07381f7d97 try fixing format_infill Xuan Son Nguyen 2024-10-24 15:56:49 +02:00
  • fea5ca4524 add infill test Xuan Son Nguyen 2024-10-24 15:56:35 +02:00
  • 167a515651 CUDA: fix insufficient buffer clearing for MMQ (#10032) b3972 Johannes Gäßler 2024-10-24 14:40:23 +02:00
  • 5409a21e1b metal : support permuted matrix multiplicaions Georgi Gerganov 2024-10-24 15:05:58 +03:00
  • 03b86416e1 agent: fix deps + make docker compose setup easier to debug Olivier Chafik 2024-10-24 12:30:27 +01:00
  • cff97ad3f4 bring back infill validation Xuan Son Nguyen 2024-10-24 13:20:44 +02:00
  • 49dcb0aa60 CUDA: fix insufficient buffer clearing for MMQ Johannes Gäßler 2024-10-24 11:50:23 +02:00
  • c39665f589 CUDA: fix MMQ for non-contiguous src0, add tests (#10021) b3971 Johannes Gäßler 2024-10-24 11:09:36 +02:00
  • 1905ba1a22 server : check that the prompt fits in the slot's context Georgi Gerganov 2024-10-24 10:44:00 +03:00
  • dc408bba7d DRY: Fixed crash issue due to DRY being in chain but uninitialized wwoodsTM 2024-10-24 01:15:04 -06:00
  • 78c78e2af2 ggml : Fix GCC rvv load alignment issue Xiongchuan Tan 2024-10-24 14:18:01 +08:00
  • c039415ecf ggml : optimize gemm to avoid register spillover Xiongchuan Tan 2024-10-22 21:49:40 +08:00
  • 238cd6674e ggml : Added initial implementation of rvv gemm Xiongchuan Tan 2024-10-22 15:42:50 +08:00
  • 3f7fdf24b0 ggml : Added WIP rvv q4_0_8x8 gemm Xiongchuan Tan 2024-10-22 02:42:35 +08:00