Commit graph

  • 2ae9bb7764 scripts: added inline script metadata per PEP 723 [no-ci] Isaac McFadyen 2025-02-02 11:53:29 -05:00
  • 6c8d01a8bb add regex support slaren 2025-02-02 17:23:32 +01:00
  • bb6b97e71e Merge remote-tracking branch 'origin/master' into sl/custom-tensor-offload slaren 2025-02-02 17:18:30 +01:00
  • 45b1b14866 movmatrix CUDA version: 12.0 -> 11.8 Johannes Gäßler 2025-02-02 16:27:06 +01:00
  • 635628a589 Merge 4c59b04ac8 into 84ec8a58f7 Siddartha Naidu 2025-02-02 10:21:59 -05:00
  • 84ec8a58f7 Name colors (#11573) b4616 Eric Curtin 2025-02-02 16:14:48 +01:00
  • 486031c2dd nit: more informative crash when grammar sampler fails ochafik 2025-02-02 12:36:03 +00:00
  • 817f87b2bb add __shfl_sync to HIP Johannes Gäßler 2025-02-02 13:32:28 +01:00
  • 27df617d49 vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation Wagner Bruna 2025-02-02 00:30:12 -03:00
  • e3b7c574b1 __shfl_sync workaround for movmatrix Johannes Gäßler 2025-02-02 12:14:36 +01:00
  • e884d3d530 Merge branch 'master' into xsn/vision_2 Xuan Son Nguyen 2025-02-02 12:06:34 +01:00
  • 10dacabbcd better quantize_row_q8_K Xuan Son Nguyen 2025-02-02 12:04:55 +01:00
  • 0c2ff18cc8 vulkan: Increase MMV batch size and unroll IQ LUT setup Rémy O 2025-02-02 10:22:14 +01:00
  • 7b820e7263 vulkan: add MMV kernels for IQ3 quants Rémy O 2025-02-01 17:47:38 +01:00
  • cc99d98e87 vulkan: implement specialized MMV kernels for IQ2 quantizations Rémy O 2025-02-01 00:12:52 +01:00
  • bfcce4d693 tool-call: support Command R7B (+ return tool_plan "thoughts" in API) (#11585) b4615 Olivier Chafik 2025-02-02 09:25:38 +00:00
  • 69804487e0 Fix exotic ci env that lacks ostringstream::str (#11581) b4614 Olivier Chafik 2025-02-02 09:10:15 +00:00
  • 74b0807245 Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-02-02 11:07:05 +02:00
  • 4a5b654055 bump multi token not preserved log to warning ochafik 2025-02-02 09:06:41 +00:00
  • a278637fb1 make multi token not preserved warning more actionable ochafik 2025-02-02 09:05:08 +00:00
  • 3a37ae4006 comment / warn about preserved tokens not being single tokens ochafik 2025-02-02 09:00:46 +00:00
  • 3e23be7911 context : store graph build function callback Georgi Gerganov 2025-02-02 10:17:42 +02:00
  • ff227703d6 sampling : support for llguidance grammars (#10224) b4613 Michał Moskal 2025-02-01 23:55:32 -08:00
  • 0cec062a63 llama : add support for GLM-Edge and GLM-Edge-V series models (#10573) piDack 2025-02-02 15:48:46 +08:00
  • 31191addf6 minor : style Georgi Gerganov 2025-02-02 09:47:57 +02:00
  • 58d5154e7d Merge 8dfe3d8e97 into 53debe6f3c Milot Mirdita 2025-02-02 08:15:54 +01:00
  • 5fd28b3aca use set::find ochafik 2025-02-02 02:07:32 +00:00
  • a28d9befbc rm msg.thoughts (that's for later / R1) ochafik 2025-02-02 02:06:08 +00:00
  • 548ac5a4c0 fix test-chat ochafik 2025-02-02 02:02:56 +00:00
  • e44a8eb7f1 tool-call: Command R7B (w/ tool_plan return), preserved_tokens & test cleanup ochafik 2025-02-02 01:52:21 +00:00
  • 23e74abb8b Fix exotic ci env that lacks ostringstream::str ochafik 2025-02-01 21:41:01 +00:00
  • 60958f60ea CUDA: use mma PTX instructions for FlashAttention Johannes Gäßler 2025-01-24 21:13:05 +01:00
  • 07820effd9 devops: increase timeout of Vulkan tests again Rémy O 2025-02-01 19:58:00 +01:00
  • 5246ae7edf vulkan: define MMV kernels for IQ1 quantizations Rémy O 2025-02-01 19:27:21 +01:00
  • 53debe6f3c ci: use sccache on windows HIP jobs (#11553) b4611 Olivier Chafik 2025-02-01 18:22:38 +00:00
  • eb3041a202 ggml : add NUMA-aware buffer type that allocates pages according to the first-touch policy; llama : use NUMA-aware buffer type for KV cache Stanisław Szymczyk 2025-02-01 17:40:45 +01:00
  • bf3ea1fd6e Update CMakeLists.txt magicse 2025-02-01 18:16:22 +02:00
  • 2c2671d2c8 Update CMakeLists.txt magicse 2025-02-01 18:11:45 +02:00
  • 5b3cd1c0b9 Name colors Eric Curtin 2025-01-31 15:04:21 +00:00
  • cfd74c86db sync: minja (418a2364b5) (#11574) b4610 Olivier Chafik 2025-02-01 12:24:51 +00:00
  • a967723faa Don't use sccache for HIP windows builds ochafik 2025-02-01 11:52:03 +00:00
  • c8bc6e4ff4 llama : increased max_nodes as large MoE models use massive amounts of nodes during warmup Stanisław Szymczyk 2025-02-01 12:43:14 +01:00
  • 9ca18c21b7 sync: minja (418a2364b5) ochafik 2025-02-01 11:41:58 +00:00
  • ecef206ccb Implement s3:// protocol (#11511) b4609 Eric Curtin 2025-02-01 11:30:54 +01:00
  • 83a473a001 llama : use all experts during warmup Stanisław Szymczyk 2025-02-01 10:32:06 +01:00
  • c8ec97d95e Merge branch 'ggerganov:master' into support_glm_edge_model piDack 2025-02-01 09:10:49 +08:00
  • e39e2b29d9 updated readme zhycheng614 2025-02-01 01:00:59 +00:00
  • 77b0b28d3e fix format liyuhang 2025-02-01 08:28:20 +08:00
  • 5bbc7362cb ci: simplify cmake build commands (#11548) b4608 Olivier Chafik 2025-02-01 00:01:20 +00:00
  • 4ad82587ee tools_json_arr now properly passed to apply-template Mason M 2025-01-31 17:57:00 -04:00
  • 705758989d bump llguidance to 0.6.12 Michal Moskal 2025-01-31 11:39:52 -08:00
  • a617304cb2 Merge 244811d856 into aa6fb13213 Dmitry Wolf 2025-01-31 22:35:25 +03:00
  • a049afb401 typo in merge Michal Moskal 2025-01-31 11:27:52 -08:00
  • 6b2de55137 Merge branch 'master' into llg Michal Moskal 2025-01-31 11:23:33 -08:00
  • 786b6a9a3f Update CMakeLists.txt magicse 2025-01-31 21:03:55 +02:00
  • 4c59b04ac8 Add support for Deepseek-R1 flash attention Siddartha Naidu 2025-01-31 18:48:48 +00:00
  • 183029d88f Add tools option to llama-cli Mason M 2025-01-31 14:37:23 -04:00
  • 9fc1ed18ac set CMAKE_HIP_COMPILER_LAUNCHER env var, not cmake var Olivier Chafik 2025-01-31 18:23:08 +00:00
  • 167c500716 install zip in cuda container Olivier Chafik 2025-01-31 17:34:04 +00:00
  • 341c93162e Merge remote-tracking branch 'origin/master' into ci-nit-build Olivier Chafik 2025-01-31 17:30:24 +00:00
  • 7a8bb50714 ci: attempt to use sccache + HIP Olivier Chafik 2025-01-31 17:14:08 +00:00
  • aa6fb13213 ci: use sccache on windows instead of ccache (#11545) b4607 Olivier Chafik 2025-01-31 17:12:40 +00:00
  • ae175fe700 ci + cuda: checkout w/ history when packaging needed Olivier Chafik 2025-01-31 16:55:58 +00:00
  • 89da8df649 fix typo Olivier Chafik 2025-01-31 16:05:29 +00:00
  • 1b8f9caa05 minimize diff Olivier Chafik 2025-01-31 15:48:07 +00:00
  • fa38b8efaf Merge remote-tracking branch 'origin/master' into cuda-releases Olivier Chafik 2025-01-31 15:40:52 +00:00
  • 90e9dcf625 ci: simplify cmake build commands Olivier Chafik 2025-01-31 14:43:30 +00:00
  • 0e5da5e190 revert superfluous cmake --parallel flags! Olivier Chafik 2025-01-31 14:22:00 +00:00
  • a83f528688 tool-call: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539) b4606 Olivier Chafik 2025-01-31 14:15:25 +00:00
  • 45a1c2027f Apply suggestions from code review Olivier Chafik 2025-01-31 14:14:28 +00:00
  • 64b3e5d8f1 Parallel cmake builds & ctests Olivier Chafik 2025-01-31 14:07:53 +00:00
  • e6c6ee385f Update CMakeLists.txt Olivier Chafik 2025-01-31 13:57:22 +00:00
  • b1bcd309fc fix stop regression (#11543) b4605 Olivier Chafik 2025-01-31 13:48:31 +00:00
  • c6f6579785 squash! server : add server_task_type field to server_task_result Daniel Bevenius 2025-01-31 14:44:11 +01:00
  • 0e09ec84b5 squash! server : add server_task_type field to server_task_result Daniel Bevenius 2025-01-31 14:27:38 +01:00
  • 4156384967 Detect sccache in cmake Olivier Chafik 2025-01-31 13:21:25 +00:00
  • 6cc2956f3f server : add server_task_type field to server_task_result Daniel Bevenius 2025-01-31 14:14:39 +01:00
  • 2d2d07618e server : extract handle_slot_erase to handle_slot_impl Daniel Bevenius 2025-01-31 14:12:02 +01:00
  • 5d3491e789 Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-01-31 15:11:02 +02:00
  • fba6cb6ed1 shuffle ccache vars Olivier Chafik 2025-01-31 13:09:40 +00:00
  • 6edb2c8731 Try sccache on ci for windows Olivier Chafik 2025-01-31 12:48:24 +00:00
  • 46e513861c fix stop regression Olivier Chafik 2025-01-31 12:29:39 +00:00
  • 7c34af40fb squash! server : add handle_slot_type lambda Daniel Bevenius 2025-01-31 13:19:21 +01:00
  • ed82223b5c readme: function calling *is* supported now Olivier Chafik 2025-01-31 12:09:33 +00:00
  • fa20249305 Add proper tool call docs to server README Olivier Chafik 2025-01-31 11:53:06 +00:00
  • b094ed3176 Implement s3:// protocol Eric Curtin 2025-01-30 12:50:46 +00:00
  • 8a44b2fb92 Fix empty content for functionary v3.2 tool call Olivier Chafik 2025-01-31 11:36:44 +00:00
  • 422df5da08 Llama 3.x tools: accept / trigger on more varied spaced outputs Olivier Chafik 2025-01-31 10:28:24 +00:00
  • b31259bfcf More debug logs Olivier Chafik 2025-01-31 10:27:16 +00:00
  • 479918599b Force-disable parallel_tool_calls if template doesn't support it Olivier Chafik 2025-01-31 10:27:07 +00:00
  • caed1ef76f sync: minja (tool call name optional https://github.com/google/minja/pull/36) Olivier Chafik 2025-01-31 09:28:23 +00:00
  • 5408cb8459 An empty tool_call_id is better than none! Olivier Chafik 2025-01-31 09:28:00 +00:00
  • 76543311ac llama : avoid ggml_cont() if possible in DeepSeek V2 implementation Stanisław Szymczyk 2025-01-30 18:25:36 +01:00
  • 5783575c9d Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533) b4604 Olivier Chafik 2025-01-31 08:24:29 +00:00
  • 4a2b196d03 server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531) b4603 Olivier Chafik 2025-01-31 08:12:40 +00:00
  • c5825e76ec server : add handle_slot_type lambda Daniel Bevenius 2025-01-31 07:49:14 +01:00
  • 97a7157e11 Update README.md vincent 2025-01-31 07:36:08 +01:00
  • 1bd3047a93 common: Add missing va_end (#11529) Steve Grubb 2025-01-31 00:58:55 -05:00
  • a2df2787b3 server : update help metrics processing/deferred (#11512) b4601 Daniel Bevenius 2025-01-31 06:04:53 +01:00
  • c7a32e761d common : use GGUF for imatrix output by default Francis Couture-Harpin 2025-01-30 19:56:20 -05:00