Commit graph

  • da356c8c08 doc: add cuda guide for fedora Tei Home 2025-01-09 19:21:31 +08:00
  • 0b87a2dd10 examples : add README.md to tts example [no ci] Daniel Bevenius 2025-01-09 11:41:19 +01:00
  • 8eceb888d7 server : add tooltips to settings and themes btn (#11154) Daniel Bevenius 2025-01-09 11:28:29 +01:00
  • f8feb4b01a model: Add support for PhiMoE arch (#11003) b4453 Pierrick Hymbert 2025-01-09 11:21:41 +01:00
  • 3db9cea0ad rm tooltip for 3 dots button Xuan Son Nguyen 2025-01-09 11:20:46 +01:00
  • c0dd28d16a doc: add phimoe as supported model Pierrick HYMBERT 2024-12-29 14:57:35 +01:00
  • 3199b2f301 doc: minor Pierrick Hymbert 2024-12-28 19:38:08 +01:00
  • e0e23b5c37 doc: minor Pierrick Hymbert 2024-12-28 19:37:59 +01:00
  • 7385f7d7e2 python linter Pierrick HYMBERT 2024-12-28 15:49:47 +01:00
  • 4be934c453 model: support phimoe Pierrick HYMBERT 2024-12-28 15:11:12 +01:00
  • be0e950c91 media : remove old img [no ci] Georgi Gerganov 2025-01-09 11:15:15 +02:00
  • 37518b7dda Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function. Andreas Kieslinger 2025-01-09 09:15:12 +00:00
  • d9feae1c06 llama-chat : add phi 4 template (#11148) b4451 Xuan Son Nguyen 2025-01-09 10:07:33 +01:00
  • 0817bc9bdc squash! server : add tooltips to settings and themes btn Daniel Bevenius 2025-01-09 08:37:56 +01:00
  • 8e221289ce server : add tooltips to settings and themes btn Daniel Bevenius 2025-01-09 08:00:31 +01:00
  • c9463641af Merge https://github.com/ggerganov/llama.cpp into vulkan Eve 2025-01-08 21:59:37 -05:00
  • 923e9a8377 q3_k use hmask simd from cpu avx version Eve 2025-01-08 21:13:09 -05:00
  • fe71a8c4a1 q3_k optimizations Eve 2025-01-07 22:05:51 -05:00
  • cc28742ca3 q2_k better dequant Eve 2025-01-07 21:20:33 -05:00
  • 91f1d9ce99 better q6_k with separate paths for all threads and partial threads in use, plus some more optimizations Eve 2025-01-07 19:57:55 -05:00
  • 6f5d62b098 q5_k Eve 2025-01-06 17:13:23 -05:00
  • cdf70cf27f better q4_k scales Eve 2025-01-05 22:43:12 -05:00
  • b4ae7005e6 unpack should be u16, add vim swap to gitignore (about time) Eve 2025-01-05 21:59:43 -05:00
  • 173077180f Revert "try precalculating products of a and q2_k scales" Eve 2025-01-05 17:01:34 -05:00
  • bdd98c74e2 try precalculating products of a and q2_k scales Eve 2025-01-05 14:59:56 -05:00
  • ccc6243371 Refactor llama-run to split out opt struct Eric Curtin 2025-01-09 01:14:39 +00:00
  • 64b2ec8c75 fix: Vulkan shader gen binary path when not cross compiling Gilad S 2025-01-09 00:31:29 +02:00
  • 937b81ffd8 llama-chat : add phi 4 template Xuan Son Nguyen 2025-01-08 23:08:54 +01:00
  • 8d59d91171 fix: add missing msg in static_assert (#11143) b4450 hydai 2025-01-09 04:03:28 +08:00
  • 8a1d9c25fa gguf-py : move scripts directory (#11116) gguf-v0.14.0 Vinesh Janarthanan 2025-01-08 12:54:58 -06:00
  • 5fdc8029ae fix: add missing msg in static_assert hydai 2025-01-09 02:51:58 +08:00
  • 1bf839b1e8 Enhance user input handling for llama-run (#11138) Eric Curtin 2025-01-08 18:47:05 +00:00
  • 799f149ed9 empty commit - trigger ci Vinesh Janarthanan 2025-01-08 12:46:12 -06:00
  • 90a478bd4d Package linux cuda releases for various caps Olivier Chafik 2025-01-08 17:50:26 +00:00
  • f8c6f0af38 retrigger ci Vinesh Janarthanan 2025-01-08 10:24:55 -06:00
  • 828a19de1b Enhance user input handling for llama-run Eric Curtin 2025-01-08 14:13:09 +00:00
  • f7cd13301c ci : use actions from ggml-org (#11140) b4447 Xuan Son Nguyen 2025-01-08 16:09:20 +01:00
  • f662391f02 ci : use actions from ggml-org Xuan Son Nguyen 2025-01-08 16:02:58 +01:00
  • 4d2b3d8804 lora : improve compat with mergekit-extract-lora (#11131) b4446 Xuan Son Nguyen 2025-01-08 15:59:53 +01:00
  • c07d437bbd llama : avoid hardcoded QK_K (#11061) b4445 Georgi Gerganov 2025-01-08 16:19:36 +02:00
  • a1f82956f7 add some hints Xuan Son Nguyen 2025-01-08 15:14:19 +01:00
  • ed10ff58a6 Fix: Adds missing reference to maintain_cuda_graph() definition. Andreas Kieslinger 2025-01-08 14:06:07 +00:00
  • eb3ea69850 Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability. Andreas Kieslinger 2025-01-08 13:47:14 +00:00
  • 22c2429496 Refactor: Moves cuda graph update check to separate function. Andreas Kieslinger 2025-01-08 13:18:18 +00:00
  • ba0533100d Refactor: Moves cuda graph executable update step to separate function. akieslinger 2024-12-19 10:40:19 +01:00
  • 99a3755a3c sync : ggml Georgi Gerganov 2025-01-08 13:40:30 +02:00
  • c792dcf488 ggml : allow loading backend with env variable (ggml/1059) b4443 Radoslav Gerganov 2025-01-05 09:50:37 +02:00
  • 65a431dbbc Merge branch 'master' into xsn/mergekit_extract_lora_compat Xuan Son Nguyen 2025-01-08 12:09:30 +01:00
  • 80ccf5d725 ci : pin dependency to specific version (#11137) Xuan Son Nguyen 2025-01-08 12:07:20 +01:00
  • f2ef9dc23d will this fix ec? Xuan Son Nguyen 2025-01-08 12:03:38 +01:00
  • a3c1232c3f arg : option to exclude arguments from specific examples (#11136) Georgi Gerganov 2025-01-08 12:55:36 +02:00
  • 8cef75c743 llamafile : ppc64le MMA INT8 implementation (#10912) b4440 amritahs-ibm 2025-01-08 16:24:19 +05:30
  • 40df0eb434 ci : pin dependency to specific version Xuan Son Nguyen 2025-01-08 11:48:32 +01:00
  • f564e0212e correct norm name & condition Xuan Son Nguyen 2025-01-08 11:36:26 +01:00
  • 0d52a69e4b ci : fix cmake option (#11125) b4439 Georgi Gerganov 2025-01-08 11:29:34 +02:00
  • d0b6102f0b readme : remove old args [no ci] Georgi Gerganov 2025-01-08 11:28:10 +02:00
  • 41ecc246b1 arg : option to exclude arguments from specific examples Georgi Gerganov 2025-01-08 11:23:09 +02:00
  • 02f0430141 Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117) b4438 Mathieu Baudier 2025-01-08 09:18:13 +01:00
  • bec2183f2c fix: Vulkan shader gen binary path when Cross-compiling (#11096) b4437 ag2s20150909 2025-01-08 16:17:29 +08:00
  • 1301f81134 bump pypi gguf to v0.14.0 VJHack 2025-01-08 00:19:10 -06:00
  • 5c82e1f37b fixed README urls VJHack 2025-01-08 00:18:14 -06:00
  • dc5b790140 llamafile_sgemm API - INT8 implementation Amrita H S 2024-12-20 01:20:21 -05:00
  • 11e0c733ac Merge branch 'master' into xsn/mergekit_extract_lora_compat Xuan Son Nguyen 2025-01-07 22:55:22 +01:00
  • 0615cdd7a4 correct comment Xuan Son Nguyen 2025-01-07 22:55:04 +01:00
  • b37af1424a use lora->get_scale Xuan Son Nguyen 2025-01-07 22:32:26 +01:00
  • e444b8e0c2 support mergekit-extract-lora Xuan Son Nguyen 2025-01-07 22:03:06 +01:00
  • 30645aad85 vulkan: optimize coopmat2 q2_k dequant function Jeff Bolz 2025-01-07 14:52:02 -06:00
  • 53ff6b9b9f GGUF: C++ refactor, backend support, misc fixes (#11030) Johannes Gäßler 2025-01-07 18:01:58 +01:00
  • 1c69b0eaba llama-bench : whitespace formatting Stanisław Szymczyk 2025-01-07 17:47:53 +01:00
  • 4ee662fca1 const methods Johannes Gäßler 2025-01-07 17:11:02 +01:00
  • e7ff4506fe ci : fix cmake option Georgi Gerganov 2025-01-07 18:07:53 +02:00
  • bb6569ee5e llama-bench : add -gp <pp,tg> test measuring token generation rate at given prompt length Stanisław Szymczyk 2025-01-07 17:03:15 +01:00
  • 017cc5f446 ggml-backend : only offload from host buffers (fix) (#11124) b4435 Diego Devesa 2025-01-07 16:11:57 +01:00
  • 2717f2eadc Remove unnecessary #ifdef directive Mathieu Baudier 2025-01-07 16:09:16 +01:00
  • 573284f2c8 Perform Vulkan extensions checks in a more sensible order Mathieu Baudier 2025-01-07 16:07:54 +01:00
  • 339ee31d2f ggml-backend : only offload from host buffers (fix) slaren 2025-01-07 15:36:28 +01:00
  • a3d50bc022 ggml-backend : only offload from host buffers (#11120) b4434 Diego Devesa 2025-01-07 12:38:05 +01:00
  • 6131ffdf28 ggml-backend : only offload from host buffers slaren 2025-01-07 12:15:42 +01:00
  • aed0afb408 Update ggml/src/ggml-cuda/gla.cu Molly Sophia 2025-01-07 17:00:40 +08:00
  • 86319f60c3 Update CMakeLists.txt ag2s20150909 2025-01-07 15:13:10 +08:00
  • a4dd490069 rpc : code cleanup (#11107) b4433 Radoslav Gerganov 2025-01-07 08:37:02 +02:00
  • c0d6f790d0 SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087) b4432 Akarshan Biswas 2025-01-07 11:56:07 +05:30
  • 94081b6fd8 Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6 Akarshan Biswas 2025-01-07 11:17:37 +05:30
  • 198fc8c901 Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6" Akarshan Biswas 2025-01-07 11:15:51 +05:30
  • 24aad1e173 updated readme VJHack 2025-01-06 21:32:07 -06:00
  • 3886e22509 Moved scripts dir and fixed pyproject.toml VJHack 2025-01-06 21:13:18 -06:00
  • 9a34877739 Merge a362c74aa2 into dc7cef9f37 Max Krasnyansky 2025-01-06 16:20:00 -08:00
  • 93fbfd022c (wip) support mergekit-extracted lora Xuan Son Nguyen 2025-01-07 00:35:16 +01:00
  • dc7cef9f37 llama-run : fix context size (#11094) b4431 Eric Curtin 2025-01-06 22:45:28 +00:00
  • 2c293edad9 Disable GL_KHR_cooperative_matrix Vulkan extension if not available. Mathieu Baudier 2025-01-06 17:48:39 +01:00
  • ecebbd292d llama : remove unused headers (#11109) b4430 Georgi Gerganov 2025-01-06 17:52:35 +02:00
  • 96be8c3264 github : add cmd line field to bug report (#11090) Xuan Son Nguyen 2025-01-06 16:34:49 +01:00
  • bbb17be39d llama : remove unused headers Georgi Gerganov 2025-01-06 15:55:44 +02:00
  • e6e7c75d94 server : fix extra BOS in infill endpoint (#11106) b4428 Georgi Gerganov 2025-01-06 15:36:08 +02:00
  • 3070be1776 rpc : code cleanup Radoslav Gerganov 2025-01-06 15:14:50 +02:00
  • 1fde710ceb xxx xunchan 2025-01-06 21:00:26 +08:00
  • fd662c6dc0 xxx xunchan 2025-01-06 20:55:37 +08:00
  • fe8caafea0 server : update infill tests Georgi Gerganov 2025-01-06 14:55:36 +02:00
  • 5ddc13309c xxx xunchan 2025-01-06 20:53:06 +08:00
  • 011baa4036 server : fix extra BOS in infill endpoing Georgi Gerganov 2025-01-06 14:45:56 +02:00