Commit graph

  • 524acb4279 use sycl printf over fprintf Abhilash Majumder 2024-12-12 14:48:23 +05:30
  • 14f64dab74 Merge branch 'ggerganov:master' into cuda-build-doc Yann Follet 2024-12-12 17:15:04 +08:00
  • ba661a4df5 add a stdout for unsupported op Abhilash Majumder 2024-12-12 13:36:42 +05:30
  • 1c40582610 Also disable coopmats on amdvlk 0cc4m 2024-12-12 07:58:19 +00:00
  • 9131c592f7 Fix subgroup size control extension support check 0cc4m 2024-12-12 07:01:07 +00:00
  • 5a9d9f7600 Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats 0cc4m 2024-12-08 14:41:40 +00:00
  • ffd7c1d04c cleanup spaces Abhilash Majumder 2024-12-12 13:13:15 +05:30
  • 46bcfe4c30 Remove TODO Akarshan Biswas 2024-12-12 12:58:11 +05:30
  • 90fe556e6d SYCL: remove extra empty lines and a comment Akarshan Biswas 2024-12-12 12:54:36 +05:30
  • 8dfac46dad SYCL: Use GGML_UNUSED for unused variables Akarshan Biswas 2024-12-12 12:32:00 +05:30
  • 6b6f756bf9 Merge branch 'ggerganov:master' into vulkan_llvmpipe Eve 2024-12-12 03:57:19 +00:00
  • 064360aa00 ensure mul mat shaders work on systems with subgroup size less than 32 Eve 2024-12-11 21:50:24 -05:00
  • ce70d910d9 docs: update server streaming mode documentation (#9519) CentricStorm 2024-12-11 22:40:40 +00:00
  • 3b2601d07c Enable rule key ordering for grammars ParthSareen 2024-12-11 17:20:41 -08:00
  • 5555c0c1f6 docs: update server streaming mode documentation (#9519) CentricStorm 2024-12-11 22:40:40 +00:00
  • 40c07240cb docs: update server streaming mode documentation CentricStorm 2024-09-17 05:33:44 +01:00
  • 973f328b1e Merge pull request #10788 from ggerganov/gg/gguf-py-0.11.0 Georgi Gerganov 2024-12-11 23:14:46 +02:00
  • fb18934a97 gguf-py : bump version to 0.11.0 gguf-v0.11.0 gg/gguf-py-0.11.0 Georgi Gerganov 2024-12-11 23:13:31 +02:00
  • 235f6e14bf server : (UI) add tok/s, get rid of completion.js (#10786) gguf-py gguf ggu Xuan Son Nguyen 2024-12-11 20:52:14 +01:00
  • e4aca8845f fix auto scroll Xuan Son Nguyen 2024-12-11 20:36:45 +01:00
  • ab1f7e0326 only extract timings when it's enabled Xuan Son Nguyen 2024-12-11 20:32:24 +01:00
  • 10f773415c fix BASE_URL Xuan Son Nguyen 2024-12-11 19:38:10 +01:00
  • 4219698eb0 sync Xuan Son Nguyen 2024-12-11 19:31:27 +01:00
  • dd09094a22 add tok/s info Xuan Son Nguyen 2024-12-11 19:28:13 +01:00
  • 95e294b19d extract chat bubble to a component Xuan Son Nguyen 2024-12-11 18:55:05 +01:00
  • bd2f59e50a get rid of completion.js Xuan Son Nguyen 2024-12-11 17:56:43 +01:00
  • 1a31d0dc00 Update README.md (#10772) qingy1337 2024-12-11 07:16:32 -08:00
  • 92f77a640f ci : pin nodejs to 22.11.0 (#10779) Xuan Son Nguyen 2024-12-11 14:59:41 +01:00
  • 484d2f31ae bug-fix: snprintf prints NULL in place of the last character (#10419) b4304 kallewoof 2024-12-11 22:48:04 +09:00
  • 7828013689 update docs Xuan Son Nguyen 2024-12-11 14:47:49 +01:00
  • 74dc729c0b server : fix logprobs, make it openai-compatible Xuan Son Nguyen 2024-12-11 14:38:57 +01:00
  • 4b4d92b098 docs: fix server documentation formatting (#10776) CentricStorm 2024-12-11 10:47:43 +00:00
  • 39b4c47b01 SYCL: clean comments and variables step 3 Akarshan Biswas 2024-12-11 16:10:36 +05:30
  • 8f123ae71d SYCL: clean comments step 2 Akarshan Biswas 2024-12-11 15:58:40 +05:30
  • d7edc55003 Merge branch 'master' into qwen2-vl HimariO 2024-12-11 17:42:21 +08:00
  • 53c8765fb2 ci : pin nodejs to 22.11.0 Xuan Son Nguyen 2024-12-11 08:54:56 +01:00
  • 7006dd784c server: Propagate standby_timeout after it has been initialized johannes 2024-12-11 08:41:51 +01:00
  • 4fd58a8013 server: Initialize standby_timeout over constructor instead of passing as argument johannes 2024-12-11 08:33:24 +01:00
  • acbac00f0d server: Return shutdown_handler to its initial state and use running = false for termination johannes 2024-12-11 08:32:12 +01:00
  • cb0daca00b SYCL: wkv6 remove a comment Akarshan Biswas 2024-12-11 11:29:04 +05:30
  • b0e27ad9ec SYCL gemm.hpp: use const cast to properly support dnnl::memory Akarshan Biswas 2024-12-11 11:27:13 +05:30
  • 274842d976 SYCL gemm.hpp: remove pragma directives Akarshan Biswas 2024-12-11 11:11:03 +05:30
  • 5a766c12ae Merge branch 'master' into refactor Akarshan Biswas 2024-12-11 11:08:54 +05:30
  • cc7cd62ee7 SYCL poo2d kernel: set NAN for invalid pooling op Akarshan Biswas 2024-12-11 11:07:32 +05:30
  • 42eec5d60e docs: fix server documentation formatting CentricStorm 2024-12-11 05:04:20 +00:00
  • 7dda9aad23 SYCL: remove the unused variables instead of commenting it out Akarshan Biswas 2024-12-11 08:54:19 +05:30
  • 4b5470fcdd ggml-sycl.cpp: fix some trailing whitespaces Akarshan Biswas 2024-12-11 08:46:57 +05:30
  • 8564c35ac4 Update README.md qingy1337 2024-12-10 17:41:49 -08:00
  • 9fdf8ad826 add constexpr and static assert lihan 2024-12-11 09:38:33 +08:00
  • 618708c549 Update ggml/src/ggml-cuda/concat.cu Diego Devesa 2024-12-11 02:25:48 +01:00
  • f8a5b04441 Use a lambda to avoid code duplication a3sh 2024-12-11 09:14:30 +08:00
  • 43041d2eb3 ggml: load all backends from a user-provided search path (#10699) b4302 Gilad S. 2024-12-11 02:47:21 +02:00
  • e950fe63ae fix: change NULL to nullptr Gilad S 2024-12-11 02:14:13 +02:00
  • c4b78a035e fix: change NULL to nullptr Gilad S. 2024-12-11 02:09:30 +02:00
  • 51b9545fcd Removes spurious \r in output that causes logging in journalctl to treat lines as binary and therefore hidden by default Charles Darke 2024-12-10 22:46:43 +01:00
  • 4f3a7e279b Force max subgroup size for coopmat shaders 0cc4m/vulkan-subgroup-size-control-amd 0cc4m 2024-12-10 20:27:04 +00:00
  • b685daf386 vulkan: request round-to-even for fp16 in im2col/rope_head (#10767) b4301 Jeff Bolz 2024-12-10 14:23:17 -06:00
  • 2dc175fb2b Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats 0cc4m 2024-12-08 14:41:40 +00:00
  • dafae66cc2 vulkan: dynamic subgroup size for the remaining k quants (#10745) b4300 Eve 2024-12-10 19:33:23 +00:00
  • 140ef267b5 vulkan: request round-to-even for fp16 in im2col/rope_head Jeff Bolz 2024-12-10 11:59:24 -06:00
  • f5f3fca063 Add chat template Billel Mokeddem 2024-12-10 17:36:39 +00:00
  • 5d31c23f5e Use llama vocab Billel Mokeddem 2024-12-10 17:24:57 +00:00
  • ae4b922614 imatrix : Add imatrix to --no-context-shift (#10766) b4299 Bartowski 2024-12-10 12:23:50 -05:00
  • 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) b4298 Andreas Kieslinger 2024-12-10 18:23:24 +01:00
  • a86ad841f1 server : add flag to disable the web-ui (#10762) (#10751) b4297 Yüg 2024-12-10 17:22:34 +00:00
  • a05e2afcc2 vulkan: disable spirv-opt for coopmat shaders (#10763) b4296 Jeff Bolz 2024-12-10 11:22:20 -06:00
  • e6f42b185d server : add flag to disable the web-ui (#10762) eugenio.segala 2024-12-10 10:35:57 +00:00
  • 135b3d683b Add imatrix to --no-context-shift Bartowski 2024-12-10 11:33:16 -05:00
  • 1bf38cffdf server/bench: - support openAI streaming standard output with [DONE]\n\n - export k6 raw results in csv - fix too many tcp idle connection in tcp_wait - add metric time to emit first token Pierrick HYMBERT 2024-12-10 17:18:16 +01:00
  • 98207838f8 vulkan: disable spirv-opt for coopmat shaders Jeff Bolz 2024-12-10 08:44:14 -06:00
  • 9129362c6f SYCL softmax: Initialize nreduce as size_t Akarshan Biswas 2024-12-10 20:08:19 +05:30
  • fe5afd4a2d SYCL: mmq add condition to prevent blocks_per_tile_x_row variable from becoming 0 Akarshan Biswas 2024-12-10 20:00:52 +05:30
  • 71d84a5eaf [SYCL] Remove pragma directives from mmq.cpp Akarshan Biswas 2024-12-10 18:47:13 +05:30
  • 32164aa48d Initialize nreduce as size_t Akarshan Biswas 2024-12-10 17:28:48 +05:30
  • fb2e66e825 add a newline at the end of the file Akarshan Biswas 2024-12-10 17:12:13 +05:30
  • e4189e3188 faster uncontiguous concat lihan 2024-12-10 17:08:36 +08:00
  • a708dfc587 Reduce compiler warnings step 2 Akarshan Biswas 2024-12-10 13:07:18 +05:30
  • 3930184d14 Try to reduce some unused and typecast warnings Akarshan Biswas 2024-12-09 13:45:16 +05:30
  • 5962b506ba Update omni-audio cmake content to make it static (#36) T 2024-12-10 14:50:28 +08:00
  • 8ee6beea08 revert as multi row isnt faster for k quants Eve 2024-12-09 21:35:13 -05:00
  • c2aa654ad6 q5_k Eve 2024-12-09 15:00:58 -05:00
  • c24778df01 add verbosity -1 to log token, so can output only tokens with -lv -1 Yann Follet 2024-12-10 01:53:49 +00:00
  • f70b5148e1 Merge branch 'cuda-build-doc' of https://github.com/YannFollet/llama.cpp into cuda-build-doc Yann Follet 2024-12-10 01:35:13 +00:00
  • de1bb5a4ad Add build command for CUDA with path example Yann Follet 2024-12-10 01:32:10 +00:00
  • 93a5245b0e tool-calls: migrate tests to pytest ochafik 2024-12-10 01:11:08 +00:00
  • a4108f59bd server: Adhere to naming conventions for shutdown_reasons johannes 2024-12-09 23:55:51 +01:00
  • 4fd985af91 server: Update README to include standby-timeout johannes 2024-12-09 23:55:36 +01:00
  • 0468a01c9c server: Improve wording to make clear that standby-timeout is measured in seconds johannes 2024-12-09 23:55:22 +01:00
  • 9a8df14d5c server: Add standby-timeout johannes 2024-12-09 22:56:27 +01:00
  • c0df25a1b6 ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemm_q4_0_4x4_q8_0() Adrien Gallouët 2024-12-02 12:06:48 +00:00
  • 26a8406ba9 CUDA: fix shared memory access condition for mmv (#10740) b4295 Johannes Gäßler 2024-12-09 20:07:12 +01:00
  • 6768787dd3 CUDA: fix shared memory access condition for mmv Johannes Gäßler 2024-12-09 19:29:22 +01:00
  • c37fb4cf62 Changes to CMakePresets.json to add ninja clang target on windows (#10668) Srihari-mcw 2024-12-09 23:10:19 +05:30
  • ca966256e1 Update docs/build.md Max Krasnyansky 2024-12-09 09:39:38 -08:00
  • bdfc59ae7d Update docs/build.md Max Krasnyansky 2024-12-09 09:39:33 -08:00
  • dd01e1ef95 Add .cmake file for x64-windows-llvm Srihari-mcw 2024-12-09 20:45:06 +05:30
  • 6ef2db420f Remove additional whitespaces Srihari-mcw 2024-12-09 20:42:43 +05:30
  • 1df36678ab Update with .cmake file Srihari-mcw 2024-12-09 20:40:25 +05:30
  • b24ab8634f add m-rope testcase to test-backend-ops HimariO 2024-12-09 22:13:11 +08:00
  • 3ba7664de9 minor updates on debug util, bug fixs HimariO 2024-12-09 22:12:30 +08:00