Commit graph

  • e3acca3acd Add get_max_size to SYCL backend. 0cc4m 2024-01-28 17:23:42 +01:00
  • fceae56a86 Replace static_cast with function-style cast Michael Klimenko 2024-01-28 17:20:01 +01:00
  • 8612864108 ggml : fix f16 mad Georgi Gerganov 2024-01-28 18:10:16 +02:00
  • 9c4c15add8 Merge branch 'master' into vulkan Georgi Gerganov 2024-01-28 18:00:49 +02:00
  • 0f648573dd ggml : add unified SYCL backend for Intel GPUs (#2690) b1995 Abhilash Majumder 2024-01-28 21:26:23 +05:30
  • 3a428a1097 metal : improve precision Georgi Gerganov 2024-01-28 17:47:22 +02:00
  • fcfdc56cc2 Add comment regarding fixed UB Michael Klimenko 2024-01-28 16:43:11 +01:00
  • b764b8f1d0 flake.lock: Update (#5162) Georgi Gerganov 2024-01-28 16:54:54 +02:00
  • ecc466a460 metal : add tests, fix scaling, support C > 32 Georgi Gerganov 2024-01-28 15:42:57 +02:00
  • 77f6976a87 metal : move output into local memory + optimize Georgi Gerganov 2024-01-28 13:15:00 +02:00
  • 855645a023 Spacing fix Paul Tsochantaris 2024-01-28 11:23:13 +00:00
  • c846c451d1 Merge branch 'master' into metal-memory-reduction Paul Tsochantaris 2024-01-28 11:22:09 +00:00
  • 1b592aa897 Keeping the ggml_metal_kernel structure Paul Tsochantaris 2024-01-28 11:21:57 +00:00
  • 68cfcd4711 Faster iq3_xxs and iq2_xs dot products on CUDA Iwan Kawrakow 2024-01-28 12:24:47 +02:00
  • c35004b90d Add review fixes Michael Klimenko 2024-01-28 10:32:27 +01:00
  • 9241c3a2ac Apply min_p to unsorted tokens (#5115) b1993 Johannes Gäßler 2024-01-28 09:59:49 +01:00
  • b3dd7d975f Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-01-28 10:53:16 +02:00
  • 3201a6df2c add opecnl op func and call ShuyRoy 2024-01-28 16:36:58 +08:00
  • 3f1b793f0c Apply min_p to unsorted tokens JohannesGaessler 2024-01-24 16:43:39 +01:00
  • b2b2bf988c Tests for min_p, sampling queue (#5147) b1992 Johannes Gäßler 2024-01-28 09:35:14 +01:00
  • af4980bfed readme : add link to rust bindings (#5148) Marcus Dunn 2024-01-28 00:30:44 -08:00
  • fb75fc04e1 scripts : parse wtype in server-llm.sh Georgi Gerganov 2024-01-28 10:06:38 +02:00
  • f2e69d28c0 llama : add support for Orion-14B (#5118) b1990 sharpHL 2024-01-28 16:00:30 +08:00
  • f514e6795f Update llama.cpp Georgi Gerganov 2024-01-28 10:00:15 +02:00
  • 5918c9855e Update llama.cpp Georgi Gerganov 2024-01-28 09:58:56 +02:00
  • 39baaf55a1 docker : add server-first container images (#5157) b1989 Kyle Mistele 2024-01-28 01:55:31 -06:00
  • 51cde1935e iq3_xxs: slightly better grid points Iwan Kawrakow 2024-01-28 08:34:35 +02:00
  • d394ca7f3c update the cmd example jianyuzh 2024-01-28 09:15:57 +08:00
  • ef130243b8 flake.lock: Update github-actions[bot] 2024-01-28 00:17:14 +00:00
  • 6ba560d4ce Restore another missing cast Michael Klimenko 2024-01-28 00:35:25 +01:00
  • 80aab2e4ed Fix MacOS build Michael Klimenko 2024-01-28 00:30:23 +01:00
  • e258f2943a Merge branch 'master' into metal-memory-use-reduction Paul Tsochantaris 2024-01-27 22:43:15 +00:00
  • 266349ae89 Releasing MTLFunction references after Metal pipeline construction Paul Tsochantaris 2024-01-27 22:43:01 +00:00
  • 4e067a0b6d Restore bind to lambda, requires C++14 Michael Klimenko 2024-01-27 22:45:23 +01:00
  • 0f18ada7bb Remove trailing whitespace Michael Klimenko 2024-01-27 22:37:07 +01:00
  • 55b008cdec Add additional fixes Michael Klimenko 2024-01-27 22:29:31 +01:00
  • 48ad459efc Simplify context use, optimize matmul shader for warp size 64 (AMD GCN), fix split_k matmul shader optimization 0cc4m 2024-01-27 21:36:48 +01:00
  • e41d94972c Replace size check with empty Michael Klimenko 2024-01-27 21:23:27 +01:00
  • c3fe181c44 Add fixes to reduce the amount of warnings Michael Klimenko 2024-01-27 21:17:15 +01:00
  • 734cf1096b fix(doc): update container tag from server to server-cuda for README example on running server container with CUDA Kyle Mistele 2024-01-27 12:03:29 -06:00
  • 7298e97947 doc: update n-gpu-layers to show correct GPU usage Kyle Mistele 2024-01-27 11:53:49 -06:00
  • 2455a8d6c3 update impl FSSRepo 2024-01-27 12:23:40 -05:00
  • 40f5570f7f Merge branch 'ggerganov:master' into Orion-14B-support sharpHL 2024-01-28 01:18:45 +08:00
  • 97fbb22a5b Update llama.cpp sharpHL 2024-01-28 00:57:45 +08:00
  • 154930231d iq3_xxs: ARM_NEON and Metal Iwan Kawrakow 2024-01-27 18:56:11 +02:00
  • 530462550d kompute : use llama_backend_init/llama_backend_free to manage device Jared Van Bortel 2024-01-27 11:55:32 -05:00
  • 82f5d5697b Update llama.cpp sharpHL 2024-01-28 00:51:04 +08:00
  • 7cea9735ab Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda FSSRepo 2024-01-27 11:38:20 -05:00
  • 050d450297 ci : do not run tests for Kompute (no GPU) Jared Van Bortel 2024-01-27 11:06:50 -05:00
  • 6db2b41a76 llava : support for Yi-VL and fix for mobileVLM (#5093) b1988 John 2024-01-27 16:09:18 +01:00
  • 753eafed0e sync : ggml b1987 Georgi Gerganov 2024-01-27 16:59:20 +02:00
  • e976423005 ggml : check ggml_add src1 type (ggml/708) Judd 2024-01-26 21:04:01 +08:00
  • 35a2ee9143 Remove unused data and add fixes (#5154) b1985 Michael Klimenko 2024-01-27 15:25:55 +01:00
  • c3b2029698 iq3_xxs: scalar and AVX2 dot products Iwan Kawrakow 2024-01-27 15:48:20 +02:00
  • ec903c0341 server : add self-extend support (#5104) b1984 Maximilian Winter 2024-01-27 14:38:05 +01:00
  • f7c0e043de Update server.cpp Maximilian Winter 2024-01-27 14:15:36 +01:00
  • 826e6dcad9 Update server.cpp Maximilian Winter 2024-01-27 14:13:34 +01:00
  • f1875b0a93 iq3_xxs: CUDA dot product Iwan Kawrakow 2024-01-27 14:58:32 +02:00
  • f120672964 iq3_xxs: starting to look better Iwan Kawrakow 2024-01-27 13:59:49 +02:00
  • 220f91735a Merge branch 'ggerganov:master' into Orion-14B-support sharpHL 2024-01-27 19:49:57 +08:00
  • aac36f92ce Update llama.cpp sharpHL 2024-01-27 19:20:28 +08:00
  • db44ddf819 Update llama.cpp sharpHL 2024-01-27 19:20:02 +08:00
  • 0185aa7440 Update llama.cpp sharpHL 2024-01-27 19:19:51 +08:00
  • d56cecc2cd Update examples/server/server.cpp Maximilian Winter 2024-01-27 11:47:27 +01:00
  • 67096a3539 Update examples/server/server.cpp Maximilian Winter 2024-01-27 11:47:19 +01:00
  • 26f95fb079 server : formatting Georgi Gerganov 2024-01-27 12:39:33 +02:00
  • 90faca24fb iq2_xxs: tuning quantization Iwan Kawrakow 2024-01-27 12:32:55 +02:00
  • bf9349c610 iq3_xxs: CUDA dequantize works Iwan Kawrakow 2024-01-27 11:34:52 +02:00
  • 8524d277ec iq3_xxs: quantize/dequantize Iwan Kawrakow 2024-01-27 11:12:58 +02:00
  • 9d7b7e686c Changed descriptions Maximilian Winter 2024-01-27 07:33:11 +01:00
  • 8f20af4ac6 Update README.md Maximilian Winter 2024-01-27 07:24:27 +01:00
  • cb96a91f7c Update server.cpp Maximilian Winter 2024-01-27 07:06:08 +01:00
  • 5e498be648 doc: add information about running with docker to the server README Kyle Mistele 2024-01-27 00:00:30 -06:00
  • d6b3755102 doc: add information about running the server with Docker to README.md Kyle Mistele 2024-01-27 00:00:06 -06:00
  • eac72b08d2 feat: update .github/workflows/docker.yml to build server-first docker containers Kyle Mistele 2024-01-26 23:49:50 -06:00
  • 839592dcd5 feat: add Dockerfiles for each platform that user ./server instead of ./main Kyle Mistele 2024-01-26 23:46:29 -06:00
  • aa41b22f26 Update server.cpp Maximilian Winter 2024-01-27 06:41:50 +01:00
  • c8bc9297c0 Update server.cpp Maximilian Winter 2024-01-27 06:30:07 +01:00
  • aa14068a2b Update server.cpp Maximilian Winter 2024-01-27 06:18:40 +01:00
  • e4cf6867f5 Merge remote-tracking branch 'upstream/master' Maximilian Winter 2024-01-27 02:57:23 +01:00
  • 0a481fe1a9 integrate tensor cores FSSRepo 2024-01-26 20:14:02 -05:00
  • e6edd44d5e ci : attempt to fix Vulkan installer path Jared Van Bortel 2024-01-26 19:36:49 -05:00
  • 4df0e88aed Added description to server readme. Maximilian Winter 2024-01-27 00:37:53 +01:00
  • f0395ac4da Replace the scope of vq allocation Michael Klimenko 2024-01-27 00:35:32 +01:00
  • cc563aaca0 Address review comments Michael Klimenko 2024-01-27 00:32:29 +01:00
  • d745a36b30 Add missing file Michael Klimenko 2024-01-26 23:39:18 +01:00
  • 3349d4ec1d Remove unused data and add fixes Michael Klimenko 2024-01-26 23:32:31 +01:00
  • 4b0c96a9e2 kompute : adapt ggml-kompute API to be compatible with C Jared Van Bortel 2024-01-26 17:16:25 -05:00
  • a1d6df129b Add OpenCL add kernel (#5151) b1983 0cc4m 2024-01-26 23:07:32 +01:00
  • 960cfb003f Update server.cpp Maximilian Winter 2024-01-26 22:54:26 +01:00
  • edc2c08943 Fixed prompt caching without self extend Maximilian Winter 2024-01-26 22:53:34 +01:00
  • 57cecad175 main : remove ggml-kompute.h #include Jared Van Bortel 2024-01-26 16:37:33 -05:00
  • 91324851a3 ci : initial attempt at testing Kompute backend Jared Van Bortel 2024-01-26 16:36:31 -05:00
  • 1f32360659 Merge remote-tracking branch 'upstream/master' Maximilian Winter 2024-01-26 22:11:26 +01:00
  • 297fde5f58 editorconfig-checker : exclude .gitmodules Jared Van Bortel 2024-01-26 15:48:35 -05:00
  • 454baebacc op_mul_mat_mat_f32.comp : fix missing final newline Jared Van Bortel 2024-01-26 15:44:13 -05:00
  • bbe7c56c99 cmake : pass CPU architecture flags to nvcc (#5146) b1982 Jared Van Bortel 2024-01-26 15:34:06 -05:00
  • a5d7765a20 cmake : pass ARCH_FLAGS to -Xcompiler on MSVC Jared Van Bortel 2024-01-26 15:24:29 -05:00
  • cdab4043b3 kompute : fix #includes Jared Van Bortel 2024-01-26 15:08:31 -05:00
  • f6f540e1bd Put add kernel into different string to stay within MSVC string length limit, disable float16 support due to bad results 0cc4m 2024-01-26 21:09:23 +01:00