Commit graph

  • 7df6cc5577 Make updates to rename preset targets Srihari-mcw 2024-12-06 19:29:08 +05:30
  • f162d45a21 common : bring back --no-warmup to server (#10686) b4276 Xuan Son Nguyen 2024-12-06 13:29:05 +01:00
  • 75499af292 common : bring back --no-warmup to server Xuan Son Nguyen 2024-12-06 11:20:17 +01:00
  • 6c5bc0625f server : (refactoring) do not rely on JSON internally (#10643) Xuan Son Nguyen 2024-12-06 11:14:32 +01:00
  • 25be4ccc89 update docs Xuan Son Nguyen 2024-12-06 10:47:52 +01:00
  • e1d17a20e7 Fix rebase typo 0cc4m 2024-12-06 07:32:22 +00:00
  • 934a2c9f00 Merge 26aac8e289 into 7736837d62 Nexes the Elder 2024-12-06 14:27:02 +08:00
  • 70a78fb934 Merge ed78de20e4 into 7736837d62 Nexes the Elder 2024-12-06 14:26:52 +08:00
  • 7734012e4f Merge 6a262b62f0 into 7736837d62 Uglješa Lukešević 2024-12-06 14:25:23 +08:00
  • e3ad85d258 Merge 2b2ab6fb6f into 7736837d62 Nexes the Elder 2024-12-06 14:25:15 +08:00
  • 5309f15b35 Merge a5452db6dd into 7736837d62 Zhenwei Jin 2024-12-06 14:25:08 +08:00
  • 52e281e78e Merge fb93f70533 into 7736837d62 Kalab Yibeltal Assefa 2024-12-06 08:24:54 +05:30
  • 0b1b7c85b5 Merge branch 'ggerganov:master' into vulkan Eve 2024-12-06 02:51:35 +00:00
  • 1fd5f1af08 Update README.md ochafik 2024-12-06 02:16:12 +00:00
  • cbe395d87f minja: remove tests (now in https://github.com/google/minja) ochafik 2024-12-06 02:12:21 +00:00
  • a469f536c0 agent: update readme ochafik 2024-12-06 01:56:07 +00:00
  • 30fbcb2315 agent: more robust squid config ochafik 2024-12-06 01:55:51 +00:00
  • 1afa31289d Merge remote-tracking branch 'origin/master' into tool-call ochafik 2024-12-06 01:21:06 +00:00
  • 0bd133ddc4 Merge 68ca77a148 into 7736837d62 cocochick 2024-12-06 07:23:12 +08:00
  • dfa59b908f fix unwanted recursive call Xuan Son Nguyen 2024-12-05 23:43:48 +01:00
  • db66153d92 naming oai_compat --> oaicompat Xuan Son Nguyen 2024-12-05 23:34:24 +01:00
  • ffc4441b1d remove virtual for to_json_oai_compat() Xuan Son Nguyen 2024-12-05 23:29:27 +01:00
  • 0a2be72dca add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack Djip007 2024-12-05 04:24:45 +01:00
  • 3a042b4872 Clean Q4_0_N_M ref Djip007 2024-12-04 18:18:35 +01:00
  • 95322e93bf clang-format Djip007 2024-12-03 00:09:44 +01:00
  • 98ea414f81 reformat extra cpu backend. Djip007 2024-11-19 01:32:43 +01:00
  • 9ac05c113d rename ggml-cpu-aarch64.c to .cpp Djip007 2024-12-01 16:56:12 +01:00
  • 4c3d2580b2 small clean up Xuan Son Nguyen 2024-12-05 23:16:27 +01:00
  • fb4b9be602 fix model_alias and completion_probabilities Xuan Son Nguyen 2024-12-05 23:13:06 +01:00
  • 7736837d62 fix(server) : not show alert when DONE is received (#10674) Plamen Minev 2024-12-05 23:36:41 +02:00
  • a43e1dc66c apply review comments Xuan Son Nguyen 2024-12-05 22:35:07 +01:00
  • adc673c355 agent: add --think "tool", default to local tools endpoint, support --temperature, fix --seed ochafik 2024-12-05 21:32:08 +00:00
  • c66d00806a Add environment variable GGML_VK_DISABLE_COOPMAT to disable VK_KHR_cooperative_matrix support 0cc4m 2024-12-05 20:56:17 +00:00
  • f54afb4f12 Remove redundant checks 0cc4m 2024-12-05 17:57:19 +00:00
  • 3a58a0159b Disable coopmat support on AMD proprietary driver 0cc4m 2024-12-04 16:04:04 +00:00
  • 13a068e261 Vulkan: Add VK_AMD_shader_core_properties2 support to read Compute Unit count for split_k logic 0cc4m 2024-12-03 20:23:06 +00:00
  • 56c67df5fd Vulkan: Unroll more loops for more mul mat mat performance 0cc4m 2024-12-03 19:52:36 +00:00
  • 0cb02c015a Vulkan: Implement accumulator switch for specific mul mat mat shaders 0cc4m 2024-12-03 19:52:12 +00:00
  • f3936a4b02 Rework mulmat shader selection and compilation logic, avoid compiling shaders that won't get used by device 0cc4m 2024-12-01 20:57:28 +00:00
  • 412d5616f9 Add Vulkan MUL_MAT and MUL_MAT_ID accumulator precision selection 0cc4m 2024-12-01 19:26:58 +00:00
  • 2466205cf4 Update llama-server-cuda.Dockerfile 流年 2024-12-06 04:33:54 +08:00
  • 21c18ee28b Improve performance with better q4_k and q5_k dequant and store unrolling 0cc4m 2024-11-30 18:59:04 +00:00
  • a2b5822c5c Vulkan: Implement VK_KHR_cooperative_matrix support in the matrix matrix multiplication shader 0cc4m 2024-11-18 06:01:31 +00:00
  • 5e7222e5ef Update llama-server-cuda.Dockerfile 流年 2024-12-06 03:49:21 +08:00
  • 1a791bcff2 Update llama-server-cuda.Dockerfile 流年 2024-12-06 03:48:58 +08:00
  • c9c6e01dae vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (#10206) b4273 Jeff Bolz 2024-12-05 13:15:05 -06:00
  • de5186908f Update llama-server-cuda.Dockerfile 流年 2024-12-06 03:00:41 +08:00
  • 9440faffd3 vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention Jeff Bolz 2024-10-14 11:08:43 -05:00
  • 6fe6247831 llama : add Minerva 7B model support (#10673) b4272 Riccardo Orlando 2024-12-05 19:30:59 +01:00
  • bf01b18eaa Add Falcon3 support Billel Mokeddem 2024-12-05 16:25:47 +00:00
  • 2df2d52d15 metal : Extend how Llama.cpp locates metal resources (#10675) Robert Ormandi 2024-12-03 15:02:50 -06:00
  • eef45c96e4 fix(server) : not show alert when DONE is received Plamen Minev 2024-12-05 17:25:51 +02:00
  • 2e560f90ff clarify server_sent_event RFC specs Xuan Son Nguyen 2024-12-05 16:13:52 +01:00
  • 1cf769be67 remove server.hpp Xuan Son Nguyen 2024-12-05 16:04:36 +01:00
  • fbc979bf6e cmake : simplify msvc charsets Borislav Stanimirov 2024-12-05 16:24:44 +02:00
  • 8ab173c865 add virtual functions Xuan Son Nguyen 2024-12-05 14:44:06 +01:00
  • b717db8eff Merge branch 'ggerganov:master' into master Riccardo Orlando 2024-12-05 13:15:24 +01:00
  • d9698c39d3 Update convert_hf_to_gguf_update.py Riccardo Orlando 2024-12-05 13:13:56 +01:00
  • 0cd182ebcc sync : ggml b4271 Georgi Gerganov 2024-12-05 13:27:42 +02:00
  • a8cbab201d ggml: add GGML_SET Metal kernel + i32 CPU kernel (ggml/1037) PAB 2024-12-04 09:19:30 +01:00
  • c2082d93a8 ggml : add GGML_PAD_REFLECT_1D operation (ggml/1034) PAB 2024-12-03 20:20:04 +01:00
  • d405804be8 py : update outdated copy-paste instructions [no ci] (#10667) Daniel Bevenius 2024-12-05 08:47:55 +01:00
  • 5e3ad011e2 Updates to build.md file Srihari-mcw 2024-12-05 10:29:24 +05:30
  • c702b45849 Update cmakepreset.json to add clang and ninja based configs Srihari-mcw 2024-11-26 07:35:14 -08:00
  • 9d87c69649 Update cmakepreset.json to use clang with ninja by default Srihari-mcw 2024-11-26 07:24:29 -08:00
  • 7b3837e1f0 py : update outdated copy-paste instructions Daniel Bevenius 2024-12-05 05:11:10 +01:00
  • 591894a077 Merge https://github.com/ggerganov/llama.cpp into vulkan Eve 2024-12-04 21:12:18 -05:00
  • 2f56bac740 additional small optimizations Eve 2024-12-04 21:09:31 -05:00
  • 5fbaf121db remove a multiply Eve 2024-12-04 16:31:14 -05:00
  • 0dd48a6952 faster ssm conv implementatioin lihan 2024-12-05 10:05:32 +08:00
  • c403d895c6 Merge https://github.com/ggerganov/llama.cpp into vulkan Eve 2024-12-04 20:03:31 -05:00
  • fe81134954 merge Eve 2024-12-04 16:32:05 -05:00
  • cb666718b1 refactor handle_completions_generic Xuan Son Nguyen 2024-12-04 23:53:25 +01:00
  • f112d198cd Update deprecation-warning.cpp (#10619) b4267 aryantandon01 2024-12-05 03:49:20 +05:30
  • 062f256e6b remove a multiply Eve 2024-12-04 16:31:14 -05:00
  • e147054bf7 SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (#10584) Nicolò Scipione 2024-12-04 02:29:20 +01:00
  • 4153d57c12 fix typo of README.md (#10605) Wang Ran (汪然) 2024-12-04 09:22:50 +08:00
  • 9075271c95 Avoid using __fp16 on ARM with old nvcc (#10616) Frankie Robertson 2024-12-04 02:41:37 +02:00
  • 0a81a82f18 Add docs for creating a static build (#10268) (#10630) Benson Wong 2024-12-03 16:40:36 -08:00
  • 0fa9dc4c06 clip : add sycl support (#10574) piDack 2024-12-04 08:26:37 +08:00
  • fa9abd6c82 vulkan: optimize and reenable split_k (#10637) Jeff Bolz 2024-12-03 13:29:54 -06:00
  • 70f0346f0d server : (web ui) Various improvements, now use vite as bundler (#10599) Xuan Son Nguyen 2024-12-03 19:38:44 +01:00
  • f8fe71abcf scripts : remove amx sync Georgi Gerganov 2024-12-03 19:42:30 +02:00
  • 69c7f204df sync : ggml Georgi Gerganov 2024-12-03 19:40:25 +02:00
  • 0df0452af5 CUDA: remove unnecessary warp reduce in FA (ggml/1032) mahorozte 2024-12-03 21:11:43 +08:00
  • 2b155906e4 feat: add GGML_UNARY_OP_ARGMAX Metal kernel (ggml/1019) PAB 2024-12-02 19:27:24 +01:00
  • e92a46bece metal : add GGML_OP_CONV_TRANSPOSE_1D kernels (ggml/1026) PAB 2024-11-28 09:25:06 +01:00
  • f697baf88f llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636) Xuan Son Nguyen 2024-12-03 12:54:30 +01:00
  • d37b7e0910 readme : add option, update default value, fix formatting (#10271) Nikolaos Pothitos 2024-12-03 12:50:08 +02:00
  • d6753d7068 metal : small-batch mat-mul kernels (#10581) Georgi Gerganov 2024-12-03 11:52:33 +02:00
  • ca7c21358b github : minify link [no ci] (revert) Georgi Gerganov 2024-12-03 11:21:43 +02:00
  • ed8649f8e4 github : minify link [no ci] Georgi Gerganov 2024-12-03 11:20:35 +02:00
  • be2d0048f1 server : fix default draft model parameters (#10586) Georgi Gerganov 2024-12-03 11:20:00 +02:00
  • 1da7b76569 server : fix speculative decoding with context shift (#10641) b4266 Georgi Gerganov 2024-12-04 22:38:20 +02:00
  • 5293e17154 Make->CMake JohnnyB 2024-12-04 19:16:21 +00:00
  • eaa12887da add std::move Xuan Son Nguyen 2024-12-04 19:52:28 +01:00
  • 1261086163 minor style fix Xuan Son Nguyen 2024-12-04 19:36:37 +01:00
  • 3b41ad53a3 fix index Xuan Son Nguyen 2024-12-04 19:26:36 +01:00
  • ea1be7f8ac add virtual function Xuan Son Nguyen 2024-12-04 19:18:56 +01:00
  • d2419b3255 many fixes Xuan Son Nguyen 2024-12-04 18:58:16 +01:00