Commit graph

  • b20f068234 Renames the rest of the compute capability macros for consistency. akieslinger 2024-12-09 14:43:12 +01:00
  • 3974cf6bdf Renames GGML_CUDA_MIN_CC_DP4A to GGML_CUDA_CC_DP4A. akieslinger 2024-12-09 14:18:50 +01:00
  • 7b85783217 Reverts erroneous rename in SYCL-code. akieslinger 2024-12-09 14:06:56 +01:00
  • b96d38272b Renames NVIDIA GPU-architecture flags to avoid name clashes with WinAPI. (e.g. CC_PASCAL, GPU architecture or WinAPI pascal compiler flag?) akieslinger 2024-12-09 11:16:11 +01:00
  • 3d98b4cb22 vulkan: fix compile warnings (#10731) b4293 Jeff Bolz 2024-12-09 01:24:01 -06:00
  • 1a05004743 cmake : simplify msvc charsets (#10672) b4292 Borislav Stanimirov 2024-12-09 09:15:13 +02:00
  • c97a20529c vulkan: fix compile warnings Jeff Bolz 2024-12-08 22:17:25 -06:00
  • 9af9e80163 use subgroup min and max to check for gcn (requires https://github.com/ggerganov/llama.cpp/pull/10721) Eve 2024-12-08 17:26:56 -05:00
  • c7bc42cea0 Merge https://github.com/ggerganov/llama.cpp into vulkan Eve 2024-12-08 17:10:57 -05:00
  • 8972f1d35c Merge branch '0cc4m/vulkan-subgroup-size-control' of https://github.com/ggerganov/llama.cpp into vulkan Eve 2024-12-08 17:10:37 -05:00
  • ce8784bdb1 server : fix format_infill (#10724) b4291 Xuan Son Nguyen 2024-12-08 23:04:29 +01:00
  • 3a81c60698 test_invalid_input_extra_req Xuan Son Nguyen 2024-12-08 22:58:58 +01:00
  • 055aa9e2ea update test Xuan Son Nguyen 2024-12-08 22:53:00 +01:00
  • d47360e5a2 update test Xuan Son Nguyen 2024-12-08 22:21:17 +01:00
  • 5ffc2a0270 use another model Xuan Son Nguyen 2024-12-08 21:36:28 +01:00
  • ac2ea5382c update test Xuan Son Nguyen 2024-12-08 21:29:05 +01:00
  • a4d2572494 rename Xuan Son Nguyen 2024-12-08 21:12:18 +01:00
  • b8d1b1a5e1 server : fix infill prompt format gg/server-fix-infill Georgi Gerganov 2024-12-08 22:12:11 +02:00
  • 6ec3f77a41 fix Xuan Son Nguyen 2024-12-08 21:11:57 +01:00
  • b46dc2f2e9 server : fix format_infill Xuan Son Nguyen 2024-12-08 21:09:12 +01:00
  • e52522b869 server : bring back info of final chunk in stream mode (#10722) b4290 Xuan Son Nguyen 2024-12-08 20:38:51 +01:00
  • 06d70147e6 Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (#10723) stduhpf 2024-12-08 19:19:19 +01:00
  • d94ff95565 Faster NaN-free tanh Stéphane du Hamel 2024-12-08 18:30:18 +01:00
  • 1c163674de fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested Eve 2024-12-08 12:00:02 -05:00
  • bfecabebf7 merge Eve 2024-12-08 11:52:40 -05:00
  • 6de286657d only increase the number of rows for amd and subgroup size 64 Eve 2024-12-08 11:51:27 -05:00
  • 93bdbc656d Vulkan: fix NaN in tanh.comp Stéphane du Hamel 2024-12-08 17:39:57 +01:00
  • 5f45778bba trailing space Xuan Son Nguyen 2024-12-08 17:32:55 +01:00
  • 270c5d6529 clarify a bit Xuan Son Nguyen 2024-12-08 17:32:08 +01:00
  • d893770ba4 server : bring back info of final chunk in stream mode Xuan Son Nguyen 2024-12-08 17:24:42 +01:00
  • 595c1a7d93 Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats 0cc4m 2024-12-08 14:41:40 +00:00
  • 43ed389a3f llama : use cmake for swift build (#10525) b4288 Diego Devesa 2024-12-08 12:14:54 +01:00
  • 2d20f4cba5 ci : cont Georgi Gerganov 2024-12-08 12:50:24 +02:00
  • 8ba166a3dd ci : cont Georgi Gerganov 2024-12-08 12:42:30 +02:00
  • c9003ce520 ci : try fix ios build Georgi Gerganov 2024-12-08 12:30:38 +02:00
  • c06405e10d server : return stopping_word in the partial response ZXED 2024-12-08 13:06:23 +03:00
  • 2df7e32597 Revert "swift : <> -> """ Georgi Gerganov 2024-12-08 11:45:05 +02:00
  • ecc93d0558 vulkan: compile a test shader in cmake to check for coopmat2 support (#10713) b4287 Jeff Bolz 2024-12-08 02:05:55 -06:00
  • a6648b9df7 server : chunked prefill support gg/server-chunked-prefill Georgi Gerganov 2024-12-08 09:48:18 +02:00
  • 7233425668 vulkan: compile a test shader in cmake to check for coopmat2 support Jeff Bolz 2024-12-07 19:25:27 -06:00
  • 984d4707f3 Update ggml-vulkan.cpp Eve 2024-12-07 21:24:34 +00:00
  • 62e84d9848 llama : add 128k yarn context for Qwen (#10698) Robert Collins 2024-12-07 16:12:27 -05:00
  • 4a185ad39b double the number of rows per workgroup Eve 2024-12-07 16:06:40 -05:00
  • e4491d5c55 ci : disable ios build Georgi Gerganov 2024-12-07 22:29:30 +02:00
  • 575f266167 removing useless line Roberto Tomás Collins 2024-12-07 15:12:55 -05:00
  • 78357f630e Added if statement Malik 2024-12-07 15:09:19 -05:00
  • 3126a2020a ci : remove make Georgi Gerganov 2024-12-07 21:55:01 +02:00
  • bd17bc452f cleanup Eve 2024-12-07 14:51:05 -05:00
  • d39ffd9556 swift : <> -> "" Georgi Gerganov 2024-12-07 21:40:59 +02:00
  • d2ca8bb343 refactor: rename search_path to dir_path Gilad S 2024-12-07 21:23:54 +02:00
  • 8fac078f30 refactor: rename ggml_backend_load_all_in_search_path to ggml_backend_load_all_from_path Gilad S 2024-12-07 21:21:11 +02:00
  • 3573fa8e7b server : (refactor) no more json in server_task input (#10691) b4285 Xuan Son Nguyen 2024-12-07 20:21:09 +01:00
  • f5a15fc69e remove ifdefs Eve 2024-12-07 14:20:56 -05:00
  • 47844dc232 llama : use cmake for swift build slaren 2024-11-26 19:25:44 +01:00
  • 89c2af9099 update readme Xuan Son Nguyen 2024-12-07 20:07:51 +01:00
  • 1949f68f4e add "model_path" to /props Xuan Son Nguyen 2024-12-07 20:05:04 +01:00
  • 65d2e6d675 fix CI by adding safe_json_to_str Xuan Son Nguyen 2024-12-07 19:45:51 +01:00
  • 090a113417 remove task inf_type Xuan Son Nguyen 2024-12-07 19:33:40 +01:00
  • e721f4c6b4 add tests for /props and /slots Xuan Son Nguyen 2024-12-07 19:17:35 +01:00
  • 6bf6e3066c Merge branch 'master' into xsn/refactor_server_struct_input Xuan Son Nguyen 2024-12-07 19:17:24 +01:00
  • 12f17f754d rename mrope related function, params HimariO 2024-12-08 01:32:19 +08:00
  • 4b65c6b90d Merge branch 'vulkan' of https://github.com/netrunnereve/llama.cpp into vulkan Eve 2024-12-07 12:06:53 -05:00
  • 32b994e853 Merge branch 'ggerganov:master' into vulkan Eve 2024-12-07 17:06:37 +00:00
  • 4eefebc84e Merge branch 'vulkan' of https://github.com/netrunnereve/llama.cpp into vulkan Eve 2024-12-07 12:06:22 -05:00
  • ac2089c378 add mrope unit test, fix few compiler warnings HimariO 2024-12-08 00:47:48 +08:00
  • d9c3ba2b77 ggml : disable iq4_nl interleave size 8 (#10709) b4284 Georgi Gerganov 2024-12-07 18:38:15 +02:00
  • ce4a7b8493 server : various fixes (#10704) b4283 Georgi Gerganov 2024-12-07 18:02:05 +02:00
  • fba0e0d3dd server : reflect endpoint response changes in the readme Georgi Gerganov 2024-12-07 17:58:35 +02:00
  • 0a7f6933ed Update examples/server/server.cpp Xuan Son Nguyen 2024-12-07 16:47:32 +01:00
  • ada8855f4a ggml : disable iq4_nl interleave size 8 Georgi Gerganov 2024-12-07 17:35:58 +02:00
  • 01da1ed9b6 fix /slots endpoint Xuan Son Nguyen 2024-12-07 16:35:13 +01:00
  • 6c39aa38f5 add makefile entry, update special image padding token HimariO 2024-12-07 21:59:54 +08:00
  • 9bb1ae6bea add test for slots endpoint Xuan Son Nguyen 2024-12-07 13:56:14 +01:00
  • 19d8762ab6 ggml : refactor online repacking (#10446) b4282 Djip007 2024-12-07 13:37:50 +01:00
  • 1221d13df8 add debug logs on repacks. Djip007 2024-12-07 12:50:44 +01:00
  • e115f6f6d1 Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Djip007 2024-12-07 12:51:34 +01:00
  • 7dc8a3e2d6 Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Djip007 2024-12-07 12:51:22 +01:00
  • d277fdcf43 fix for building with no internet connection Mohammadreza Hendiani 2024-12-07 14:42:05 +03:30
  • 1881ffaf3e server : show current seed in slot_params Georgi Gerganov 2024-12-07 12:29:50 +02:00
  • 4e218c7255 server : various fixes Georgi Gerganov 2024-12-07 12:02:45 +02:00
  • c2a16c0bdb server : fix free of spec context and batch (#10651) b4281 Georgi Gerganov 2024-12-07 11:52:44 +02:00
  • 3df784b305 Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing (#10597) b4280 0cc4m 2024-12-07 10:24:15 +01:00
  • 86a1934978 metal : Extend how Llama.cpp locates metal resources (#10676) b4279 Robert Ormandi 2024-12-07 01:55:01 -06:00
  • eeaf0b9402 Fix coopmat2 MUL_MAT_ID pipeline selection 0cc4m 2024-12-07 06:44:25 +00:00
  • 784a14aa49 convert : add support for Roberta embeddings (#10695) Sukriti Sharma 2024-12-07 00:02:14 -07:00
  • 1f0b15799b tool-call: add firefunction-v2 style ochafik 2024-12-07 03:09:50 +00:00
  • 5d0033f57a minja: sync @ 916c181c0d ochafik 2024-12-07 02:15:51 +00:00
  • b1fdc8c460 added property for model tensors Roberto Tomás Collins 2024-12-06 21:06:33 -05:00
  • 429aa9b94f fix: Windows search path Gilad S 2024-12-07 02:06:34 +02:00
  • 318600fbcb feat: load all backends from a user-provided search path Gilad S 2024-12-07 02:01:35 +02:00
  • b8c3607a8a add 128k yarn context for Qwen Roberto Tomás Collins 2024-12-06 18:39:08 -05:00
  • 8c73a4fe70 feat: add support for Roberta embeddings Sukriti-Sharma4 2024-12-06 14:01:29 -07:00
  • 055154ad3b Merge 1f6855faa0 into c5ede3849f Adrien Gallouët 2024-12-07 04:43:21 +08:00
  • de594d09a6 Merge branch 'ggerganov:master' into master Michael Coppola 2024-12-06 14:59:33 -05:00
  • 859ce0cf89 bug fix: stop server from sending empty json object before sending on_complete json object during oai chat streaming response Michael Coppola 2024-12-06 14:58:25 -05:00
  • b14b47132a added/corrected control on tensor size for Q4 repacking. Djip007 2024-12-06 20:14:31 +01:00
  • c5ede3849f convert : add custom attention mapping Georgi Gerganov 2024-12-06 21:33:15 +02:00
  • 0f4305f3e1 Update ggml/src/ggml-metal/ggml-metal.m Robert Ormandi 2024-12-06 11:10:45 -06:00
  • bcf86a9d29 Update ggml/src/ggml-metal/ggml-metal.m Robert Ormandi 2024-12-06 11:10:37 -06:00
  • db97c8b19b server : (refactor) no more json in server_task input Xuan Son Nguyen 2024-12-06 15:01:12 +01:00