Commit graph

  • d1dc8fbd0a accept --dev none to completely disable offloading slaren 2024-11-25 17:24:12 +01:00
  • f4457cb877 llama : accept a list of devices to use to offload a model slaren 2024-11-25 17:16:00 +01:00
  • a9a678a6b2 Add download chat feature to server chat (#10481) brucepro 2024-11-25 08:11:55 -08:00
  • 56d8a95e67 vulkan: fix group_norm Jeff Bolz 2024-11-25 09:15:33 -06:00
  • 65bcf2e14f ensured that behavior consistent with log Roberto Tomás Collins 2024-11-25 09:51:16 -05:00
  • 9ca2e67762 server : add speculative decoding support (#10455) b4164 Georgi Gerganov 2024-11-25 16:31:38 +02:00
  • ac5830d729 imatrix-combine-only idea Roberto Tomás Collins 2024-11-25 09:29:04 -05:00
  • 5efd05d89d code style Xuan Son Nguyen 2024-11-25 15:20:31 +01:00
  • 5931c1f233 ggml : add support for dynamic loading of backends (#10469) b4163 Diego Devesa 2024-11-25 15:13:39 +01:00
  • f6d12e7df8 tests : fix compile warning b4162 Georgi Gerganov 2024-11-25 15:17:32 +02:00
  • 4ff0831ce6 metal : use F16 math in mul_mat kernels gg/metal-mul-mat-f16 Georgi Gerganov 2024-11-08 13:21:59 +02:00
  • b756441104 metal : minor code formatting b4161 Georgi Gerganov 2024-11-25 15:08:04 +02:00
  • ca30560aa3 Github: update issue templates [no ci] Johannes Gäßler 2024-11-25 13:09:21 +01:00
  • 8797958285 ggml-cpu: cmake add arm64 cpu feature check for macos Charles Xu 2024-11-25 12:27:10 +01:00
  • 5a8987793f [SYCL] Fix building Win package for oneAPI 2025.0 update (#10483) b4160 Neo Zhang Jianyu 2024-11-25 17:31:10 +08:00
  • b1eaad6a91 metal : enable mat-vec kernels for bs <= 4 Georgi Gerganov 2024-11-25 10:36:48 +02:00
  • 18e79d40af fix: Fix a vulkan-shaders-gen argument parsing error Junil Kim 2024-11-25 17:26:06 +09:00
  • 0ba40c3615 server : add helper function slot.can_speculate() Georgi Gerganov 2024-11-25 10:16:27 +02:00
  • 58652e42c3 Merge remote-tracking branch 'upstream/master' shanshan shen 2024-11-25 08:06:46 +00:00
  • df68663a63 some modification after review shanshan shen 2024-11-25 08:05:54 +00:00
  • 156aa6d934 server : add speculative decoding support Georgi Gerganov 2024-11-22 13:48:57 +02:00
  • d9d54e498d speculative : refactor and add a simpler example (#10362) Georgi Gerganov 2024-11-25 09:58:41 +02:00
  • 5f3c597c08 rm debug arthw 2024-11-25 15:27:01 +08:00
  • a27dc771c3 Merge a3822fb59b into cce5a90075 Djip007 2024-11-25 15:02:52 +08:00
  • e0b7ee8a26 fix arthw 2024-11-25 14:59:22 +08:00
  • 21eff53e75 Add download chat feature to server chat brucepro 2024-11-24 22:41:30 -08:00
  • d93ee665d8 debug arthw 2024-11-25 13:43:43 +08:00
  • 3af12c198a debug arthw 2024-11-25 13:38:28 +08:00
  • b980a19e1e fix build package for 2025.0 arthw 2024-11-25 12:13:52 +08:00
  • 457a48360a vulkan: further optimize q5_k mul_mat_vec Jeff Bolz 2024-11-24 20:56:37 -06:00
  • b81e5ca026 remove eval-callback test hack since the backend loader now checks the executable directory slaren 2024-11-25 03:47:48 +01:00
  • 6d19135b9b add cpu backend to the swift build slaren 2024-11-25 03:34:29 +01:00
  • ae99c8fa55 suppress error dialogs on windows slaren 2024-11-25 03:29:03 +01:00
  • 53d7f4f658 add version checking slaren 2024-11-24 23:54:16 +01:00
  • bd9f7b4297 refactor cmake build; use MODULE target type for dl backend; set backend output directory to the runtime directory; ggml_backend_load_all searches backends in the system path first, then in the executable directory slaren 2024-11-24 23:22:16 +01:00
  • 3a1f55c0de Merge 243fd5dd37 into cce5a90075 Olivier Chafik 2024-11-24 22:58:27 +04:00
  • 402a0e94dc Update ggml/src/ggml-backend-impl.h Diego Devesa 2024-11-24 19:12:22 +01:00
  • 8f419181d1 common : final touches Georgi Gerganov 2024-11-24 19:19:12 +02:00
  • cce5a90075 flake.lock: Update (#10470) Georgi Gerganov 2024-11-24 18:03:25 +02:00
  • dc39012cba llama : fix op mul check with command-r-plus (#10476) b4157 Diego Devesa 2024-11-24 16:10:26 +01:00
  • f519ac63c6 llama : fix op mul check with command-r-plus slaren 2024-11-24 15:48:00 +01:00
  • 4eb126fff0 common : change defaults [no ci] Georgi Gerganov 2024-11-24 15:39:07 +02:00
  • 7f9cc2058c common : refactor args Georgi Gerganov 2024-11-24 14:55:16 +02:00
  • c8880e786c speculative : fix compile warning Georgi Gerganov 2024-11-24 12:53:48 +02:00
  • d9fb3b2e01 speculative : fix the draft sampling Georgi Gerganov 2024-11-24 12:50:17 +02:00
  • be5f611000 speculative : do not redraft previous drafts Georgi Gerganov 2024-11-24 12:09:31 +02:00
  • 9336db462c convert : XLMRoberta Type Vocab Size (#10458) Gabe Goodhart 2024-11-24 02:02:34 -07:00
  • ad1e27a0af metal : export ggml_backend_get_features() Georgi Gerganov 2024-11-24 10:53:35 +02:00
  • 808d434901 fixes slaren 2024-11-24 02:05:21 +01:00
  • 5a2618d934 flake.lock: Update github-actions[bot] 2024-11-24 00:24:15 +00:00
  • 96fa2c5e2d fix gguf-py: Conversion error when multiple licenses are configured (#9807) momonga 2024-11-24 09:09:22 +09:00
  • 1605605b54 link to libdl on linux slaren 2024-11-24 01:08:56 +01:00
  • ccd8df8a9d add ggml_backend_unload slaren 2024-11-24 00:59:39 +01:00
  • d5a3beb0e0 ggml : add support for dynamic loading of backends slaren 2024-11-24 00:00:52 +01:00
  • 0e02350561 vulkan: Handle GPUs with less shared memory Jeff Bolz 2024-11-23 15:50:45 -06:00
  • 021ca28c14 Merge branch 'ggerganov:master' into master momonga 2024-11-24 03:17:49 +09:00
  • 55ed008b2d ggml : do not use ARM features not included in the build (#10457) b4154 Diego Devesa 2024-11-23 14:41:12 +01:00
  • 3c8b10560a handle generation until context is filled VJHack 2024-11-22 22:45:41 -06:00
  • 757a2d3f04 fix(convert_hf_to_gguf): Support setting token_type_count from "type_vocab_size" Gabe Goodhart 2024-11-22 16:32:08 -07:00
  • 29b273f7d3 vulkan: optimize Q2_K and Q3_K mul_mat_vec Jeff Bolz 2024-11-22 16:24:32 -06:00
  • 3479f516ea update prompt template in wrapper Yicheng Qian 2024-11-22 14:01:42 -08:00
  • c7db5b2078 ggml : do not use ARM features not included in the build slaren 2024-11-22 20:24:07 +01:00
  • dd2bc3293b Merge branch 'ggerganov:master' into server-chat-templates-custom MaggotHATE 2024-11-22 23:39:53 +05:00
  • 2e197a1f21 make : build fixes Georgi Gerganov 2024-11-22 16:11:25 +02:00
  • ccc8f63f9f speculative : minor fixup Georgi Gerganov 2024-11-22 13:48:39 +02:00
  • f27ddc57d7 speculative : add --draft-min CLI arg Georgi Gerganov 2024-11-22 12:27:09 +02:00
  • d3c57c1eed Merge remote-tracking branch 'upstream/master' shanshan shen 2024-11-22 10:01:45 +00:00
  • f0e09002c3 improve inferencing performance for ascend npu. shanshan shen 2024-11-22 09:28:44 +00:00
  • 43f41a4c00 Merge pull request #28 from NexaAI/zack/vlm Zack Li 2024-11-22 01:50:10 -08:00
  • 0d4d0c1559 speculative : simplify (cont) Georgi Gerganov 2024-11-22 11:31:28 +02:00
  • 6dfcfef078 ci: Update oneAPI runtime dll packaging (#10428) b4153 蕭澧邦 2024-11-22 17:44:08 +08:00
  • fe8c7b45fd revert CMakeList zack Zhiyuan Li 2024-11-22 09:08:44 +00:00
  • e4c122b93c speculative : simplify Georgi Gerganov 2024-11-22 11:05:49 +02:00
  • 460212ac2a change template for inference zack Zhiyuan Li 2024-11-22 09:06:15 +00:00
  • bbf1aaa7ed Merge remote-tracking branch 'origin' into zack/vlm zack Zhiyuan Li 2024-11-22 09:04:28 +00:00
  • 599b3e0cd4 GitHub: ask for more info in issue templates (#10426) Johannes Gäßler 2024-11-22 08:32:40 +01:00
  • c18610b4ee CANN: Support Ascend310P to accelerate F32 and F16 Model (#10216) b4151 leo-pony 2024-11-22 14:07:20 +08:00
  • ecd8ac903c Merge branch 'ggerganov:master' into server-chat-templates-custom MaggotHATE 2024-11-22 09:21:47 +05:00
  • 12dd7e0b11 Merge 9373e2ba58 into a5e47592b6 Zhenwei Jin 2024-11-22 11:53:51 +08:00
  • c50b5d0b0b Merge branch 'ggerganov:master' into master momonga 2024-11-22 12:44:24 +09:00
  • 6c4720c4a7 fix: ggml: make GGML compatible with vulkan v1.2.162 Junil Kim 2024-11-22 12:39:04 +09:00
  • 6021be4e60 Merge 6b45680f21 into a5e47592b6 Brian 2024-11-21 16:51:57 -06:00
  • 0f878a657c speculative : manage context in common_speculative Georgi Gerganov 2024-11-21 21:27:14 +02:00
  • d92f518253 Simplify logics even further MaggotHATE 2024-11-21 22:46:44 +05:00
  • b2cf6e73fc Fixed prefix and suffix compatibility MaggotHATE 2024-11-21 22:26:06 +05:00
  • feadeadf50 fix --no-clean Eve 2024-11-21 12:19:53 -05:00
  • a5e47592b6 cuda : optimize argmax (#10441) b4150 Diego Devesa 2024-11-21 18:18:50 +01:00
  • 33761375d2 Merge branch 'ggerganov:master' into server-chat-templates-custom MaggotHATE 2024-11-21 21:59:11 +05:00
  • 2f0a01465a Cleanup of unused features MaggotHATE 2024-11-21 21:58:36 +05:00
  • 73f435fdfc Simplified logics and UI MaggotHATE 2024-11-21 21:55:45 +05:00
  • 0ae6edb6db Merge https://github.com/ggerganov/llama.cpp into vulkan Eve 2024-11-21 11:28:30 -05:00
  • ac3973bfc3 revert and update Eve 2024-11-21 11:28:23 -05:00
  • be5295bd51 test Eve 2024-11-21 11:27:18 -05:00
  • fe043ff1ff speculative : clean-up and add comments and TODOs [no ci] Georgi Gerganov 2024-11-17 18:55:27 +02:00
  • 71fc16bb6c speculative : refactor and add a simpler example Georgi Gerganov 2024-11-15 08:20:28 +02:00
  • 48f94d41d9 ggml : check ne00 <= INT32_MAX in argmax and argsort slaren 2024-11-21 15:01:09 +01:00
  • 2847c406d6 add dropdown for llama.cpp module Johannes Gäßler 2024-11-21 14:13:39 +01:00
  • be29da9d4a Remove the ascend soc_type hard code compile option in CMakelist.txt leo-pony 2024-11-21 21:19:14 +08:00
  • 97e7bba189 more understandable issue description Johannes Gäßler 2024-11-21 14:12:55 +01:00
  • 316f3d3116 fix ub slaren 2024-11-21 13:48:43 +01:00