Commit graph

  • 424e3a52fe llama/kompute: Add multi-GPU support Feng Jiang 2024-08-21 16:52:11 +08:00
  • 56c5f988eb ggml/kompute: Introduce ggml_backend_kompute_get_device_memory() Feng Jiang 2024-08-21 16:44:44 +08:00
  • 97efd5047a ggml/kompute: Introduce ggml_backend_kompute_get_device_count() Feng Jiang 2024-08-21 16:44:12 +08:00
  • cc9514f941 ggml/kompute: Remove unused ggml_backend_kompute_device_{ref, unref}() Feng Jiang 2024-08-21 16:41:26 +08:00
  • f57f8cb3da ggml/kompute: Reimplement kompute_manager Cong Liu 2024-08-21 16:06:00 +08:00
  • e9313f2e6a missing lock Xuan Son Nguyen 2024-09-06 10:24:47 +02:00
  • 2ab3da68e2 fix deque ? Xuan Son Nguyen 2024-09-06 10:22:08 +02:00
  • 4e66c9caad ggml: fix build error when enable GGML_VULKAN_DEBUG Cong Liu 2024-09-06 16:00:26 +08:00
  • 79ce128d2a small correction Xuan Son Nguyen 2024-09-06 09:41:04 +02:00
  • d545ffcb6d clarify test-arg-parser Xuan Son Nguyen 2024-09-06 09:39:08 +02:00
  • 00d129f87f Merge 395ae48cb0 into 8ebe8ddebd Liu Jia 2024-09-06 15:24:13 +08:00
  • 8ebe8ddebd Improve Vulkan shader build system (#9239) b3673 Markus Tavenrath 2024-09-06 08:56:17 +02:00
  • 6f7ed4ae22 ci: Update HIP SDK to 24.Q3 (ROCm 6.1) Huang Qi 2024-09-05 23:31:19 +08:00
  • 3222aae43d only enable sgemm for prompt processing Eve 2024-09-06 03:44:19 +00:00
  • 3676778e82 ggml/kompute: Implement ggml_backend_i.offload_op interface Cong Liu 2024-08-21 15:38:51 +08:00
  • d94ad56f87 ggml/kompute: Use the kp::Manager in ggml_backend_kompute_context instead of global Weishi Li 2024-08-21 14:54:13 +08:00
  • 74ba8516ce ggml/kompute: Move butf into struct ggml_backend_kompute_context Weishi Li 2024-08-21 14:26:51 +08:00
  • e914ac7c68 ggml/kompute: Introducing struct ggml_backend_kompute_buffer_context Ming Xie 2024-08-21 11:18:35 +08:00
  • 3666c861d4 ggml/kompute: Rename ggml_kompute_context to ggml_backend_kompute_context Ming Xie 2024-08-21 10:54:09 +08:00
  • 9bc6db28d0 ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) b3672 compilade 2024-09-05 21:48:47 -04:00
  • b83f0ca34c Merge 806c5a4e5b into 32b2ec88bc Srihari-mcw 2024-09-05 15:43:01 -07:00
  • 32b2ec88bc Update build.yml (#9184) b3671 awatuna 2024-09-06 06:34:36 +08:00
  • 1031771faa CMake fix: host for msvc compiler can only be x86 or x64 (#8624) Michael Podvitskiy 2024-09-06 00:14:12 +02:00
  • a0bd8f0343 a way to process CMAKE_OSX_ARCHITECTURES as a list Michael Podvitskiy 2024-09-05 22:49:18 +02:00
  • 3a2190b800 CMake fix: host for msvc compiler can only be x86 or x64 Michael Podvitskiy 2024-07-22 09:29:33 +02:00
  • 509ec08e57 bring back --n-predict Xuan Son Nguyen 2024-09-05 21:03:50 +02:00
  • b1657cb934 bring back missing --alias Xuan Son Nguyen 2024-09-05 20:58:10 +02:00
  • fe6df473a3 update server docs Xuan Son Nguyen 2024-09-05 20:26:26 +02:00
  • de378fa483 Merge branch 'master' into xsn/argparser_v3 Xuan Son Nguyen 2024-09-05 20:23:21 +02:00
  • 88e3a4f3bc skip build test-arg-parser on windows Xuan Son Nguyen 2024-09-05 20:20:46 +02:00
  • f5e6a80c3f fix build (2) Xuan Son Nguyen 2024-09-05 20:00:52 +02:00
  • 75d0869ef5 add export-docs example Xuan Son Nguyen 2024-09-05 19:59:55 +02:00
  • 286dcc9dbe fix linux build Xuan Son Nguyen 2024-09-05 19:28:06 +02:00
  • 60ae92bd54 handle env Xuan Son Nguyen 2024-09-05 19:26:21 +02:00
  • b979fc97ba cmake : use ggml-metal.metal from source dir to build default.metallib fix-ninja-metallib-build Jared Van Bortel 2024-09-05 12:17:56 -04:00
  • 753782ae35 add test Xuan Son Nguyen 2024-09-05 16:46:31 +02:00
  • 9ae4d8a96d migrated Xuan Son Nguyen 2024-09-05 15:55:44 +02:00
  • 4db04784f9 cuda : fix defrag with quantized KV (#9319) b3669 slaren 2024-09-05 11:13:11 +02:00
  • f8d26f1438 Arm AArch64: Documentation updates Dan Johansson 2024-08-27 15:39:22 +02:00
  • 3f038544e8 rpc : update README [no ci] Radoslav Gerganov 2024-09-05 11:59:28 +03:00
  • e46291903a cuda : fix defrag with quantized KV slaren 2024-09-05 03:50:25 +02:00
  • bdf314f38a llama-bench : fix NUL terminators in CPU name (#9313) b3668 slaren 2024-09-05 02:19:39 +02:00
  • 9ecc19ae39 Merge ad1af06737 into 581c305186 Xuan Son Nguyen 2024-09-04 23:25:23 +02:00
  • 75b3a09602 test-backend-ops : add TQ1_0 and TQ2_0 comments for later compilade/bitnet-ternary Francis Couture-Harpin 2024-09-04 14:01:25 -04:00
  • 8d61607656 ggml ; remove unused ggml_mul special case Francis Couture-Harpin 2024-09-04 13:50:08 -04:00
  • 7f3a619c98 Merge branch 'master' into compilade/bitnet-ternary Francis Couture-Harpin 2024-09-04 13:26:50 -04:00
  • 581c305186 ggml : AVX2 support for Q4_0_8_8 (#8713) b3667 Srihari-mcw 2024-09-04 22:21:22 +05:30
  • 5910ea9427 [SYCL] Fix DMMV dequantization (#9279) b3666 Ouadie EL FAROUKI 2024-09-04 16:26:33 +01:00
  • bcaa271893 llama-bench : fix NUL terminators in CPU name slaren 2024-09-04 17:03:33 +02:00
  • c8671ae282 Fix broken links in docker.md (#9306) b3665 杨朱 · Kiki 2024-09-04 19:45:28 +08:00
  • 6a3a2fcc5b (wip) argparser v3 Xuan Son Nguyen 2024-09-04 13:37:09 +02:00
  • 24bfbdee13 Removed unecessary loop unrolling OuadiElfarouki 2024-09-04 12:33:08 +01:00
  • 82e3b03c11 rpc : make RPC servers come first in the device list (#9296) b3664 Radoslav Gerganov 2024-09-04 11:08:32 +03:00
  • 99cd44677a rpc : rpc_count always zero for non-RPC builds Radoslav Gerganov 2024-09-03 16:58:26 +03:00
  • 7049733bcc rpc : disable options for non-RPC builds Radoslav Gerganov 2024-09-03 16:46:35 +03:00
  • 0178724414 rpc : make RPC servers come first in the device list Radoslav Gerganov 2024-08-30 17:40:37 +03:00
  • e7eb974ec9 Fix broken links in docker.md 杨朱 · Kiki 2024-09-04 15:13:56 +08:00
  • 9379d3cc17 readme : rename result_format to response_format (#9300) Pascal Patry 2024-09-04 02:45:40 -04:00
  • df91497567 batched-bench : add --output-format jsonl option Aarni Koskela 2024-09-04 09:35:45 +03:00
  • 988d2c1d5b batched-bench : remove unused code Georgi Gerganov 2024-09-04 09:30:53 +03:00
  • c950fc3064 Make updates to reduce number of load instructions Srihari-mcw 2024-08-28 00:10:26 -07:00
  • 364dc964ba Update comments and indentation Srihari-mcw 2024-08-23 03:22:21 -07:00
  • 49af3f5da7 Update code to fix issues occuring due to non alignment of elements to be processed as multiple of 16 in MSVC Srihari-mcw 2024-08-23 03:08:08 -07:00
  • 0c81b7bcea Add AVX2 based implementations for quantize_q8_0_4x8, ggml_gemv_q4_0_8x8_q8_0 and ggml_gemm_q4_0_8x8_q8_0 functions Srihari-mcw 2024-07-26 05:38:24 -07:00
  • 7605ae7daf flake.lock: Update (#9261) Georgi Gerganov 2024-09-04 02:36:43 +03:00
  • 83f760f960 readme : rename result_format to response_format Pascal Patry 2024-09-03 15:21:40 -04:00
  • 8cdbe11344 rectified dmmv quant fix OuadiElfarouki 2024-09-03 19:56:54 +01:00
  • a81ba75193 patch mergeing + masking done Yutong Dai 2024-09-03 18:39:46 +00:00
  • 8962422b1c llama-bench : add JSONL (NDJSON) output mode (#9288) b3661 Aarni Koskela 2024-09-03 20:58:54 +03:00
  • 5ec3b382da Apply whitespace cleaning from code review Aarni Koskela 2024-09-03 20:33:12 +03:00
  • 040fddee1c maybe fix AddressSanitizer? Xuan Son Nguyen 2024-09-03 13:57:51 +02:00
  • 852f6548bb add test Xuan Son Nguyen 2024-09-03 13:14:52 +02:00
  • ba0065fc1b Merge branch 'master' into xsn/slot_state_machine Xuan Son Nguyen 2024-09-03 12:54:04 +02:00
  • e8e3e72509 fix test step Xuan Son Nguyen 2024-09-03 12:54:03 +02:00
  • 69b398ce64 metrics : add n_busy_slots_per_decode Xuan Son Nguyen 2024-09-03 12:02:37 +02:00
  • fbebf65039 fix passkey test Xuan Son Nguyen 2024-09-03 11:54:41 +02:00
  • d3fedaa6d6 add missing notify_one Xuan Son Nguyen 2024-09-03 10:57:36 +02:00
  • a9a9f66692 Removed WhiteSpaces vithulep 2024-09-03 14:10:39 +05:30
  • ec882cc1ef pop_deferred_task Xuan Son Nguyen 2024-09-03 10:34:58 +02:00
  • 6f9ef6b39d add ffmpeg requirement caitianchi 2024-09-03 16:02:48 +08:00
  • f648ca2cee llama : add llama_sampling API + move grammar in libllama gg/llama-refactor-sampling Georgi Gerganov 2024-08-05 10:08:25 +03:00
  • 17732711e7 llama-bench : update usage docs Aarni Koskela 2024-09-03 10:12:53 +03:00
  • e03e6cd7d2 llama-bench : add JSONL (NDJSON) output mode Aarni Koskela 2024-09-03 09:25:39 +03:00
  • 2fab55ec0c Merge 4adb77f7bc into b69a480af4 Christopher 2024-09-03 12:31:21 +05:30
  • b69a480af4 readme : refactor API section + remove old hot topics Georgi Gerganov 2024-09-03 10:00:36 +03:00
  • 6a6cfd6c6f Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths vithulep 2024-09-03 12:17:44 +05:30
  • 4dbdb6c82f Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths vithulep 2024-09-03 11:27:22 +05:30
  • 1a8d832c60 Merge 6ed2f795ae into 48baa61ecc Frank Mai 2024-09-03 04:30:37 +03:00
  • 446d57d7cd add SLOT_STATE_DONE_PROMPT Xuan Son Nguyen 2024-09-02 22:31:23 +02:00
  • 2c81cde493 server : simplify state machine for slot Xuan Son Nguyen 2024-09-02 22:09:35 +02:00
  • 48baa61ecc server : test script : add timeout for all requests (#9282) Xuan Son Nguyen 2024-09-02 22:08:38 +02:00
  • 8320cb626e server : test script : add timeout for all requests Xuan Son Nguyen 2024-09-02 20:19:31 +02:00
  • f1485161e5 src: make tail invalid when kv cell is intersection for mamba (#9249) b3658 Zhenwei Jin 2024-09-03 01:53:23 +08:00
  • da18950038 removed unecessary condition OuadiElfarouki 2024-09-02 17:14:44 +01:00
  • aab435a5a7 Merge 375de5b1f8 into 048de848ee compilade 2024-09-02 17:12:24 +01:00
  • 048de848ee docker : fix missing binaries in full-cuda image (#9278) slaren 2024-09-02 18:11:13 +02:00
  • 40fa68cb46 readme : add API change notice gg/llama-disambiguate Georgi Gerganov 2024-09-02 18:32:24 +03:00
  • 4e379017e6 llama : fix comment Georgi Gerganov 2024-09-02 18:32:11 +03:00
  • f771d064a9 ggml : add pthread includes on FreeBSD (#9258) b3656 yuri@FreeBSD 2024-09-02 08:25:30 -07:00
  • 6e7d133a5f server : refactor multitask handling (#9274) b3655 Xuan Son Nguyen 2024-09-02 17:11:51 +02:00