Commit graph

  • 672983a9cc Disable KV cache shifting automatically for unsupported models Molly Sophia 2025-01-03 16:30:25 +08:00
  • f66f582927 llama : refactor src/llama.cpp (#10902) Georgi Gerganov 2025-01-03 10:18:53 +02:00
  • 9d0156bf0a minor [no ci] Georgi Gerganov 2025-01-03 10:17:41 +02:00
  • 840594f401 fix: remove trailing whitespaces. matt23654 2025-01-02 23:28:32 +00:00
  • c1a84be959 Merge a2d4b6fc81 into 2f0ee84b9b Jesse Gross 2025-01-02 18:06:34 -05:00
  • d2f784d50d convert : correct indentation Stanisław Szymczyk 2025-01-02 21:13:35 +01:00
  • 69dd1e859a llama : quant (cont) Georgi Gerganov 2025-01-02 21:57:46 +02:00
  • e06d267ac6 llama : quant Georgi Gerganov 2025-01-02 21:35:50 +02:00
  • 2f0ee84b9b server: bench: minor fixes (#10765) Pierrick Hymbert 2025-01-02 18:06:12 +01:00
  • 1948ae8491 Cleaned up and improved type/error handling. matt23654 2025-01-02 15:53:50 +00:00
  • 272cd0eaea common : update lora Georgi Gerganov 2025-01-02 17:26:18 +02:00
  • 8d117a518d llama : model loader Georgi Gerganov 2025-01-02 16:01:06 +02:00
  • 736e6922ce llama : context (cont) Georgi Gerganov 2025-01-02 13:01:39 +02:00
  • 4b39d7020d minor Georgi Gerganov 2024-12-24 09:42:53 +02:00
  • 007064f5ec llama : context Georgi Gerganov 2024-12-23 21:05:54 +02:00
  • 5bf9dc5783 cont Georgi Gerganov 2024-12-23 19:10:27 +02:00
  • add3bfe068 llama : batch Georgi Gerganov 2024-12-23 18:41:55 +02:00
  • 5f794937d9 llama : impl Georgi Gerganov 2024-12-23 17:32:31 +02:00
  • 8ab668e122 llama : kv cache Georgi Gerganov 2024-12-23 15:07:29 +02:00
  • 55791c17f6 minor Georgi Gerganov 2024-12-23 13:28:56 +02:00
  • 2a3aa05ce9 rebase Georgi Gerganov 2024-12-23 11:51:26 +02:00
  • 2ebe8fe60e examples : fix Georgi Gerganov 2024-12-22 23:32:43 +02:00
  • 30e0c88975 llama : adapter Georgi Gerganov 2024-12-22 22:28:20 +02:00
  • a25ff12f8e llama : hparams Georgi Gerganov 2024-12-22 21:00:44 +02:00
  • 7a3065f368 llama : model Georgi Gerganov 2024-12-22 20:41:05 +02:00
  • a2dc93ed20 llama : chat Georgi Gerganov 2024-12-22 19:34:32 +02:00
  • 6c22ce1097 llama : arch (cont) Georgi Gerganov 2024-12-22 18:56:29 +02:00
  • e9c9209e01 ci : remove BUILD_SHARED_LIBS=OFF Georgi Gerganov 2024-12-22 18:24:18 +02:00
  • 6b24e6eb97 llama : mmap Georgi Gerganov 2024-12-22 16:41:46 +02:00
  • cf899ea0d3 llama : arch Georgi Gerganov 2024-12-22 16:20:20 +02:00
  • 844660ba5d llama : control-vector -> adapter Georgi Gerganov 2024-12-22 15:49:03 +02:00
  • 498b68f97d llama : scatter llama.cpp into multiple modules (wip) Georgi Gerganov 2024-12-11 18:29:23 +02:00
  • 0da5d86026 server : allow using LoRA adapters per-request (#10994) b4406 Xuan Son Nguyen 2025-01-02 15:05:18 +01:00
  • 74e460d5e1 remove redundant check Xuan Son Nguyen 2025-01-02 13:54:49 +01:00
  • 9274a6bcaa lora_base Xuan Son Nguyen 2025-01-02 13:52:11 +01:00
  • a90e064262 Apply suggestions from code review Xuan Son Nguyen 2025-01-02 13:50:49 +01:00
  • 93aca64520 convert : renamed expert_weights_func to expert_gating_func Stanisław Szymczyk 2025-01-02 12:04:58 +01:00
  • a43d4953ba llama : add support for DeepSeek V3 model. Stanisław Szymczyk 2025-01-02 10:15:53 +01:00
  • 8c58711455 Merge 61221221d7 into a45433ba20 Herman Semenoff 2025-01-02 15:44:48 +05:30
  • 0061955a06 convert : add support for DeepSeek V3 model Stanisław Szymczyk 2025-01-02 10:14:39 +01:00
  • a45433ba20 readme : add llama-swap to infrastructure section (#11032) Benson Wong 2025-01-01 23:14:54 -08:00
  • 35a1ca9185 fix: Vulkan shader gen binary path Gilad S 2025-01-02 03:48:03 +02:00
  • c3efd7df73 Revert "subgroup iq4_nl, 3% slower than original" Eve 2025-01-01 16:50:37 -05:00
  • 1d949a62c6 subgroup iq4_nl, 3% slower than original Eve 2025-01-01 16:50:22 -05:00
  • b46e8ec78c readme: add llama-swap to Infrastructure Benson Wong 2025-01-01 13:03:57 -08:00
  • 61037d7e6e list llama-swap under tools in README Benson Wong 2025-01-01 12:59:19 -08:00
  • 1dbd16abb9 move lora change task to queue Xuan Son Nguyen 2025-01-01 19:58:30 +01:00
  • bf7df95798 update docs Xuan Son Nguyen 2025-01-01 19:44:00 +01:00
  • 367f0ab1b4 add slow test with llama 8b Xuan Son Nguyen 2025-01-01 19:36:42 +01:00
  • d67fefb91d Merge branch 'master' into xsn/lora_per_request Xuan Son Nguyen 2025-01-01 16:38:42 +01:00
  • ed038a26e3 bct Eve 2024-12-31 23:00:16 -05:00
  • 3c31ceac88 Revert "32 bit cache (slower)" Eve 2024-12-31 22:55:51 -05:00
  • 7d7a9e2401 Revert "failed subgroup experiment (slower)" Eve 2024-12-31 22:55:46 -05:00
  • d7f4663a7c Revert "initial subgroup test" Eve 2024-12-31 22:55:34 -05:00
  • 12f1cdc196 initial subgroup test Eve 2024-12-31 22:55:22 -05:00
  • 77fe42858c failed subgroup experiment (slower) Eve 2024-12-31 22:47:54 -05:00
  • c47dc70b58 Added get_alloc_size forwarding matt23654 2025-01-01 03:37:10 +00:00
  • 7aad6cbda6 Added init tensor calling code matt23654 2024-12-31 21:56:51 +00:00
  • 5597614a30 32 bit cache (slower) Eve 2024-12-31 16:37:03 -05:00
  • 0827b2c1da ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027) b4404 Srihari-mcw 2024-12-31 19:53:33 +05:30
  • 45095a61bf server : clean up built-in template detection (#11026) b4403 Xuan Son Nguyen 2024-12-31 15:22:01 +01:00
  • 450e47b2fb fix condition Xuan Son Nguyen 2024-12-31 15:00:21 +01:00
  • 1c8ba922ec Apply suggestions from code review Diego Devesa 2024-12-31 14:44:19 +01:00
  • 9ad89bc9d3 enable AVX VNNI and alder lake build for MSVC slaren 2024-12-31 14:40:47 +01:00
  • 75be0087c6 Merge remote-tracking branch 'origin/master' into clang_avxvnni_branch slaren 2024-12-31 14:40:22 +01:00
  • 5896c65232 server : add OAI compat for /v1/completions (#10974) b4402 Xuan Son Nguyen 2024-12-31 12:34:13 +01:00
  • c6bd7a7aef add chat template test Xuan Son Nguyen 2024-12-31 12:30:08 +01:00
  • 44f998affc fix compilation Xuan Son Nguyen 2024-12-31 12:23:03 +01:00
  • c5ac2b85bc server : clean up built-in template detection Xuan Son Nguyen 2024-12-31 12:14:15 +01:00
  • bc7b1f8632 convert : fix Llama-3_1-Nemotron-51B rope settings (#11008) ymcki 2024-12-31 19:04:48 +08:00
  • 6e1531aca5 common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013) b4400 Peter 2024-12-31 11:46:06 +11:00
  • 716bd6dec3 vulkan: optimize mul_mat for small values of N (#10991) b4399 Jeff Bolz 2024-12-30 11:27:11 -06:00
  • e2e609979f Create cuda12.2.Dockerfile 流年 2024-12-30 23:49:22 +08:00
  • e370cdb5de Merge aa014d7e89 into c250ecb315 0cc4m 2024-12-30 08:45:23 -07:00
  • ee423cfae1 Enhance VisionOS compatibility by adding missing type definitions in common and ggml source files. This update includes conditional type definitions for u_int, u_char, u_short, and uint to address legacy type issues on VisionOS across multiple files: common.cpp, ggml-backend.cpp, ggml-cpu.c, ggml-cpu.cpp, and ggml-metal.m. Giovanni Petrantoni 2024-12-30 22:26:38 +09:00
  • c250ecb315 android : fix llama_batch free (#11014) b4398 ag2s20150909 2024-12-30 20:35:13 +08:00
  • aa014d7e89 Use mutex instead of atomics for vk_instance counters 0cc4m/vulkan-instance-cleanup 0cc4m 2024-12-30 05:14:58 +00:00
  • 238b9689e0 Update test_chat_completion.py ochafik 2024-12-30 04:59:13 +00:00
  • 389d79b6b4 Try and work around msvc++ non-macro max resolution quirk ochafik 2024-12-30 04:39:35 +00:00
  • ce48584f7d No designated initializers yet ochafik 2024-12-30 04:19:33 +00:00
  • 06b5159560 Avoid print in get_hf_chat_template.py ochafik 2024-12-30 04:10:35 +00:00
  • 80138d9007 Add missing <optional> include ochafik 2024-12-30 04:10:20 +00:00
  • e5113e8d74 Add --jinja and --chat-template-file flags ochafik 2024-12-30 03:40:34 +00:00
  • abd274a48f Copy minja from 58f0ca6dd7 ochafik 2024-12-30 03:21:44 +00:00
  • bdd1e4ddc7 support different subgroup sizes (tested) Eve 2024-12-29 22:28:27 -05:00
  • a4a44f92c8 fix https://github.com/ggerganov/llama.cpp/issues/9946 ag2s20150909 2024-12-30 10:45:27 +08:00
  • 5641108a33 revert Eve 2024-12-29 20:36:59 -05:00
  • e9cbe702b6 unfinished restructure example, didnt continue as its really slow already 15 t/s Eve 2024-12-29 20:36:14 -05:00
  • 64bb149d53 data b cache example, slower than original Eve 2024-12-29 17:05:19 -05:00
  • 860159c10d new safe method Eve 2024-12-28 22:22:46 -05:00
  • 9449305e47 trim trailing whitespaces Jianlin Shi 2024-12-29 16:08:46 -07:00
  • cf12ada7f2 Merge branch 'ggerganov:master' into master Jianlin Shi 2024-12-29 15:52:12 -07:00
  • b23684ccc5 Merge d87aa806b5 into a813badbbd Nico Bosshard 2024-12-30 00:52:27 +03:00
  • d9b0958f59 Vulkan: Refactor to make sure Vulkan instance is destroyed properly on program exit 0cc4m 2024-11-29 07:42:00 +00:00
  • 984ffac253 DeciLMCausalModel now reads rope_theta from config.json properly Yee Man Chan 2024-12-29 22:17:00 +08:00
  • c1736f30ac Merge branch 'ggerganov:master' into master ymcki 2024-12-29 22:15:59 +08:00
  • a813badbbd vulkan: im2col and matmul optimizations for stable diffusion (#10942) b4397 Jeff Bolz 2024-12-29 03:16:34 -06:00
  • fdd2188912 vulkan: Use push constant offset to handle misaligned descriptors (#10987) b4396 Jeff Bolz 2024-12-29 02:35:11 -06:00
  • 0247aaf965 vulkan: optimize mul_mat for small values of N Jeff Bolz 2024-12-26 16:21:53 -06:00
  • 076346db8a fix condition Xuan Son Nguyen 2024-12-28 16:16:57 +01:00