Commit graph

  • 72f636a71d adjustment noemotiovon 2024-11-27 09:54:10 +00:00
  • 1f638043fc [cann] ROPE operator optimization noemotiovon 2024-11-27 09:46:32 +00:00
  • f56013dcb5 ggml-cpu: support IQ4_NL_4_4 by runtime repack Shupei Fan 2024-11-27 17:44:47 +08:00
  • 0dfd33d90d Update cann.md to ensure it displays correctly on all platforms. Ruixin Huang 2024-11-27 17:07:44 +08:00
  • 11684db800 only pack CUDA runtime on master slaren 2024-11-27 05:19:14 +01:00
  • 05f205b3ea remove fetch-depth slaren 2024-11-27 05:10:17 +01:00
  • 6b96b5e221 ci : faster CUDA toolkit installation method and use ccache slaren 2024-11-27 04:23:48 +01:00
  • 0aa5fd083a subgroup 64 version with subgroup add. 15% faster Eve 2024-11-26 16:55:24 -05:00
  • d4bfa31c07 ci : fix cuda releases slaren 2024-11-26 21:37:25 +01:00
  • 0f96e5dc83 Change link to landing page Shane A 2024-11-26 12:35:15 -08:00
  • 4bf4c666da Add link to OLMo 2 model in docs Shane A 2024-11-26 12:30:08 -08:00
  • 8ab26137c9 llama : disable warnings for 3rd party sha1 dependency slaren 2024-11-26 20:18:44 +01:00
  • 54e54832df ci : remove nix workflows slaren 2024-11-26 19:40:12 +01:00
  • 003b9f7b10 Add some minimal optimizations for CDNA uvos 2024-11-25 17:33:42 +01:00
  • 8e205ca2c0 Fix docs regarding GGML_HIP Tristan Druyen 2024-11-26 17:00:11 +01:00
  • 81d7fee6c2 Fix inconsistency of HIP flags in cmake & make Tristan Druyen 2024-11-26 17:08:13 +01:00
  • 52c262585b add test_cache_vs_nocache_prompt Xuan Son Nguyen 2024-11-26 16:04:13 +01:00
  • 217c9e4215 no cache_prompt for some tests Xuan Son Nguyen 2024-11-26 15:57:42 +01:00
  • 3c8a2a83fe shmem experiments gg/metal-mul-mv-new-save3 Georgi Gerganov 2024-11-13 11:04:04 +02:00
  • b63e17c066 mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make Xiaodong Ye 2024-11-26 21:11:27 +08:00
  • dafedd33d2 4x4 -> 4x gg/metal-mul-mv-new-save2 Georgi Gerganov 2024-11-12 14:47:04 +02:00
  • bf3494345e metal : some mul_mv experiments gg/metal-mul-mv-new Georgi Gerganov 2024-11-11 13:16:12 +02:00
  • 3b27041727 vision model support piDack 2024-11-26 12:36:48 +00:00
  • c254ee763a Merge branch 'master' into gg/cmake-warnings Georgi Gerganov 2024-11-26 14:16:08 +02:00
  • 9c88b91b45 ci : publish the docker images created during scheduled runs slaren 2024-11-26 13:00:05 +01:00
  • 71fc0f158d update test docs Xuan Son Nguyen 2024-11-26 12:40:23 +01:00
  • 3a504ae88e Merge branch 'master' into xsn/server_pytest Xuan Son Nguyen 2024-11-26 11:39:31 +01:00
  • f51e98ff88 server : fix parallel speculative decoding Georgi Gerganov 2024-11-26 12:25:11 +02:00
  • ae41d3efed Merge branch 'master' of https://github.com/piDack/llama.cpp into support_glm_edge_model piDack 2024-11-26 10:01:09 +00:00
  • ac90ee39f5 use vmmlaq_s32 for compile option i8mm check Charles Xu 2024-11-26 10:51:23 +01:00
  • 6fc90cb727 support for glm edge model liyuhang 2024-11-26 09:17:30 +00:00
  • b83cae088c speculative : add infill mode gg/speculative-infill Georgi Gerganov 2024-11-26 11:14:17 +02:00
  • 1bc0e9a296 speculative : simplify the implementation Georgi Gerganov 2024-11-26 00:10:46 +02:00
  • 33fd470550 restore some modifications shanshan shen 2024-11-26 08:49:26 +00:00
  • 5d06ee7ebf [cann] RoPE and CANCAT operator optimization noemotiovon 2024-11-26 08:35:07 +00:00
  • e05a398fb3 restore some modifications shanshan shen 2024-11-26 07:32:39 +00:00
  • cf6b987be3 Merge remote-tracking branch 'upstream/master' shanshan shen 2024-11-26 07:10:23 +00:00
  • 1c79893ca2 some modifications after review shanshan shen 2024-11-26 07:09:55 +00:00
  • 00675ec068 Merge fcc5a22fde into 0eb4e12bee Googulator 2024-11-26 14:12:18 +08:00
  • 4f696624a4 use config partial_rotary_factor as rope ratio liyuhang 2024-11-26 05:29:42 +00:00
  • 0e3d85dec6 kompute: op_mul_mat_q6_k permuted support Sergio Lopez 2024-11-26 00:51:11 +01:00
  • f54c96e8c7 kompute: op_mul_mat_f16 permuted support Sergio Lopez 2024-11-25 21:51:50 +01:00
  • 9c5bdf4188 kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support Sergio Lopez 2024-11-25 21:14:07 +01:00
  • 2ac1d0ee3e kompute: op_mul_mat_q4_k permuted support Sergio Lopez 2024-11-25 17:52:52 +01:00
  • 8930e8850e Merge 3c8b10560a into 0eb4e12bee Vinesh Janarthanan 2024-11-25 21:10:25 -06:00
  • 45e90fb517 restore the condition to build & update package when merge arthw 2024-11-26 11:01:49 +08:00
  • 1b8afa88dc kompute: rope: implement neox and phi3 support Sergio Lopez 2024-11-22 14:35:50 +01:00
  • d8889598d6 kompute: softmax: implement ALiBi support Sergio Lopez 2024-11-20 07:28:25 +01:00
  • 913536f2a5 kompute: op_unary: reject unsupported parameters Sergio Lopez 2024-11-19 18:00:39 +01:00
  • 9e2301f4a4 metal : fix group_norm support condition (#0) Georgi Gerganov 2024-11-27 11:22:14 +02:00
  • fee824a1a1 sync : ggml Georgi Gerganov 2024-11-27 11:10:42 +02:00
  • 9150f8fef9 Do not include arm_neon.h when compiling CUDA code (ggml/1028) Frankie Robertson 2024-11-26 15:50:26 +02:00
  • c31ed2abfc vulkan: define all quant data structures in types.comp (#10440) b4196 Jeff Bolz 2024-11-27 01:32:54 -06:00
  • 5b3466bedf vulkan: Handle GPUs with less shared memory (#10468) b4195 Jeff Bolz 2024-11-27 01:30:27 -06:00
  • 249a7902ec vulkan: further optimize q5_k mul_mat_vec (#10479) Jeff Bolz 2024-11-27 01:21:59 -06:00
  • 71a64989a5 vulkan: skip integer div/mod in get_offsets for batch_idx==0 (#10506) Jeff Bolz 2024-11-27 01:08:54 -06:00
  • 4a57d362e1 vulkan: optimize Q2_K and Q3_K mul_mat_vec (#10459) Jeff Bolz 2024-11-27 01:00:50 -06:00
  • c9b00a70b0 ci : fix cuda releases (#10532) b4191 Diego Devesa 2024-11-26 22:12:10 +01:00
  • de5097351c Add OLMo 2 model in docs (#10530) Shane A 2024-11-26 12:55:29 -08:00
  • 5a349f2809 ci : remove nix workflows (#10526) Diego Devesa 2024-11-26 21:13:54 +01:00
  • 30ec398321 llama : disable warnings for 3rd party sha1 dependency (#10527) Diego Devesa 2024-11-26 21:01:47 +01:00
  • be0e350c8b Fix HIP flag inconsistency & build docs (#10524) Tristan Druyen 2024-11-26 19:27:28 +01:00
  • 249cd93da3 mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (#10516) R0CKSTAR 2024-11-27 00:00:41 +08:00
  • 904109ed0d vulkan: fix group_norm (#10496) Jeff Bolz 2024-11-26 09:45:05 -06:00
  • 45abe0f74e server : replace behave with pytest (#10416) Xuan Son Nguyen 2024-11-26 16:20:18 +01:00
  • 0bbd2262a3 restore the condition to build & update package when merge (#10507) Neo Zhang Jianyu 2024-11-26 21:43:47 +08:00
  • ab96610b1e cmake : enable warnings in llama (#10474) Georgi Gerganov 2024-11-26 14:18:08 +02:00
  • 7db3846a94 ci : publish the docker images created during scheduled runs (#10515) Diego Devesa 2024-11-26 13:05:20 +01:00
  • c6807b3f28 ci : add ubuntu cuda build, build with one arch on windows (#10456) Diego Devesa 2024-11-26 13:05:07 +01:00
  • 25669aa92c ggml-cpu: cmake add arm64 cpu feature check for macos (#10487) b4179 Charles Xu 2024-11-26 12:37:05 +01:00
  • 84e1c33cde server : fix parallel speculative decoding (#10513) b4178 Georgi Gerganov 2024-11-26 13:36:40 +02:00
  • 811872a59d speculative : simplify the implementation (#10504) b4177 Georgi Gerganov 2024-11-26 12:29:38 +02:00
  • 9a4b79bcfa CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454) b4176 Shanshan Shen 2024-11-26 18:08:37 +08:00
  • 7066b4cce2 CANN: RoPE and CANCAT operator optimization (#10488) b4175 Chenguang Li 2024-11-26 17:31:05 +08:00
  • dbffb03580 ci : add ubuntu cuda build, build with one arch on windows slaren 2024-11-26 02:50:16 +01:00
  • 0eb4e12bee vulkan: Fix a vulkan-shaders-gen argument parsing error (#10484) b4174 Junil Kim 2024-11-26 10:47:20 +09:00
  • c1275fc69c vulkan: skip integer div/mod in get_offsets for batch_idx==0 Jeff Bolz 2024-11-25 18:38:49 -06:00
  • 6fab3ffe02 speculative-simple : fix compile warning Georgi Gerganov 2024-11-26 00:06:45 +02:00
  • 0cc63754b8 Introduce llama-run (#10291) b4173 Eric Curtin 2024-11-25 16:56:24 -05:00
  • 33d49f7c5a cmake : reuse ggml_get_flags Georgi Gerganov 2024-11-25 23:54:43 +02:00
  • 7177eb8901 speculative-simple : fix warnings Georgi Gerganov 2024-11-25 23:43:55 +02:00
  • 5ea6fc59e9 cmake : get_flags -> ggml_get_flags Georgi Gerganov 2024-11-25 23:43:45 +02:00
  • f1c0a93b58 cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS Georgi Gerganov 2024-11-25 23:43:26 +02:00
  • 50d5cecbda ci : build docker images only once daily (#10503) b4172 Diego Devesa 2024-11-25 22:05:39 +01:00
  • 8e399372a7 ci : build docker images only once daily slaren 2024-11-25 21:31:42 +01:00
  • 9fd8c2687f server : add more information about error (#10455) b4171 Georgi Gerganov 2024-11-25 22:28:27 +02:00
  • e908ace717 cmake : enable warnings in llama Georgi Gerganov 2024-11-24 15:00:51 +02:00
  • 47f931c8f9 server : enable cache_prompt by default (#10501) b4170 Georgi Gerganov 2024-11-25 21:50:07 +02:00
  • 106964e3d2 metal : enable mat-vec kernels for bs <= 4 (#10491) b4169 Georgi Gerganov 2024-11-25 21:49:31 +02:00
  • 80acb7b430 Rename Olmo1124 to Olmo2 (#10500) b4168 Shane A 2024-11-25 10:36:09 -08:00
  • 10bce0450f llama : accept a list of devices to use to offload a model (#10497) b4167 Diego Devesa 2024-11-25 19:30:06 +01:00
  • 42f61c8656 rename env parameter to LLAMA_ARG_DEVICE for consistency slaren 2024-11-25 19:29:06 +01:00
  • fe48dbd4c3 server : enable cache_prompt by default Georgi Gerganov 2024-11-25 20:28:59 +02:00
  • 1f922254f0 Github: update issue templates [no ci] (#10489) Johannes Gäßler 2024-11-25 19:18:37 +01:00
  • aa0a5073e2 Introduce llama-run Eric Curtin 2024-11-14 11:31:40 +00:00
  • acf43cc178 fix other examples slaren 2024-11-25 18:08:33 +01:00
  • 1ee6c482d0 Merge branch 'master' into compilade/mamba2 compilade/mamba2 Francis Couture-Harpin 2024-11-25 12:04:23 -05:00
  • d6cf9186f5 fix dev list with dl backends slaren 2024-11-25 18:00:40 +01:00
  • 2d34163765 vulkan: get the first command buffer submitted sooner Jeff Bolz 2024-11-25 10:33:22 -06:00
  • 04c14a0abd Rename Olmo1124 to Olmo2 Shane A 2024-11-25 08:30:29 -08:00