Commit graph

  • 3b62f7c145 threadpool: make polling the default to match openmp behavior Max Krasnyansky 2024-08-06 18:35:53 -07:00
  • 6fcc780b5f atomics: always use stdatomics with clang and use relaxed memory order when polling in ggml_barrier Max Krasnyansky 2024-08-05 14:25:49 -07:00
  • 2953441563 bench: create fresh threadpool for each test Max Krasnyansky 2024-08-03 17:17:39 -07:00
  • 96d6603dc7 threadpool: use cpu_get_num_math to set the default number of threadpool threads Max Krasnyansky 2024-08-03 16:14:04 -07:00
  • 3008b31b17 fix deadlock for cases where cgraph.n_nodes == 1 Faisal Zaghloul 2024-07-31 12:42:31 -04:00
  • 57637326c4 fix more race conditions Faisal Zaghloul 2024-07-31 12:42:31 -04:00
  • 817eaf0c00 Fix Android build issue Faisal Zaghloul 2024-07-31 12:42:30 -04:00
  • 82224f84d7 fixed a harmless race condition Faisal Zaghloul 2024-07-31 12:42:30 -04:00
  • d5c9c14dea fixed use after release bug Faisal Zaghloul 2024-07-31 12:42:30 -04:00
  • a0aae528bb Minor fixes Faisal Zaghloul 2024-07-31 12:42:30 -04:00
  • 130adf8415 Introduce ggml_compute_threadpool Faisal Zaghloul 2024-07-31 12:42:30 -04:00
  • 0e682ced5e add restrict pidack 2024-08-27 20:54:39 +08:00
  • eec0e8ca81 memory access pattern pidack 2024-08-27 20:51:26 +08:00
  • 3246fe84d7 Fix minicpm example directory (#9111) Xie Yanbo 2024-08-27 20:33:08 +08:00
  • 28506e51d6 Increase submit counter only if actual work has been submitted and increase submit count to 100. Markus Tavenrath 2024-08-27 13:56:13 +02:00
  • 2b2bc1ff8b Repair GGML_VULKAN_CHECK_RESULTS Markus Tavenrath 2024-08-27 13:47:33 +02:00
  • e53b14f152 del debug info pidack 2024-08-27 19:33:28 +08:00
  • 21c16fa5ed fix trailing whitespace pidack 2024-08-27 19:10:57 +08:00
  • 78eb487bb0 llama : fix qs.n_attention_wv for DeepSeek-V2 (#9156) b3636 compilade 2024-08-27 06:09:23 -04:00
  • 1928967874 resolve test-backend-ops conflicts pidack 2024-08-27 17:31:40 +08:00
  • 0f398dddb3 Merge baf344b5b3 into a77feb5d71 Marko Tasic 2024-08-27 10:30:31 +01:00
  • ed1455ca77 Merge 6414642d1a into a77feb5d71 Sigbjørn Skjæret 2024-08-27 10:30:30 +01:00
  • 2a8d861ecd Merge 50599208d6 into a77feb5d71 Daniel Hiltgen 2024-08-27 10:30:22 +01:00
  • 78458283b8 Merge 16c093543a into a77feb5d71 rahsuri 2024-08-27 10:30:22 +01:00
  • 40f47872b3 Merge branch 'master' of github.com:ggerganov/llama.cpp into mfalcon_mamba_cuda pidack 2024-08-27 17:08:23 +08:00
  • a77feb5d71 server : add some missing env variables (#9116) b3635 Xuan Son Nguyen 2024-08-27 11:07:01 +02:00
  • b423a6df5e fix ssm_scan numerical error & others update pidack 2024-08-27 16:51:21 +08:00
  • 2e59d61c1b llama : fix ChatGLM4 wrong shape (#9194) b3634 CausalLM 2024-08-27 14:58:22 +08:00
  • 75e1dbbaab llama : fix llama3.1 rope_freqs not respecting custom head_dim (#9141) b3633 Carsten Kragelund Jørgensen 2024-08-27 08:53:40 +02:00
  • ad76569f8e common : Update stb_image.h to latest version (#9161) b3632 arch-btw 2024-08-26 22:58:50 -07:00
  • 8dd323b496 Merge branch 'master' of github.com:ggerganov/llama.cpp into mfalcon_mamba_cuda pidack 2024-08-27 09:44:18 +08:00
  • 07baee57c9 push for juntao Yutong Dai 2024-08-27 00:18:21 +00:00
  • 8ae3d09a30 Fix ChatGLM4 wrong shape CausalLM 2024-08-27 03:51:14 +08:00
  • a8d8e9607d Update .ecrc arch-btw 2024-08-26 12:39:39 -07:00
  • ab4890744d Merge branch 'ggerganov:master' into arch-btw-patch-1 arch-btw 2024-08-26 12:35:13 -07:00
  • 7d787ed96c ggml : do not crash when quantizing q4_x_x with an imatrix (#9192) b3631 slaren 2024-08-26 19:44:43 +02:00
  • 06658ad7c3 metal : separate scale and mask from QKT in FA kernel (#9189) b3630 Georgi Gerganov 2024-08-26 18:31:02 +03:00
  • e279ce0f45 ggml : do not crash when quantizing q4_x_x with an imatrix slaren 2024-08-26 16:58:22 +02:00
  • fc18425b6a ggml : add SSM Metal kernels (#8546) b3629 Georgi Gerganov 2024-08-26 17:55:36 +03:00
  • ff23e8e9f0 metal : keep data in local memory Georgi Gerganov 2024-08-26 16:35:22 +03:00
  • 879275ac98 tests : fix compile warnings for unreachable code (#9185) b3628 Georgi Gerganov 2024-08-26 16:30:25 +03:00
  • e865686c21 metal : ne01 check no longer necessary Georgi Gerganov 2024-08-26 16:24:26 +03:00
  • e65fc9b8b2 metal : separate scale and mask from QKT in FA kernel Georgi Gerganov 2024-08-26 16:04:13 +03:00
  • a95225cdfd metal : another fix for the fa kernel gg/metal-fix-fa-2 Georgi Gerganov 2024-08-26 14:55:28 +03:00
  • 6e075c8849 tokenize : add --show-count-only (token) option Daniel Bevenius 2024-08-26 07:14:53 +02:00
  • aa931d0375 metal : fix fa kernel gg/metal-fix-fa Georgi Gerganov 2024-08-26 13:02:36 +03:00
  • 20d390bea4 10x performance improvement for cuda ssm conv & scan pidack 2024-08-26 17:33:23 +08:00
  • 82a7ed999b tests : fix compile warnings for unreachable code Georgi Gerganov 2024-08-26 12:29:12 +03:00
  • fbf2ac1470 ggml : add ssm_scan metal impl Georgi Gerganov 2024-07-18 14:58:09 +03:00
  • 9928f4bde3 ggml : add ggml_ssm_conv metal impl Georgi Gerganov 2024-07-17 21:32:38 +03:00
  • 7a3df798fc ci : add VULKAN support to ggml-ci (#9055) b3627 Georgi Gerganov 2024-08-26 12:19:39 +03:00
  • ea5ab6030d ci : add VULKAN support to ggml-ci Georgi Gerganov 2024-08-16 10:06:52 +03:00
  • e5edb210cd server : update deps (#9183) Georgi Gerganov 2024-08-26 12:16:57 +03:00
  • 0c41e03ceb metal : gemma2 flash attention support (#9159) b3625 slaren 2024-08-26 11:08:59 +02:00
  • f12ceaca0c ggml-ci : try to improve build time (#9160) slaren 2024-08-26 11:03:30 +02:00
  • 6494509801 backup sycl-onednn-convolution Meng, Hengyu 2024-08-26 08:58:54 +00:00
  • edc2e27383 use precise::tanh slaren 2024-08-26 10:51:16 +02:00
  • ccb45186d0 docs : remove references gg/remove-k-quants-per-iter Georgi Gerganov 2024-08-26 09:52:02 +03:00
  • e48fd74b45 ggml : remove k_quants_per_iteration macro Georgi Gerganov 2024-07-04 21:19:09 +03:00
  • 9a8274f1a1 server : update deps Georgi Gerganov 2024-08-26 09:14:26 +03:00
  • 436787f170 llama : fix time complexity of string replacement (#9163) b3623 Justine Tunney 2024-08-25 23:09:53 -07:00
  • f23b3b1c9c Update build.yml awatuna 2024-08-26 13:05:43 +08:00
  • 61221221d7 llama: changed default type IQ2_XS to IQ2_S for LLAMA_FTYPE_MOSTLY_IQ2_S Herman Semenov 2024-08-26 03:18:23 +03:00
  • cfb1b2277f ggml: skip excess iteration for pairs whose vars are the same element when i2 == i1 Herman Semenov 2024-08-26 02:41:27 +03:00
  • 66c8bb05b2 common,train,examples: using constexpr string and strlen for microoptimizations Herman Semenov 2024-08-26 02:19:18 +03:00
  • 93bc3839f9 common: fixed not working find argument --n-gpu-layers-draft (#9175) b3622 Herman Semenov 2024-08-25 22:54:37 +00:00
  • 0f240bb2f9 common: fixed not working find argument --n-gpu-layers-draft Herman Semenov 2024-08-26 01:08:05 +03:00
  • f91fc5639b CUDA: fix Gemma 2 numerical issues for FA (#9166) b3621 Johannes Gäßler 2024-08-25 22:11:48 +02:00
  • 4a4a3420de Add GGML_USE_BLAS flag to llama.cpp and update BLAS documentation simonteozw 2024-08-18 23:47:24 +08:00
  • fae826fb56 Fix failed assertions while running Falcon Mamba Jan Ploski 2024-08-25 14:57:47 +02:00
  • 16aee45179 correction Nexesenex 2024-08-25 14:25:46 +02:00
  • 931ed360ae CUDA: fix Gemma 2 numerical issues for FA Johannes Gäßler 2024-08-25 09:17:41 +02:00
  • 08a49aa535 Fix time complexity of string replacement Justine Tunney 2024-08-24 20:20:53 -07:00
  • dd3df754b2 Bad indents and trailing whitespaces Nexesenex 2024-08-25 03:30:36 +02:00
  • f63860eaac Put back ffn_down tree where it was before. Nexesenex 2024-08-25 03:17:21 +02:00
  • 8fc46df134 Bump a bit ffn_gate and down for some GQA<2 models Nexesenex 2024-08-24 22:30:45 +02:00
  • 53b8eaa316 Remove deprecated rules for token embeddings Nexesenex 2024-08-24 21:57:07 +02:00
  • 844d11b8f3 bad indent Nexesenex 2024-08-24 21:02:51 +02:00
  • 5ae59714d2 Revamp Q2_K and Q3_K quants Nexesenex 2024-08-24 20:50:07 +02:00
  • 1bde168c07 Usage of n_head to discriminate very small models Nexesenex 2024-08-23 23:27:26 +02:00
  • 16e9c3771a various corrections on IQ2_S+ and IQ3 quants Nexesenex 2024-08-23 23:18:59 +02:00
  • 380b53d061 Fix IQ4_XSR Nexesenex 2024-08-23 21:59:34 +02:00
  • 608108597c Revamp attn_output Nexesenex 2024-08-23 17:48:31 +02:00
  • 6b5cebfb2b Revamp a bit output weight Nexesenex 2024-08-23 16:40:40 +02:00
  • f796954872 Revamp FFN down and attn_k Nexesenex 2024-08-23 14:17:19 +02:00
  • 596a4aec86 Readd variable attn_k, attn_q, attn_o after merge Nexesenex 2024-08-22 19:12:25 +02:00
  • fb2b9ea667 Merge branch 'master' into pr/8836 Nexesenex 2024-08-25 02:59:57 +02:00
  • 3a027b878b Revamp IQ4_XSR, remove IQ3_XXXL Nexesenex 2024-08-23 00:08:42 +02:00
  • e05da54eff Overhaul of FFN, if GQA and if not Nexesenex 2024-08-22 19:12:13 +02:00
  • 1607a02bdd Further adjustments difquant formulas Nexesenex 2024-08-23 12:38:45 +02:00
  • 179ad0fad4 Little rework of the difquant formulas Nexesenex 2024-08-21 13:10:54 +02:00
  • 1efb216d68 flake.lock: Update github-actions[bot] 2024-08-25 00:20:30 +00:00
  • e589f0fcde Update stb_image.h to latest version arch-btw 2024-08-24 15:50:27 -07:00
  • 061e520075 Update CUDA ops and tests to match implementation from commit 8fb57ac0 (llama : use im2col and mul_mat to perform convolution for Mamba); GPU version breaks with assert because of unsupported MUL_MAT Jan Ploski 2024-06-03 14:46:50 +02:00
  • f88617743f ggml-ci : try to improve build time slaren 2024-08-25 00:08:41 +02:00
  • 12c913c52c Fix backend test for ssm_conv CUDA op not working Jan Ploski 2024-06-02 20:32:33 +02:00
  • 64fbd320ef Add patch to test cases provided by @compilade; test for ssm_conv fails Jan Ploski 2024-06-02 19:20:17 +02:00
  • 25f9e65d3a Update CUDA ops ssm_conv and ssm_scan to match CPU implementation from PR #7531 (as per eb589d5e) Jan Ploski 2024-06-02 18:14:02 +02:00
  • cc365b045b Add GGML_OP_SSM_CONV, GGML_OP_SSM_SCAN to supported ops for CUDA backend + test case for each op Jan Ploski 2024-06-02 00:17:34 +02:00
  • f809568fa1 Add initial/naive CUDA kernels for the GGML_OP_SSM_CONV and GGML_OP_SSM_SCAN ops Jan Ploski 2024-06-01 20:47:46 +02:00