Commit graph

  • 2f50a58723 convert: more n_mult removing Green Sky 2023-09-10 13:50:57 +02:00
  • 8f26cb07fb Update build.yml Alon 2023-09-10 14:27:16 +03:00
  • 6eee6b6949 main: add progress spinner on context swap crasm 2023-09-10 06:24:31 -04:00
  • 100bc5bba2 fix dst variable Alon Faraj 2023-09-10 13:51:11 +03:00
  • 73cae1e306 Update README.md BarfingLemurs 2023-09-10 02:18:01 -04:00
  • 2d6733a8c9 Check Metal MTLCommandBufferStatus error codes and report an out of memory error if one occurred RogerD 2023-09-09 21:15:25 -07:00
  • 2e974cfb82 feat: docker gpu image CI builds canardleteer 2023-09-09 13:04:10 -07:00
  • f3098f1a32 Don't highlight console session as Java. Nathan Ringo 2023-09-09 15:20:38 -05:00
  • 1cef45953b remove unused command line options xaedes 2023-09-09 21:58:36 +02:00
  • 54b21a397c Merge branch 'master' into finetune-lora xaedes 2023-09-09 21:30:22 +02:00
  • ace90884a6 measure max compute size for each cgraph eval order and use best order xaedes 2023-09-09 21:00:25 +02:00
  • 917d2870b4 add cgraph evaluation order member and corresponding enum type xaedes 2023-09-09 20:52:53 +02:00
  • 5bb9bf47b7 convert: remove the now unused find_n_mult Green Sky 2023-09-09 20:26:53 +02:00
  • d3f1b438a8 simplify broadcasting mul_mat backward using ggml_repeat_back xaedes 2023-09-09 18:55:18 +02:00
  • d3aaf0876a add comment briefly describing what ggml_repeat_back does xaedes 2023-09-09 18:47:27 +02:00
  • 9738526899 decouple random number generator of each operation test xaedes 2023-09-09 18:46:35 +02:00
  • dd3278619d test broadcasting mul_mat backward pass xaedes 2023-09-09 18:38:29 +02:00
  • aea8b6be74 support broadcastable a in out_prod(a, b) and backward pass of broadcasting mul_mat(a, b) xaedes 2023-09-09 18:37:45 +02:00
  • ecd7beddf0 convert: remove most n_mult usage Green Sky 2023-09-09 17:43:57 +02:00
  • 35260f7d74 fix finetune to support grouped-query-attention (using flash-attention) xaedes 2023-09-09 17:10:23 +02:00
  • 833a56c144 add llama API functions to get grouped-query-attention n_head parameter 'n_head_kv'. xaedes 2023-09-09 17:07:54 +02:00
  • d7aade7d8a support grouped-query-attention in ggml_flash_attn and ggml_flash_attn_back xaedes 2023-09-09 17:01:54 +02:00
  • cfcdd52502 switching to fast math for some more speedup, see https://learn.microsoft.com/en-us/cpp/build/reference/fp-specify-floating-point-behavior?view=msvc-170 Eric Sommerlade 2023-09-09 14:22:54 +01:00
  • cb864773ba addressed PR comments Eric Sommerlade 2023-09-09 13:35:50 +01:00
  • 16bf5f26ea Fix typo. goerch 2023-09-09 14:08:12 +02:00
  • 75a20d5f8a Adapting the other parts of the makefile goerch 2023-09-09 13:55:21 +02:00
  • 516a0d5509 Remove trailing whitespace goerch 2023-09-09 13:49:25 +02:00
  • 28b749496e Adapting makefile goerch 2023-09-09 13:44:55 +02:00
  • 96533e0c22 Make console usage platform specific goerch 2023-09-09 13:35:57 +02:00
  • e903d5f163 Fixing wrong fix. goerch 2023-09-09 13:28:34 +02:00
  • 4ee2152940 Fix dependency to common goerch 2023-09-09 13:19:47 +02:00
  • 52c9ecf31f Add console.cpp dependency goerch 2023-09-09 13:14:58 +02:00
  • 89a727774c Reenable tokenizer test for LLaMa goerch 2023-09-09 12:21:06 +02:00
  • d5a3c4a9c2 gguf-py: Support identity operation in TensorNameMap KerfuffleV2 2023-09-09 04:00:50 -06:00
  • 21ac3a1503 metal : support for Swift (#3078) kchro3 2023-09-09 02:12:10 -07:00
  • 4fd5477955 metal : support build for iOS/tvOS (#3089) Jhen-Jie Hong 2023-09-09 16:46:04 +08:00
  • d58af4d46f bump version kchro3 2023-09-09 00:34:25 -07:00
  • 9a953a42c4 Merge branch 'ggerganov:master' into master goerch 2023-09-09 09:05:58 +02:00
  • 11f72245ff First draft of SqueezeLLM PR chooper1 2023-09-08 21:29:42 -07:00
  • c6b0ebbe1b First draft of SqueezeLLM PR chooper1 2023-09-08 21:29:29 -07:00
  • 24013b37b6 update to use newLibraryWithURL kchro3 2023-09-08 20:27:43 -07:00
  • c2cffcf169 test kchro3 2023-09-08 19:53:58 -07:00
  • 81f0201bf0 metal : support build for iOS/tvOS Jhen 2023-09-09 06:57:24 +08:00
  • 48851cdb30 build : fixes #2210 by adding a compiler flag check Tristan Ross 2023-09-08 12:53:00 -07:00
  • 89595ff4e5 CUDA: enable option for F16 with LLAMA_HIPBLAS Aaryaman Vasishta 2023-09-09 04:30:12 +09:00
  • de56a82830 glob overrides, --log-disable restores def style staviq 2023-09-08 20:43:24 +02:00
  • 690c794b32 set minimum versions for all platforms kchro3 2023-09-08 09:48:56 -07:00
  • 99f85e7eb1 add a toggle for arm/arm64 kchro3 2023-09-08 09:26:29 -07:00
  • 7331d1e0b4 metal: add back faster diagonal infinity Iwan Kawrakow 2023-09-08 18:07:29 +02:00
  • ec2a24fedf flake : add train-text-from-scratch to flake.nix (#3042) takov751 2023-09-08 17:06:26 +01:00
  • 7d99aca759 readme : fix typo (#3043) Ikko Eltociear Ashimine 2023-09-09 01:04:32 +09:00
  • 3c7aa13552 Update README.md Georgi Gerganov 2023-09-08 19:04:22 +03:00
  • ba7ffbb251 metal : Q3_K speedup (#2995) Kawrakow 2023-09-08 18:01:04 +02:00
  • e64f5b5578 examples : make n_ctx warning work again (#3066) (tag: b1204) Cebtenzzre 2023-09-08 11:43:35 -04:00
  • 94f10b91ed readme : update hot topics Georgi Gerganov 2023-09-08 18:18:04 +03:00
  • 4fc615e827 Reverting the diag infinity change Iwan Kawrakow 2023-09-08 17:14:04 +02:00
  • b3e9852e47 sync : ggml (CUDA GLM RoPE + POSIX) (#3082) (tag: b1202) Georgi Gerganov 2023-09-08 17:58:07 +03:00
  • 4560acced5 Another faster f16 x f32 matrix multiply kernel Iwan Kawrakow 2023-09-08 16:34:07 +02:00
  • 5074f8448b sync : ggml (CUDA GLM RoPE + POSIX) Georgi Gerganov 2023-09-08 15:06:19 +03:00
  • cb6c44c5e0 build : do not use _GNU_SOURCE gratuitously (#2035) (tag: b1201) Przemysław Pawełczyk 2023-09-08 14:09:21 +02:00
  • a21baeb122 docker : add git to full-cuda.Dockerfile main-cuda.Dockerfile (#3044) hongbo.mo 2023-09-08 18:57:55 +08:00
  • 6ff712a6d1 Update deprecated GGML TheBloke links to GGUF (#3079) Yui 2023-09-08 12:32:55 +02:00
  • fa5a989104 metal: faster diagonal infinity Iwan Kawrakow 2023-09-08 12:06:23 +02:00
  • ce92d754a3 update kchro3 2023-09-07 22:46:56 -07:00
  • 89a96fdb2b Metal support for Swift kchro3 2023-09-07 22:36:02 -07:00
  • 43ca76976d metal: faster soft_max via float4 Iwan Kawrakow 2023-09-08 10:37:59 +02:00
  • cd4d05791b Update deprecated GGML TheBloke links to GGUF Yui 2023-09-08 09:06:26 +02:00
  • da0a872ebb Merge branch 'ggerganov:master' into betterlogs2 staviq 2023-09-08 04:28:46 +02:00
  • 0a1b5a9e42 LogTargetWrapper, LogStateWrapper staviq 2023-09-04 02:22:26 +02:00
  • ebc96086af ggml-alloc : correctly check mmap return value for errors (#3075) (tag: b1198) slaren 2023-09-08 04:04:56 +02:00
  • 7f412dab9c enable CPU HBM (#2603) (tag: b1197) Kunshang Ji 2023-09-08 09:46:56 +08:00
  • b5b8ff9f51 ggml-alloc : correctly check mmap return value for errors slaren 2023-09-08 03:33:02 +02:00
  • 41bb1a5294 fix code style Kunshang Ji 2023-09-08 01:24:17 +00:00
  • be4bbfb331 retrigger ci Kunshang Ji 2023-09-05 00:59:23 +00:00
  • 6701d16b38 ggml : allow ggml_init with 0 size Georgi Gerganov 2023-09-04 22:57:33 +03:00
  • b2a0939787 Update llama.cpp Georgi Gerganov 2023-09-04 22:48:12 +03:00
  • 100d8e08ce Update ggml.c Georgi Gerganov 2023-09-04 22:47:40 +03:00
  • 7ba244b3e8 add memalign 0 byte check Kunshang Ji 2023-09-04 00:43:23 +00:00
  • eeb20c083d add cpu hbm support Kunshang Ji 2023-08-31 01:39:15 +00:00
  • ff441c97b8 Merge branch 'master' of github.com:ggerganov/llama.cpp Laura 2023-09-07 23:28:31 +02:00
  • bf7eeb731f Merge branch 'ggerganov:master' into master Marc 2023-09-07 15:09:52 -05:00
  • 6336d834ec convert : fix F32 ftype not being saved (#3048) Cebtenzzre 2023-09-07 14:27:42 -04:00
  • 00d62adb79 fix some warnings from gcc and clang-tidy (#3038) (tag: b1195) Cebtenzzre 2023-09-07 13:22:29 -04:00
  • 2699cac032 Various other speedups for "small" kernels Iwan Kawrakow 2023-09-07 18:12:13 +02:00
  • c472ab4f69 fix error message in ggml_allocr_alloc to display actual max_avail xaedes 2023-08-29 22:49:01 +02:00
  • 47d5f35257 cmake : fix whitespace Cebtenzzre 2023-09-07 10:51:22 -04:00
  • 7c8c6ce085 metal: faster kernel_scale via float4 Iwan Kawrakow 2023-09-07 16:16:28 +02:00
  • 4fa2cc1750 make : improve test target (#3031) (tag: b1194) Cebtenzzre 2023-09-07 10:15:01 -04:00
  • 5ffab089a5 make : fix CPPFLAGS (#3035) (tag: b1193) Cebtenzzre 2023-09-07 10:13:50 -04:00
  • 9a9010609b Minor speed gains for all quantization types Iwan Kawrakow 2023-09-07 11:18:48 +02:00
  • 15b67a66c2 llama-bench : use two tokens in the warmup run for prompt evals (#3059) (tag: b1192) slaren 2023-09-07 15:52:34 +02:00
  • be8c9c245b metal : parallel RoPE on Metal (#3024) Kawrakow 2023-09-07 15:45:01 +02:00
  • be6beeb8d7 metal : correct fix of kernel_norm (#3060) Kawrakow 2023-09-07 15:42:42 +02:00
  • 7d6fac3f18 Merge branch 'master' into ik/fix_kernel_norm Georgi Gerganov 2023-09-07 16:39:38 +03:00
  • eac2c7bb7d speculative: add --n-gpu-layers-draft option Feodor Kichatov 2023-09-07 15:19:23 +02:00
  • f9c8ccc12f Fix kernel_norm broken by ca82cf7 Iwan Kawrakow 2023-09-07 15:17:43 +02:00
  • 08c799a446 llama-bench : use two tokens in the warmup run for prompt evals slaren 2023-09-07 15:14:47 +02:00
  • c4f496648c metal : fix kernel_norm (fixes Falcon on Metal) (#3057) (tag: b1189) Georgi Gerganov 2023-09-07 15:49:09 +03:00
  • 2f689dee06 metal : minor (metal-fix-norm) Georgi Gerganov 2023-09-07 15:33:21 +03:00
  • efac2d469f common : don't do warm-up with more than n_batch tokens (close #3058) Georgi Gerganov 2023-09-07 15:32:19 +03:00