Commit graph

  • 355c80f49e examples : simplify vim plugin (#2327) AustinMroz 2023-07-23 06:16:48 -05:00
  • 83a00ce69b metal : support bcast add & dup & cont op (#2323) Jiahao Li 2023-07-23 19:00:37 +08:00
  • c2a5188b3c Small Q5_K improvement on older GPUs Iwan Kawrakow 2023-07-23 13:48:21 +03:00
  • 97267a4e80 Enable pipe-friendly help output maddes8cht 2023-07-23 12:38:23 +02:00
  • 5f98fc2f90 refactor AVX code in ggml_vec_dot_q6_K_q8_K() katsu560 2023-07-23 19:28:48 +09:00
  • 8ffe7d55dd Support nix build '.#opencl' Wu Zhenyu 2023-07-23 18:14:18 +08:00
  • 4775602c9f add AVX to ggml_vec_dot_q6_K_q8_K() katsu560 2023-07-23 19:09:43 +09:00
  • 8194d591b2 help : fix gqa value for 70B Georgi Gerganov 2023-07-23 12:35:55 +03:00
  • c594992d6a py : oh boy .. Georgi Gerganov 2023-07-23 12:30:47 +03:00
  • 53809c9c26 Fix trailing whitespace in CMakeLists.txt 0cc4m 2023-07-23 11:28:15 +02:00
  • 2dac31b358 py : fix hparams parsing (if-else blocks) Georgi Gerganov 2023-07-23 12:26:21 +03:00
  • dfd2e0e5d6 add AVX to ggml_vec_dot_q5_K_q8_K() katsu560 2023-07-23 18:17:12 +09:00
  • 3fdc00f596 llama : support for GQA and LLaMAv2 70B Georgi Gerganov 2023-07-23 11:30:17 +03:00
  • 64d244a4dd Print max tensor size to stderr crasm 2023-07-23 05:07:08 -04:00
  • 56df218f56 add AVX to ggml_vec_dot_q4_K_q8_K() katsu560 2023-07-23 17:26:41 +09:00
  • 2e84eac7f6 Merge branch 'master' into concedo_experimental Concedo 2023-07-23 16:23:00 +08:00
  • aa05eadb6f Merge branch 'master' into concedo_experimental Concedo 2023-07-23 16:22:44 +08:00
  • 2024d40a00 add AVX to ggml_vec_dot_q3_K_q8_K() katsu560 2023-07-23 16:58:27 +09:00
  • 6afbb11c01 add AVX to ggml_vec_dot_q2_K_q8_K() katsu560 2023-07-23 16:00:22 +09:00
  • ec6d3d62a6 Faster Q5_K on CUDA Iwan Kawrakow 2023-07-23 09:42:36 +03:00
  • d2a43664f9 Speed up Q4_K (#2322) master-d2a4366 Kawrakow 2023-07-23 08:49:20 +03:00
  • c1893cd966 CUDA: GQA implementation JohannesGaessler 2023-07-22 19:22:02 +02:00
  • 800a311c71 More general use-case for CLBLAST support (Linux and FreeBSD) Jose Yukiteru Amano 2023-07-22 22:46:05 -04:00
  • 1108232e30 Merge branch 'concedo' into concedo_experimental Concedo 2023-07-23 09:59:58 +08:00
  • 0cca0726fe reduce number of retries, fixed maxlength > maxctx bug Concedo 2023-07-23 09:59:34 +08:00
  • 56995caa48 Fix mirostatv2. (#338) Ycros 2023-07-23 11:52:03 +10:00
  • 57875d97ac formatting slaren 2023-07-23 01:29:04 +02:00
  • adccc1dada alibi: use memcpy for float params slaren 2023-07-23 01:23:43 +02:00
  • f7e9d2785e fix assert slaren 2023-07-23 01:14:38 +02:00
  • c7801890c9 ggml: move op parameters from tensors to ggml_tensor::op_params slaren 2023-07-23 01:08:49 +02:00
  • 7808402a8b Linked TheBloke's GGML repos niansa/tuxifan 2023-07-23 00:54:38 +02:00
  • cd8fb6b7be Removed sharing warning for LLaMA 2 niansa/tuxifan 2023-07-23 00:49:39 +02:00
  • 7dc88a40d1 Fix Makefile for CLBLAST compile support and instructions for compile llama.cpp FreeBSD Jose Yukiteru Amano 2023-07-22 16:51:48 -04:00
  • 0e74a7222e Added whitespace escaping and unescaping goerch 2023-07-22 22:24:21 +02:00
  • b9b7d94fc1 CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313) master-b9b7d94 Johannes Gäßler 2023-07-22 21:27:34 +02:00
  • dbccd8c901 examples: simplify vim plugin Austin Mroz 2023-07-22 13:48:17 -05:00
  • 97cb33ff8a Merge branch 'master' into logging_callback Helmut 2023-07-22 20:19:12 +02:00
  • b47b8a9cfe llama : optimize memory buffers (#2325) master-b47b8a9 Georgi Gerganov 2023-07-22 21:17:57 +03:00
  • 1ac8ff3593 Handle devices with only a single queue 0cc4m 2023-07-22 20:05:57 +02:00
  • 67843a3812 Reuse pinned allocation for f16 conversion 0cc4m 2023-07-22 18:48:15 +02:00
  • 94a0ee1eb8 More testing of the tokenizer goerch 2023-07-22 18:37:58 +02:00
  • f2d4ca34bf Reduce usage of waitIdle 0cc4m 2023-07-22 18:25:07 +02:00
  • b793fa9774 llama : optimize memory buffers Georgi Gerganov 2023-07-22 18:23:50 +03:00
  • 3452095089 Unroll loops in dmmv shader 0cc4m 2023-07-22 17:46:52 +02:00
  • 2859562501 Run glslc commands in parallel 0cc4m 2023-07-22 17:42:34 +02:00
  • fa0270df7c added some checks to skip generation if busy Concedo 2023-07-22 23:10:04 +08:00
  • 2807d98fd4 touchup (+2 squashed commit) Concedo 2023-07-22 22:29:11 +08:00
  • 5b2fe744c8 Support bcast add & dup & cont op on MPS backend lijiahao 2023-07-22 22:10:02 +08:00
  • d273bfd2c9 allocator: cleanup, more comments ggml-backends slaren 2023-07-22 15:05:24 +02:00
  • 91317f7bec Speed up Q4_K Iwan Kawrakow 2023-07-22 16:04:31 +03:00
  • c8ae81756c Add possibly missing typename goerch 2023-07-22 14:27:56 +02:00
  • b5fe67f8c6 Perplexity: Compute scores correlated to HellaSwag (#2312) master-b5fe67f klosax 2023-07-22 14:21:24 +02:00
  • 5141472e2b llama.cpp: print input/output buffers size slaren 2023-07-22 13:31:06 +02:00
  • e2b9575951 allocator cleanup slaren 2023-07-22 13:29:44 +02:00
  • bf665ccb05 Replace VLA with std::vector goerch 2023-07-22 12:44:35 +02:00
  • abd66a9c27 Update perplexity.cpp klosax 2023-07-22 12:42:22 +02:00
  • 24baa54ac1 examples : basic VIM plugin whoreson 2023-07-22 12:34:51 +02:00
  • f62bcfe9e4 Update perplexity.cpp klosax 2023-07-22 12:25:51 +02:00
  • 3aec3038d4 bump scratch buffers Concedo 2023-07-22 18:12:18 +08:00
  • dd6c67d3cb ci : fix args Georgi Gerganov 2023-07-22 12:00:56 +03:00
  • 5d500e8ccf ci : add 7B CUDA tests (#2319) Georgi Gerganov 2023-07-22 11:48:22 +03:00
  • 699956154f ci : reduce CUDA ppl chunks down to 4 to save time Georgi Gerganov 2023-07-22 11:47:38 +03:00
  • 9a36dff0fd Update perplexity.cpp klosax 2023-07-22 10:31:43 +02:00
  • e52fe18837 ci : increase CUDA TG len + add --ignore-eos Georgi Gerganov 2023-07-22 11:23:54 +03:00
  • 345b8b26df ci : bump CUDA ppl chunks Georgi Gerganov 2023-07-22 11:21:57 +03:00
  • 52a73b3f9c ci : add Q2_K to the tests Georgi Gerganov 2023-07-22 11:18:34 +03:00
  • 754ea680a6 Basic offloading support with mul_f32 and dmmv for q4_0 0cc4m 2023-07-22 10:16:18 +02:00
  • 52c5856a08 auto populate horde model name Concedo 2023-07-22 16:03:12 +08:00
  • b972e200b3 ci : add 7B CUDA tests Georgi Gerganov 2023-07-22 10:38:23 +03:00
  • bf9be22fca Add files via upload whoreson 2023-07-22 07:48:58 +02:00
  • dd3f8dabed updated cluster to horde.koboldai.net Concedo 2023-07-22 12:42:40 +08:00
  • 236d0e8955 add tip about using other workers Concedo 2023-07-22 12:29:22 +08:00
  • 701bf0a6cd reduce sleep time between jobs Concedo 2023-07-22 11:56:43 +08:00
  • 343ae756fa Merge branch 'master' into concedo_experimental Concedo 2023-07-22 11:51:30 +08:00
  • 52c98228aa bugfixes for missing params Concedo 2023-07-22 11:37:44 +08:00
  • d7ab6adbc1 embedded horde worker is ready Concedo 2023-07-22 11:21:32 +08:00
  • 7de7882537 allocator: fix partial offloading slaren 2023-07-22 01:46:49 +02:00
  • 9f055e35d0 Add missing include goerch 2023-07-22 02:12:19 +02:00
  • c04a42de5b Merge branch 'ggerganov:master' into fix-#2023 goerch 2023-07-22 00:52:02 +02:00
  • 8c9d1e781e Fix typo goerch 2023-07-22 00:39:56 +02:00
  • ac793a21e8 Fix for #2023 goerch 2023-07-22 00:32:09 +02:00
  • 934eeb43d9 CUDA: Fixed 7b q3_K_S with mul_mat_vec_q JohannesGaessler 2023-07-22 00:30:05 +02:00
  • 43833f6c83 Merge 39edee5136 into 7d5f18468c dewijones92 2023-07-21 23:13:14 +02:00
  • 545862ae48 Update perplexity.cpp klosax 2023-07-21 21:25:44 +02:00
  • ebf009f63e Update common.cpp klosax 2023-07-21 21:21:35 +02:00
  • 68d2ca65e6 Update common.h klosax 2023-07-21 21:19:45 +02:00
  • 7d5f18468c examples : add easy python script to create quantized (k-bit support) GGML models from local HF Transformer models (#2311) Richard Roberson 2023-07-21 13:01:10 -06:00
  • 1faad6ddac examples : rename to use dash instead of underscore Georgi Gerganov 2023-07-21 21:58:21 +03:00
  • a363c2bc60 Resync my fork with new llama.cpp commits richardr1126 2023-07-21 12:35:21 -06:00
  • 75064b4ada wip on embedded horde worker Concedo 2023-07-22 01:30:25 +08:00
  • 807ef887b2 fix white spaces lshzh-ww 2023-07-21 12:39:44 -04:00
  • 6ee897a501 metal: issue operations concurrently if possible lshzh-ww 2023-07-21 11:23:51 -04:00
  • 1c3030ee41 ggml: try to issue operations concurrently on GPU lshzh-ww 2023-07-21 11:23:18 -04:00
  • c8e6ef1846 metal: only encode in one command buffer lshzh-ww 2023-07-21 11:17:48 -04:00
  • fe8b79255b Obtaining LLaMA 2 instructions niansa/tuxifan 2023-07-21 17:16:40 +02:00
  • e87840f9fd allocator: automatic inplace operations slaren 2023-07-21 16:51:50 +02:00
  • d924522a46 Custom RoPE + better memory management for CUDA (#2295) master-d924522 Kawrakow 2023-07-21 17:27:51 +03:00
  • 4d76a5f49b Faster Q3_K implementation on Metal (#2307) Kawrakow 2023-07-21 17:05:30 +03:00
  • 0db14fef06 ggml : fix the rope fix (513f861953) master-0db14fe Georgi Gerganov 2023-07-21 15:16:55 +03:00
  • 11315b1d61 llama : minor style changes Georgi Gerganov 2023-07-21 15:11:23 +03:00