Commit graph

  • 1cbf561466 metal : new q4_0 matrix-vector kernel (#2188) Shouzheng Liu 2023-07-12 16:10:55 -04:00
  • 5150582bb3 metal: New q4_0 matrix-vector kernel lshzh-ww 2023-07-12 14:43:10 -04:00
  • 975221e954 ggml : broadcast mul_mat + conv batch support (#2199) Georgi Gerganov 2023-07-12 20:51:29 +03:00
  • 521bc7b4c5 ggml : apply mul_mat broadcast fix by @jploski Georgi Gerganov 2023-07-12 20:49:54 +03:00
  • 2e3326a939 ggml : broadcast mul_mat + conv batch support Georgi Gerganov 2023-07-12 20:41:05 +03:00
  • 4523d10d0c ggml : add ggml_pool_1d and ggml_pool_2d Georgi Gerganov 2023-07-12 20:27:03 +03:00
  • 680e6f9177 cuda : add gelu support Georgi Gerganov 2023-07-12 20:26:18 +03:00
  • 64e4602463 ggml : add ggml_pool_1d and ggml_pool_2d Georgi Gerganov 2023-07-12 20:27:03 +03:00
  • f3b5b4f9f2 cuda : add gelu support Georgi Gerganov 2023-07-12 20:26:18 +03:00
  • 1a1c6d9c2b Add functions that works directly with model Bach Le 2023-07-12 23:29:13 +08:00
  • 4d3ce352eb Remove vocab reference from context Bach Le 2023-07-12 23:09:58 +08:00
  • b723fe7028 Track and free temporary ggml_tensor_extra_gpu struct created during eval Bach Le 2023-07-12 22:59:04 +08:00
  • 2cca222a54 Add missing quotes to bash script Bodo Graumann 2023-07-12 16:45:08 +02:00
  • 4e7464ef88 FP16 is supported in CM=6.0 (#2177) master-4e7464e Howard Su 2023-07-12 20:18:40 +08:00
  • a53a59afe3 Support broadcast add & mul on CUDA (fixed) lijiahao 2023-07-12 18:21:31 +08:00
  • a95e105acd building PTX code for both of 60 and 61 Howard Su 2023-07-12 16:41:56 +08:00
  • 2b5eb72e10 Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189) master-2b5eb72 Johannes Gäßler 2023-07-12 10:38:52 +02:00
  • f7d278faf3 ggml : revert CUDA broadcast changes from #2183 (#2191) master-f7d278f Georgi Gerganov 2023-07-12 10:54:19 +03:00
  • 95fff695cb ggml : revert CUDA broadcast changes from #2183 Georgi Gerganov 2023-07-12 10:49:27 +03:00
  • dfdadc0dd9 Hotfix for the prompt being ignored with CUDA JohannesGaessler 2023-07-12 09:36:39 +02:00
  • 69391e09fa Fixed __dp4a compute capability: 6.0 -> 6.1 JohannesGaessler 2023-07-12 09:13:00 +02:00
  • 5941514e95 Merge commit '5bf2a27718' into concedo_experimental Concedo 2023-07-12 13:05:16 +08:00
  • 8f4ed0d18c fixed cmake, 8bit MMV should be working now Concedo 2023-07-12 11:22:55 +08:00
  • 1c3ab205d0 add top-k to web ui Yazan Agha-Schrader 2023-07-12 05:10:22 +02:00
  • 7516488550 fix compilation (#313) Sammy 2023-07-12 04:44:56 +02:00
  • b2e071dd86 Merge remote-tracking branch 'upstream/master' into grammar Evan Jones 2023-07-11 21:51:50 -04:00
  • 20d7740a9b ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183) master-20d7740 Georgi Gerganov 2023-07-11 22:53:34 +03:00
  • f43d6c7c46 ggml : sync (abort callback, mul / add broadcast, fix alibi) Georgi Gerganov 2023-07-11 22:22:19 +03:00
  • 5bf2a27718 ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) master-5bf2a27 Spencer Sutton 2023-07-11 12:31:10 -04:00
  • e902c49d24 mpi : adapt to new ggml_tensor->src Georgi Gerganov 2023-07-11 19:24:08 +03:00
  • c9c74b4e3f llama : add classifier-free guidance (#2135) master-c9c74b4 Bach Le 2023-07-12 00:18:43 +08:00
  • 3ec7e596b2 docker : add '--server' option (#2174) Jinwoo Jeong 2023-07-12 01:12:35 +09:00
  • 917831c63a readme : fix zig build instructions (#2171) Chad Brewbaker 2023-07-11 11:03:06 -05:00
  • 7c0fbc2f12 Update train-text-from-scratch for change Spencer Sutton 2023-07-11 11:19:29 -04:00
  • e7251ab827 Add ggml changes Spencer Sutton 2023-07-11 11:18:33 -04:00
  • afcb8fe0c4 Add new config option Henri Vasserman 2023-07-11 18:09:27 +03:00
  • 8c2c4978a3 Merge 'origin/master' into hipblas Henri Vasserman 2023-07-11 17:53:54 +03:00
  • e610466307 Expand arch list and make it overrideable Henri Vasserman 2023-07-11 17:53:14 +03:00
  • 2347463201 Support using mmap when applying LoRA (#2095) master-2347463 Howard Su 2023-07-11 22:37:01 +08:00
  • bbef28218f Possible solution to allow K-quants on models with n_vocab!=32000 (#2148) master-bbef282 LostRuins 2023-07-11 22:01:08 +08:00
  • a286776435 updated lite Concedo 2023-07-11 21:48:01 +08:00
  • 1d1111e10f expose timing info in web api Concedo 2023-07-11 18:56:06 +08:00
  • 7222877069 Merge remote-tracking branch 'ren/concedo' into concedo_experimental Concedo 2023-07-11 18:45:36 +08:00
  • 5ca204d527 Merge remote-tracking branch 'yellowrose/pr/open/LostRuins/koboldcpp/multigpu-cuda-gui' into concedo_experimental Concedo 2023-07-11 18:22:54 +08:00
  • 4be167915a added linear rope option, added warning for bad samplers Concedo 2023-07-11 18:08:19 +08:00
  • 397da62002 FP16 is supported in CM=6.0 Howard Su 2023-07-11 17:43:10 +08:00
  • 2ab2da2eb4 Update comment to reflect the support lora with mmap Howard Su 2023-07-05 14:21:31 +08:00
  • 1d4b687ee6 Fix Linux Howard Su 2023-07-04 18:16:04 +08:00
  • d4e58cbf94 Support using mmap when applying LoRA Howard Su 2023-07-04 16:05:26 +08:00
  • b0b131499f Merge branch 'master' into concedo_experimental Concedo 2023-07-11 16:12:15 +08:00
  • 694fce3a0b Add '--server' option to run './server' script Jinwoo Jeong 2023-07-11 14:19:15 +09:00
  • 014fbfd4a9 add unicode escapes Evan Jones 2023-07-10 23:26:09 -04:00
  • b9fa12d360 fix zig build readme Chad Brewbaker 2023-07-10 22:05:07 -05:00
  • 2777168618 Porting MPI PR to Darwin OpenMPI Chad Brewbaker 2023-07-10 17:49:14 -05:00
  • 45e5df66da XgenVocab fix from @smdesai Aman Karmani 2023-07-10 11:06:05 -07:00
  • abf164d71e Fix styling based on review Bach Le 2023-07-10 23:50:17 +08:00
  • 5656d10599 mpi : add support for distributed inference via MPI (#2099) master-5656d10 Evan Miller 2023-07-10 11:49:56 -04:00
  • eaef2d0e76 mpi : extend API to allow usage with outer backends (e.g. Metal) Georgi Gerganov 2023-07-10 18:47:24 +03:00
  • c3c3ef11a6 mpi : factor out recv / send in functions and reuse Georgi Gerganov 2023-07-10 18:35:38 +03:00
  • 11ebfea8c0 Merge branch 'kquant_vocab_fix' into concedo_experimental Concedo 2023-07-10 23:28:48 +08:00
  • fd9a2fdfe2 As an alternative, to avoid failing on Metal due to lack of Q8_0 support, instead quantize tok_embeddings.weight to Q4_0 and retain output.weight as F16. This results in a net gain of about 55mb for a 7B model compared to previous approach, but should minimize adverse impact to model quality. Concedo 2023-07-10 23:22:45 +08:00
  • 048dca9809 Fix indentation LostRuins 2023-07-10 22:57:15 +08:00
  • 9324cb804a reimplemented save and load Concedo 2023-07-10 22:49:27 +08:00
  • 50097e6c7f Merge branch 'master' into concedo_experimental Concedo 2023-07-10 20:08:27 +08:00
  • 523fc3be52 fixed rwkv, standardized new ctx usage Concedo 2023-07-10 20:05:53 +08:00
  • 2827920044 fix compile errors, rwkv not working Concedo 2023-07-10 18:23:25 +08:00
  • f1014f3cc7 remove unused .re YellowRoseCx 2023-07-10 00:26:40 -05:00
  • 80e4e548bf Merge 'origin/master' into hipblas Henri Vasserman 2023-07-10 02:09:28 +03:00
  • 242f01e983 Add Multi-GPU CuBLAS support in the new GUI YellowRoseCx 2023-07-09 17:10:14 -05:00
  • ada1a2aa8b [mpi] use MPI_INT32_T Evan Miller 2023-07-09 15:37:33 -04:00
  • b18e4ad4ac Merge branch 'mpi' of github.com:evanmiller/llama.cpp into mpi Evan Miller 2023-07-09 15:32:36 -04:00
  • 666a15aeb4 Merge remote-tracking branch 'refs/remotes/origin/mpi' into mpi Evan Miller 2023-07-09 15:32:10 -04:00
  • 00b8aa1e66 tests : fix new llama_backend API Georgi Gerganov 2023-07-09 22:31:54 +03:00
  • f085a57d1a [mpi] Link MPI C++ libraries to fix OpenMPI Evan Miller 2023-07-09 15:31:53 -04:00
  • 166db36c51 mpi : fix after master merge Georgi Gerganov 2023-07-09 22:23:04 +03:00
  • 0492363137 mpi : fix after master merge refactor-mpi Georgi Gerganov 2023-07-09 22:23:04 +03:00
  • 1c3a15c5d4 Merge pull request #1 from ggerganov/refactor-mpi Evan Miller 2023-07-09 15:23:04 -04:00
  • 81c5ddd532 Merge branch 'mpi' into refactor-mpi Georgi Gerganov 2023-07-09 22:20:14 +03:00
  • 03cc12be0d [mpi] continue-on-error: true Evan Miller 2023-07-09 15:10:43 -04:00
  • 4a9a4748e9 Add OpenMPI to GH action Evan Miller 2023-07-09 15:05:58 -04:00
  • 0f557c2ac4 Merge branch 'master' into mpi Evan Miller 2023-07-09 15:02:19 -04:00
  • 9da9d26c70 mpi : minor Georgi Gerganov 2023-07-09 18:38:32 +03:00
  • beadbf3380 mpi : fix inference Georgi Gerganov 2023-07-09 18:26:20 +03:00
  • ef37dd14e7 mpi : fix output tensor after MPI compute (still not working) Georgi Gerganov 2023-07-09 17:01:08 +03:00
  • 8dd585e8cb Variable matmul kernel using specialization constants 0cc4m 2023-07-09 15:50:28 +02:00
  • c717c5185f mpi : various fixes - communication now works but results are wrong Georgi Gerganov 2023-07-09 16:40:16 +03:00
  • 01abb3b3b9 mpi : move all MPI logic into ggml-mpi Georgi Gerganov 2023-07-09 16:04:27 +03:00
  • e339d35579 mpi : add names for layer inputs + prep ggml_mpi_graph_compute() Georgi Gerganov 2023-07-09 14:42:36 +03:00
  • 3232db628c mpi : trying to move more MPI stuff into ggml-mpi (WIP) (#2099) Georgi Gerganov 2023-07-09 14:08:53 +03:00
  • 3bc7a80ca6 Rework command buffer handling 0cc4m 2023-07-09 11:37:32 +02:00
  • 1d16309969 llama : remove "first token must be BOS" restriction (#2153) master-1d16309 oobabooga 2023-07-09 05:59:53 -03:00
  • db4047ad5c main : escape prompt prefix/suffix (#2151) master-db4047a Nigel Bosch 2023-07-09 03:56:18 -05:00
  • 18780e0a5e readme : update Termux instructions (#2147) JackJollimore 2023-07-09 05:20:43 -03:00
  • 3bbc1a11f0 ggml : fix buidling with Intel MKL but ask for "cblas.h" issue (#2104) (#2115) master-3bbc1a1 clyang 2023-07-09 16:12:20 +08:00
  • 2492a53fd0 readme : add more docs indexes (#2127) rankaiyx 2023-07-09 15:38:42 +08:00
  • 3d9871715a Remove "first token must be BOS" restriction oobabooga 2023-07-08 23:50:19 -03:00
  • 83db5cffed Escape prompt prefix/suffix Nigel Bosch 2023-07-08 18:51:02 -05:00
  • b90c80bdbf Add __restrict__ to dequantize_mul_mat kernels JohannesGaessler 2023-07-08 22:53:43 +02:00
  • 0ef62f511a Fix validation errors, improve compatibility with AMD GPUs 0cc4m 2023-07-08 20:40:19 +02:00
  • 64639555ff Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144) master-6463955 Johannes Gäßler 2023-07-08 20:01:44 +02:00