Commit graph

  • c2df36d60d llama : consistently catch and throw only exceptions deriving from std::exception (#1599) master-c2df36d mgroeber9110 2023-06-05 22:24:29 +02:00
  • 9d0693bce3 metal : use shared buffers between CPU and GPU (#1696) master-9d0693b kiltyj 2023-06-05 13:24:04 -07:00
  • e129f0bd76 metal : remove unnecessary copies Georgi Gerganov 2023-06-05 23:23:00 +03:00
  • efe0507632 ggml : fix internal overflow in ggml_time_us on Windows (#1702) master-efe0507 grahameth 2023-06-05 22:11:49 +02:00
  • 719962fc4f Merge branch 'master' into catch_std_exception Georgi Gerganov 2023-06-05 23:11:32 +03:00
  • e7fe66e670 ci : disable auto tidy (#1705) master-e7fe66e Georgi Gerganov 2023-06-05 23:05:05 +03:00
  • fee3f7ed0a ci : disable auto tidy Georgi Gerganov 2023-06-05 23:03:33 +03:00
  • 99009e72f8 ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) master-99009e7 Kawrakow 2023-06-05 22:56:18 +03:00
  • af275faece Merge branch 'master' into ik/k_quants Georgi Gerganov 2023-06-05 22:54:21 +03:00
  • 12d43443b2 ggml : rename k_quants -> ggml-quants-k, use lowercase in code Georgi Gerganov 2023-06-05 22:53:07 +03:00
  • c79da27ac7 Merge branch 'master' into catch_std_exception mgroeber9110 2023-06-05 21:34:40 +02:00
  • c38b0bbf82 Only import unistd.h for Metal builds Kilty McGowan 2023-06-05 09:02:14 -07:00
  • 2e5edc80e0 updated lite Concedo 2023-06-05 23:56:24 +08:00
  • 79df932d0a added dropdown for blasbatch. added capability to build avx clblast but not in default build for now Concedo 2023-06-05 22:50:21 +08:00
  • 9caca8d11f ggml: Fix internal overflow in ggml_time_us on Windows grahameth 2023-06-05 16:43:22 +02:00
  • 20d5eef816 add examples of input floats ningshanwutuobang 2023-06-05 22:32:36 +08:00
  • a379e40ba9 Update Makefile DaniAndTheWeb 2023-06-05 15:56:10 +02:00
  • 5673a8de37 fixed inpL shape and type ningshanwutuobang 2023-06-05 21:39:35 +08:00
  • 11af67866e Fixed single GPU performance regression JohannesGaessler 2023-06-05 14:32:20 +02:00
  • 9cec6a5ff0 fix small typo in README.md Foul-Tarnished 2023-06-05 14:15:58 +02:00
  • c1b44240d7 Remove trailing whitespace Kilty McGowan 2023-06-05 05:02:38 -07:00
  • 5220a991a5 Increase 3B scratch buffers. (#1698) master-5220a99 Henri Vasserman 2023-06-05 13:43:08 +03:00
  • 1855e0820e removed jni files (moved them to caller android project) George 2023-06-05 17:54:57 +08:00
  • c3b4efc89e updated CMakeLists.txt and added JNI implementation to support building this library as a dependency in Android Studio with NDK George 2023-06-05 17:25:51 +08:00
  • dffd2a710d Increase 3B scratch buffers. Henri Vasserman 2023-06-05 12:08:04 +03:00
  • d1f563a743 llama : fix Metal KV cache sync (close #1695) master-d1f563a Georgi Gerganov 2023-06-05 10:19:03 +03:00
  • a9d0bea047 Page-align buffers used by Metal Kilty McGowan 2023-06-04 23:16:45 -07:00
  • 54dc75ce73 Merge branch 'concedo-opencl-dev' into concedo_experimental Concedo 2023-06-05 13:31:53 +08:00
  • f6431ded5d removed flags from the CL pool malloc, apply code tidying suggestions. Concedo 2023-06-05 13:31:37 +08:00
  • c27f250b6f bigger scratch buffer for 3B llama Concedo 2023-06-05 13:24:53 +08:00
  • bc04508666 Use MTLDevice.newBufferWithBytesNoCopy to share buffers between CPU and GPU Kilty McGowan 2023-06-04 21:30:54 -07:00
  • 9270056269 fixed compile error in cmake VS Concedo 2023-06-05 11:48:04 +08:00
  • 80891d1591 Update REAMDE.md (#1673) qingfengfenga 2023-06-05 11:06:18 +08:00
  • 5eed33f3b3 Merge branch 'ggerganov:master' into master qingfengfenga 2023-06-05 11:00:12 +08:00
  • 073b4b8ba5 fix(avx): workaround for missing _mm256_setr_m128i in GCC < 8 xingchensong 2023-06-05 09:27:07 +08:00
  • 4f9640b8fe Tensor parallelism JohannesGaessler 2023-05-24 14:29:21 +02:00
  • 971920e935 ggml_cuda_compute_forward JohannesGaessler 2023-05-24 12:55:50 +02:00
  • 071dcd351b CUDA op template JohannesGaessler 2023-05-23 09:17:31 +02:00
  • 827f5eda91 readme : update hot topics Georgi Gerganov 2023-06-04 23:38:19 +03:00
  • ecb217db4f llama : Metal inference (#1642) master-ecb217d Georgi Gerganov 2023-06-04 23:34:30 +03:00
  • 95eaed63a7 Merge ac7a69fa33 into dcb2ed4826 Howard Su 2023-06-04 22:50:31 +03:00
  • 82cfd1b395 Added tensor layer numbers Daniel Kuntz 2023-06-04 15:06:05 -04:00
  • 324e823afd readme : add example for main Georgi Gerganov 2023-06-04 18:50:09 +03:00
  • e33002d42e readme : add Metal instructions Georgi Gerganov 2023-06-04 18:48:35 +03:00
  • db3db9e774 metal : clean-up stuff, fix typos Georgi Gerganov 2023-06-04 18:19:08 +03:00
  • b252acbcb6 metal : add comments Georgi Gerganov 2023-06-04 18:10:28 +03:00
  • d8a7486d17 Revert "ci : disable temporary" Georgi Gerganov 2023-06-04 17:58:23 +03:00
  • a7fb899c53 metal : final refactoring and simplification Georgi Gerganov 2023-06-04 17:57:02 +03:00
  • 32a5f3a601 Had unintentionally committed the Makefile with -Ofast enabled Iwan Kawrakow 2023-06-04 17:35:56 +03:00
  • b7fb1aa233 removed build info in cmake Concedo 2023-06-04 22:34:27 +08:00
  • 6f66e4c4a5 updated lite Concedo 2023-06-04 22:27:15 +08:00
  • 9aa2d8535b hide gpu input box when dropdown not selected, minor memory fix for neox and gptj Concedo 2023-06-04 21:47:17 +08:00
  • b4aad3add4 Merge a1cdd29cd2 into dcb2ed4826 Georgi Gerganov 2023-06-04 06:02:26 -05:00
  • 1ddbb9acd9 Merge branch 'concedo-opencl-dev' into concedo_experimental Concedo 2023-06-04 18:07:27 +08:00
  • 64e3e74556 change max value size_t to use limits Concedo 2023-06-04 18:04:52 +08:00
  • 2b700749e5 Merge branch 'master' into concedo-opencl-dev LostRuins 2023-06-04 18:00:06 +08:00
  • dd4b5c64b8 Merge branch 'master' into concedo_experimental Concedo 2023-06-04 17:38:22 +08:00
  • 431693cb10 Added forgotten ggml.o dependence on k_quants.h to the Makefile Iwan Kawrakow 2023-06-04 11:28:50 +03:00
  • e26cd6b483 mtl : remove temp / debug code Georgi Gerganov 2023-06-04 11:23:36 +03:00
  • e4b522232c mtl : clean-up ggml mtl interface + suport scratch / inplace Georgi Gerganov 2023-06-04 10:38:21 +03:00
  • 18e482a89c mtl : preparing for merge Georgi Gerganov 2023-06-04 09:27:27 +03:00
  • dcb2ed4826 OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) master-dcb2ed4 0cc4m 2023-06-04 08:12:05 +02:00
  • 88919095b5 edit readme Concedo 2023-06-04 12:09:49 +08:00
  • c3c05fc33b further cleanup, refactor renamemode to hordeconfig Concedo 2023-06-04 11:57:46 +08:00
  • 2868fac676 Merge branch 'master' into concedo_experimental Concedo 2023-06-04 11:07:07 +08:00
  • 20803c221e cleaning up some old junk Concedo 2023-06-04 11:05:46 +08:00
  • b62279cb39 buf size for starcoder still not good Concedo 2023-06-04 00:41:08 +08:00
  • 0a71a4e6d3 Fix docker build Iwan Kawrakow 2023-06-03 19:03:15 +03:00
  • 6ef13823b8 Fix quantization error test Iwan Kawrakow 2023-06-03 18:41:45 +03:00
  • d8bd0013e8 Add info about CUDA_VISIBLE_DEVICES (#1682) Henri Vasserman 2023-06-03 16:35:20 +03:00
  • 4c88864715 Add info about CUDA_VISIBLE_DEVICES Henri Vasserman 2023-06-03 16:23:33 +03:00
  • b5c85468a3 Docker: change to calling convert.py (#1641) Jiří Podivín 2023-06-03 14:11:53 +02:00
  • 6a14cd4d3e Setting up flake8 and pre-commit hooks Jiri Podivin 2023-06-03 11:54:24 +02:00
  • 8f5d42db9b Minor Iwan Kawrakow 2023-06-03 14:46:57 +03:00
  • abd99a89a7 A slightly faster ARM_NEON A4_K dot product Iwan Kawrakow 2023-06-03 11:37:53 +03:00
  • 894210a351 A slightly daster Q4_K AVX2 dot product Iwan Kawrakow 2023-06-02 17:28:38 +03:00
  • 9a9c5a0c80 A 10% faster CUDA vector dot kernel for Q3_K Iwan Kawrakow 2023-06-01 15:22:12 +03:00
  • c5959d53ff Don't print zeros/NaNs when no count histogram has been collected Iwan Kawrakow 2023-06-01 14:07:42 +03:00
  • e51ce72e03 Fixed bug in Q2_K CUDA dot product kernel Iwan Kawrakow 2023-06-01 14:01:25 +03:00
  • 7bcc37676a A slightly faster ARM_NEON Q2_K dot Iwan Kawrakow 2023-06-01 11:27:14 +03:00
  • 6ec70579cb Adding ARM_NEON Q2_K dot Iwan Kawrakow 2023-06-01 00:31:52 +03:00
  • 8516fdf728 Adding scalar and AVX2 Q2_K dot Iwan Kawrakow 2023-05-31 22:28:55 +03:00
  • b439efb712 Adding Q2_K - just CUDA for now Iwan Kawrakow 2023-05-31 18:09:31 +03:00
  • 4faa040c20 A very slightly faster ARM_NEON Q3_K dot Iwan Kawrakow 2023-05-31 08:46:30 +03:00
  • 13264fa067 Adding Q3_K dot for ARM_NEON Iwan Kawrakow 2023-05-30 14:18:47 +03:00
  • a197eb50d1 Q5_K dot product for ARM_NEON Iwan Kawrakow 2023-05-30 12:31:21 +03:00
  • 5ca15ce155 Q6_K dot product for ARM_NEON Iwan Kawrakow 2023-05-30 11:22:53 +03:00
  • a2533a72a3 Q4_K dot product for ARM_NEON Iwan Kawrakow 2023-05-30 10:02:54 +03:00
  • 54f808db2b Quantization mixes: didn't quite get what I wanted in the last commit Iwan Kawrakow 2023-05-29 22:09:46 +03:00
  • d537b97cb8 Adding quantization mixes Iwan Kawrakow 2023-05-29 20:10:56 +03:00
  • 5c5191ab68 Per convention, all QX_K quantizations use Q5_K for output.weight Iwan Kawrakow 2023-05-29 19:32:43 +03:00
  • b835d0f49f Adding Q5_K - scalar, AVX2, CUDA Iwan Kawrakow 2023-05-29 18:57:04 +03:00
  • cf221afb55 Adding Q6_K - scalar, AVX2, CUDA Iwan Kawrakow 2023-05-29 16:02:54 +03:00
  • a0b8e9f3c9 Adding Q4_K - scalar, AVX2, CUDA Iwan Kawrakow 2023-05-29 14:30:17 +03:00
  • 3d8b1de3f7 Some more CUDA optimizations for Q3_K Iwan Kawrakow 2023-05-29 09:16:45 +03:00
  • a3c0673089 Some improvement for Q3_K on CUDA Iwan Kawrakow 2023-05-28 22:21:25 +03:00
  • c93cce3a45 Q3_K now working on CUDA and AVX2/scalar Iwan Kawrakow 2023-05-28 21:38:00 +03:00
  • b4f71347ff Adding Q3_K and Q8_K (de)-quantization Iwan Kawrakow 2023-05-27 20:26:36 +03:00
  • 8673a41385 Starting to add k-quantization to ggml Iwan Kawrakow 2023-05-27 19:10:49 +03:00
  • 136476e898 Fix prompt cache saving and chat-persistent rollover (#1678) master-136476e Evan Jones 2023-06-03 07:28:45 -04:00