Commit graph

  • fb469ed972 fma compile only jon-chuang 2023-04-30 18:24:56 +08:00
  • 876dcec301 Various fixes to mat_mul benchmark Stephan Walter 2023-04-30 12:19:55 +02:00
  • 74a8db7ade Merge branch 'master' of https://github.com/ggerganov/llama.cpp into jon/tall-and-skinny-matmul jon-chuang 2023-04-30 18:19:48 +08:00
  • 78761b10b6 minor jon-chuang 2023-04-30 18:15:53 +08:00
  • 2fbc90f25e minor jon-chuang 2023-04-30 18:14:41 +08:00
  • 496d291d67 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into jon/use-hardware-cores jon-chuang 2023-04-30 18:13:49 +08:00
  • f1c19d8884 remove jon-chuang 2023-04-30 18:11:52 +08:00
  • 710c4bbdbf Apply suggestions from code review jon-chuang 2023-04-30 18:10:08 +08:00
  • e69c924ad1 Use two memcpy calls for q5_0 buffer transfer 0cc4m 2023-04-30 10:44:48 +02:00
  • fdd21d0eba add missing include Concedo 2023-04-30 16:15:11 +08:00
  • 3e5aa8a1c4 ggml : fix labels for GGML_OP_ALIBI master-3e5aa8a Georgi Gerganov 2023-04-30 10:25:46 +03:00
  • b3315459c7 pilled the new dequants for clblast, fixed some ooms Concedo 2023-04-30 14:15:44 +08:00
  • bd5e7409f3 Fixed incorrect example of quantize in README.md D3faIt 2023-04-30 06:43:46 +02:00
  • 0061b90ec6 Merge branch 'master' into concedo_experimental Concedo 2023-04-30 10:35:02 +08:00
  • 8e739a091f Compress llama state Ivan Stepanov 2023-04-30 03:16:46 +03:00
  • 476f46f7cc cuBLAS: do not use pinned memory if env variable GGML_CUDA_NO_PINNED is set Slaren 2023-04-29 22:25:00 +02:00
  • c3ca7a5f05 ggml : fix 32-bit ARM NEON master-c3ca7a5 Georgi Gerganov 2023-04-29 21:34:23 +03:00
  • e859ebbb48 ggml: use __restrict instead of restrict on MS compiler to prevent compiler error on VS2017 and VS2019. Helmut 2023-04-29 20:31:08 +02:00
  • e8c051611a ggml : use vzip instead of vuzp for consistency master-e8c0516 Georgi Gerganov 2023-04-29 21:12:56 +03:00
  • 0b5a935099 ggml : fix visibility and unused warnings master-0b5a935 Georgi Gerganov 2023-04-29 19:28:36 +03:00
  • 08e539d5e4 cuBLAS: fall back to pageable memory if pinned alloc fails Slaren 2023-04-29 18:25:46 +02:00
  • ec728e44d7 ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) master-ec728e4 Georgi Gerganov 2023-04-29 18:43:42 +03:00
  • 214b6a3570 ggml : adjust mul_mat_f16 work memory (#1226) master-214b6a3 Georgi Gerganov 2023-04-29 18:43:28 +03:00
  • db0604121a Handle signals properly on Windows Danny Daemonic 2023-04-22 04:01:02 -07:00
  • f51988952a ggml : fix #if for f32_f32 mul_mat (CLBlast) Georgi Gerganov 2023-04-29 14:45:12 +03:00
  • f149114395 up ver Concedo 2023-04-29 19:42:21 +08:00
  • 7afad2b9b5 integrated the new samplers Concedo 2023-04-29 19:41:41 +08:00
  • 658c686e5a ggml : add asserts to guard for incorrect wsize Georgi Gerganov 2023-04-29 14:26:36 +03:00
  • 0ffcd89870 ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS Georgi Gerganov 2023-04-29 12:41:54 +03:00
  • 150e135858 llama : minor - remove explicity int64_t cast Georgi Gerganov 2023-04-29 11:45:23 +03:00
  • 305eb5afd5 build : fix reference to old llama_util.h master-305eb5a Georgi Gerganov 2023-04-29 13:53:12 +03:00
  • 84ca9c2ecf examples : fix save-load-state + rename llama-util.h Georgi Gerganov 2023-04-29 13:48:11 +03:00
  • da0c34b028 Merge branch 'master' into concedo_experimental Concedo 2023-04-29 18:27:06 +08:00
  • fe0e4de8e8 fixed a regression where a bad model was giving valid logits after library changes. now we run the eval through the model twice and compare logits. if they give the same logits for different inputs, model is broken Concedo 2023-04-29 18:25:17 +08:00
  • 369d903eda Move cl kernels into ggml-opencl.c 0cc4m 2023-04-29 10:48:52 +02:00
  • d6be497ef6 Fix q8_0 dequant kernel 0cc4m 2023-04-29 10:37:58 +02:00
  • 1560c10f24 Work around q5_0 OpenCL issue 0cc4m 2023-04-29 10:26:58 +02:00
  • 9439da6f95 Implement q5_0, q5_1 and q8_0 0cc4m 2023-04-29 07:43:15 +02:00
  • d8ea75e952 Merge 'origin/master' into hipblas Henri Vasserman 2023-04-29 11:25:51 +03:00
  • 334637e43e common : change default parameters to pre-#1126 (#1223) master-334637e Georgi Gerganov 2023-04-29 09:51:06 +03:00
  • 17d3938955 common : change default parameters to pre-#1126 Georgi Gerganov 2023-04-29 09:22:34 +03:00
  • dd7eff57d8 llama : new sampling algorithms (#1126) master-dd7eff5 Ivan Stepanov 2023-04-29 08:34:41 +03:00
  • 5aa185f3f7 remove preallocation Concedo 2023-04-29 12:32:37 +08:00
  • bb282a4ecf reinstated the q4_3 format, for backwards compatibility. Concedo 2023-04-29 11:42:04 +08:00
  • 0fc1772a8f Merge branch 'master' into concedo_experimental Concedo 2023-04-29 11:14:05 +08:00
  • 67ee2b93a7 removed bad import. Concedo 2023-04-29 09:59:16 +08:00
  • d67f481144 Speedup dequantize_block_q4_0() Ivan Komarov 2023-04-29 00:46:25 +03:00
  • 70ce50d377 Merge 5bac24d2f7 into 7fc50c051a TheNotary 2023-04-29 08:23:11 +08:00
  • 7fc50c051a cuBLAS: use host pinned memory and dequantize while copying (#1207) master-7fc50c0 slaren 2023-04-29 02:04:18 +02:00
  • 38a021fafe fix rebase Slaren 2023-04-29 01:55:50 +02:00
  • 3cf2247d37 cuBLAS: also pin kv cache Slaren 2023-04-28 00:48:01 +02:00
  • d5d6a8083a cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory Slaren 2023-04-27 23:27:59 +02:00
  • 2dd6deeb49 cuBLAS: use host pinned memory Slaren 2023-04-27 21:51:43 +02:00
  • d3fd04e92e cuBLAS: dequantize simultaneously while copying memory Slaren 2023-04-27 20:16:32 +02:00
  • b1ee8f59b4 cuBLAS: non-contiguous tensor support (#1215) master-b1ee8f5 Henri Vasserman 2023-04-29 02:31:56 +03:00
  • 36d19a603b Remove Q4_3 which is no better than Q5 (#1218) master-36d19a6 Stephan Walter 2023-04-28 23:10:43 +00:00
  • d194586f65 Merge 'origin/master' into hipblas Henri Vasserman 2023-04-28 23:03:52 +03:00
  • f571806da7 Windows test fix Ivan Stepanov 2023-04-28 22:12:25 +03:00
  • f1ec8b422d Review suggestions: comments for removed enum values Stephan Walter 2023-04-28 20:44:13 +02:00
  • 924309a248 Remove Q4_3 which is no better than Q5 Stephan Walter 2023-04-28 19:43:20 +02:00
  • 7f15c5c477 readme : update hot topics Georgi Gerganov 2023-04-28 21:32:52 +03:00
  • 55390bcaf2 ggml : sync ggml (ggml_alibi) master-55390bc Georgi Gerganov 2023-04-28 20:37:43 +03:00
  • 4ab7bb77c0 Windows build fix Ivan Stepanov 2023-04-28 20:42:44 +03:00
  • 3bf3a968b6 Tests Ivan Stepanov 2023-04-28 20:36:53 +03:00
  • 416f49182a Save and load example adjust Ivan Stepanov 2023-04-28 20:19:17 +03:00
  • 6c4c88d54f Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k) Ivan Stepanov 2023-04-28 19:53:24 +03:00
  • 61f822f63b Added --logit-bias and --no-penalize-nl, removed std::span Ivan Stepanov 2023-04-28 03:12:49 +03:00
  • f01c67fe55 mirostat Ivan Stepanov 2023-04-22 21:23:10 +03:00
  • 9b3b07cc5c Sample interface, new samplers. Ivan Stepanov 2023-04-22 14:31:08 +03:00
  • 5fba3c016b examples : add Jeopardy example (#1168) CRD716 2023-04-28 11:13:33 -05:00
  • d8b4b15722 Merge branch 'ggerganov:master' into master Wen Shi 2023-04-29 00:00:56 +08:00
  • 1481a9cf25 llama : add session file format and saved sessions in main (#1169) master-1481a9c Evan Jones 2023-04-28 11:59:37 -04:00
  • ef7bfbad54 Merge pull request #1 from shiqinwen/shiqinwen-correct-readme-quantize-param Wen Shi 2023-04-28 23:58:52 +08:00
  • ef551af6c1 Correct the parameters of type given. Wen Shi 2023-04-28 23:58:30 +08:00
  • f19ee3b2ec now then? Henri Vasserman 2023-04-28 18:50:07 +03:00
  • 759510534c more fixes, now OpenBLAS and CLBlast build too Henri Vasserman 2023-04-28 18:31:56 +03:00
  • 78c66dfdc2 Merge 1506737499 into 11d902364b Pavol Rusnak 2023-04-28 17:26:19 +02:00
  • 5495990d9d fix error Henri Vasserman 2023-04-28 18:25:57 +03:00
  • 4634aad647 Merge 'origin/master' Henri Vasserman 2023-04-28 18:19:51 +03:00
  • 9a7ca5d827 rename Henri Vasserman 2023-04-28 18:16:02 +03:00
  • 11d902364b ggml : add helper debug printf in soft_max master-11d9023 Georgi Gerganov 2023-04-28 17:58:44 +03:00
  • 7296c961d9 ggml : add CLBlast support (#1164) master-7296c96 0cc4m 2023-04-28 16:57:16 +02:00
  • 4530d5c3c4 Merge branch 'master' into clblast-llama-cpp Georgi Gerganov 2023-04-28 17:56:36 +03:00
  • 87864d99a9 remove extra stuff Henri Vasserman 2023-04-28 17:47:55 +03:00
  • d9bc43c555 Cuda: non-contiguous tensor support Henri Vasserman 2023-04-28 17:38:22 +03:00
  • 78ec543733 Correcting link to w64devkit (#1214) Folko-Ven 2023-04-28 19:22:48 +05:00
  • 181d28d094 Correcting link to w64devkit Folko-Ven 2023-04-28 19:10:13 +05:00
  • 92a6e13a31 Add Manjaro CUDA include and lib dirs to Makefile (#1212) master-92a6e13 Johannes Gäßler 2023-04-28 15:40:32 +02:00
  • 2ab9d11f37 Merge 'origin/master' into hipblas Henri Vasserman 2023-04-28 16:30:05 +03:00
  • 04aaae1d79 add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211) master-04aaae1 Yann Follet 2023-04-28 19:59:48 +08:00
  • f75de52b25 add short delay before exit gui Concedo 2023-04-28 15:09:17 +08:00
  • 3b4a53138f Merge 'origin/master' into hipblas Henri Vasserman 2023-04-28 10:08:41 +03:00
  • a1caa48611 add more cuda defines Henri Vasserman 2023-04-28 10:08:21 +03:00
  • e97c7099b0 created new tkinter GUI Concedo 2023-04-28 15:03:48 +08:00
  • d9b5a5975b Add Manjaro CUDA include and lib dirs to Makefile JohannesGaessler 2023-04-28 08:25:41 +02:00
  • 032a171867 integrated q5 formats Concedo 2023-04-28 12:58:39 +08:00
  • e8a389f85b updated kobold lite, added debug mode, changed streaming mode to now use the same url when launching Concedo 2023-04-28 11:41:03 +08:00
  • e309138de7 add avx2 for dot_q8_0_q8_0, 2x faster than scalar Yann Follet 2023-04-28 01:30:19 +00:00
  • ecc056519f only .cu file needs to be complied as device Henri Vasserman 2023-04-28 01:58:27 +03:00
  • 5bac24d2f7 adds performance disclaimer on p9 file sharing via symlinks TheNotary 2023-04-27 14:47:57 -05:00