Commit graph

  • 4282f9b0f3 Update quantize_row_q4_1 for PowerPC Håkon H. Hitland 2023-04-05 02:48:51 +02:00
  • 93c95fcc1b Update quantize_row_q4_0 for Arm NEON Håkon H. Hitland 2023-04-05 02:37:20 +02:00
  • b7e704658e Update quantize_row_q4_0 for WASM Håkon H. Hitland 2023-04-05 01:18:42 +02:00
  • 5d5f2b2efa Update quantize_row_q4_0 for AVX/AVX2 Håkon H. Hitland 2023-04-05 01:02:43 +02:00
  • 3698f79e6a Use full range for q4_0 quantization Håkon H. Hitland 2023-04-03 03:02:26 +02:00
  • cd6c121357 reinstated the reusable buffers -> approx 10% speedup for prompt processing Concedo 2023-04-22 22:49:27 +08:00
  • bdca7999c8 Fix Makefile Slaren 2023-04-22 16:12:37 +02:00
  • f787b35f7b Fix: Issue with CUBLAS compilation error B1gM8c 2023-04-22 21:19:02 +08:00
  • d181095e5b Fix: Issue with CUBLAS compilation error due to missing -fPIC flag B1gM8c 2023-04-22 21:00:14 +08:00
  • a0242a833c Minor, plus rebase on master gg/rmse_quantization Iwan Kawrakow 2023-04-21 18:09:43 +02:00
  • e435bfd93c RMSE-optimized quants for all quantization types Iwan Kawrakow 2023-04-21 10:26:49 +02:00
  • 4ceff5a979 cmake : add install step for libllama and llama.h grencez 2023-04-22 06:37:26 -07:00
  • 0e018fe008 ggml : fix Q4_3 cuBLAS master-0e018fe Georgi Gerganov 2023-04-22 16:31:56 +03:00
  • 4676a1d41b Fix: Issue with CUBLAS compilation error B1gM8c 2023-04-22 21:19:02 +08:00
  • 81cb1eee30 A better mul_sum_i8_pairs_float implementation using AVX512 MeouSker77 2023-04-22 21:16:58 +08:00
  • bde28f2056 A better packNibbles implementation using AVX512 MeouSker77 2023-04-22 17:37:03 +08:00
  • 857308d1e8 ci : trigger CI for drafts, but not most PR actions (#1125) master-857308d Stephan Walter 2023-04-22 13:12:29 +00:00
  • 2390573ce3 Fix: Issue with CUBLAS compilation error due to missing -fPIC flag B1gM8c 2023-04-22 21:00:14 +08:00
  • 69aa277226 Trigger CI for drafts, but not most PR actions Stephan Walter 2023-04-22 14:17:11 +02:00
  • c50b628810 Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122) master-c50b628 Stephan Walter 2023-04-22 10:54:13 +00:00
  • dd87e59980 Try to fix ARM NEON blindfolded... Stephan Walter 2023-04-22 12:47:37 +02:00
  • d950154d25 Fix CI: quantization unit tests, editorconfig Stephan Walter 2023-04-22 12:35:31 +02:00
  • 5f939498d5 ggml : unit test for quantization functions (#953) unbounded 2023-04-22 11:10:39 +02:00
  • 36b4f7e064 llama : print timings on ctrl+c exit (#1021) master-36b4f7e wbpxre150 2023-04-22 16:56:35 +08:00
  • 4b8d5e3890 llama : quantize attention results quant-attn Georgi Gerganov 2023-04-21 17:42:02 +03:00
  • 811989c2ad fixed pyinstaller Concedo 2023-04-22 16:31:42 +08:00
  • 10f19c1121 llama : have n_batch default to 512 (#1091) master-10f19c1 eiery 2023-04-22 04:27:05 -04:00
  • 1b7aa2b815 Merge branch 'master' into concedo Concedo 2023-04-22 16:22:08 +08:00
  • 7e312f165c cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100) master-7e312f1 Howard Su 2023-04-22 16:18:20 +08:00
  • 872c365a91 ggml : fix AVX build + update to new Q8_0 format master-872c365 Georgi Gerganov 2023-04-22 11:08:12 +03:00
  • 1ea0e15292 Merge branch 'master' into concedo Concedo 2023-04-22 16:07:27 +08:00
  • 708540712d Q8_0: unbreak AVX Stephan Walter 2023-04-22 10:07:17 +02:00
  • b53c5e7c80 update kobold lite Concedo 2023-04-22 16:04:43 +08:00
  • 955ef9a5d5 ggml : alternative Q4_3 implementation using modified Q8_0 (#1109) Georgi Gerganov 2023-04-22 10:55:35 +03:00
  • 2c358eca92 ggml : fix AVX paths for Q8_0 quantization Georgi Gerganov 2023-04-22 10:49:17 +03:00
  • 76b6b267e6 ggml : slight improvement of Q4_3 - no need for loop unrolling Georgi Gerganov 2023-04-22 10:37:19 +03:00
  • 829c480643 ggml : fix Q4_3 scalar implementation Georgi Gerganov 2023-04-21 23:08:15 +03:00
  • 5425e06006 ggml : alternative Q4_3 implementation using modified Q8_0 Georgi Gerganov 2023-04-21 20:53:18 +03:00
  • ec805eeffa ggml : prefer vzip to vuzp Georgi Gerganov 2023-04-21 20:12:42 +03:00
  • c5aa5e5777 ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099) master-c5aa5e5 Stephan Walter 2023-04-22 07:37:05 +00:00
  • e9a9cb0c54 examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107) Clint Herron 2023-04-22 02:54:33 -04:00
  • b6e7f9b09e llama : add api for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105) master-b6e7f9b xaedes 2023-04-22 08:21:32 +02:00
  • 6e908c1792 added lora support Concedo 2023-04-22 12:29:38 +08:00
  • b2e8a320c8 set n_batch to 512 for all cases eiery 2023-04-22 00:07:42 -04:00
  • 131159ff1b Merge branch 'ggerganov:master' into master eiery 2023-04-22 00:02:09 -04:00
  • c454f8b848 Gpt NeoX / Pythia integration completed Concedo 2023-04-22 11:23:25 +08:00
  • 7b3d04e5d4 Merge branch 'master' into concedo_experimental Concedo 2023-04-22 10:58:16 +08:00
  • 1609357e6b add global pointer to ctx. wbpxre150 2023-04-22 10:57:56 +08:00
  • 4fa3dfe8bc just doesn't work properly on windows. will leave it as a manual flag for others Concedo 2023-04-22 10:57:38 +08:00
  • a333c3a56b Merge remote-tracking branch 'upstream/master' into eval-thread-count ml6 2023-04-21 17:58:54 -07:00
  • a9227f3e97 Adding trailing newline. HanClinto 2023-04-21 20:04:41 -04:00
  • 50cb666b8a Improve cuBLAS performance by using a memory pool (#1094) master-50cb666 slaren 2023-04-21 21:59:17 +02:00
  • 53e8fa223b Increasing repeat_penalty to 1.1 to make alpaca more usable by default. HanClinto 2023-04-21 12:44:27 -04:00
  • 9f51b6b242 Moving parameters to separate lines for readability. HanClinto 2023-04-21 12:43:56 -04:00
  • d774e05428 Change memory pool synchronization mechanism to a spin lock General code cleanup Slaren 2023-04-21 21:02:17 +02:00
  • 25d7abbd1f llama : fixed rlimit error message (#888) master-25d7abb apaz 2023-04-21 13:48:06 -05:00
  • 018f2279f5 cmake : link threads publicly to ggml (#1042) master-018f227 源文雨 2023-04-22 02:27:06 +08:00
  • 9411288271 main : evaluate tokens in batches after swapping context (#1014) master-9411288 Alex Klinkhamer 2023-04-21 11:18:09 -07:00
  • 80d1c166db Update examples/main/main.cpp Georgi Gerganov 2023-04-21 21:17:55 +03:00
  • 3a5958bd28 Rename hsum_int_8 to hsum_i32_8 Stephan Walter 2023-04-21 19:26:25 +02:00
  • 266bb63f68 ggml : alternative Q4_3 format + implementation Georgi Gerganov 2023-04-21 20:13:12 +03:00
  • 5db246a547 ggml : prefer vzip to vuzp Georgi Gerganov 2023-04-21 20:12:42 +03:00
  • ef13443047 wip pythia integration Concedo 2023-04-22 01:08:23 +08:00
  • 68898046c2 accidentally added the binaries onto repo again. Concedo 2023-04-22 00:41:19 +08:00
  • 535ea4709e finish AVX vectorization of quantize_row_q8_0 Stephan Walter 2023-04-21 18:31:26 +02:00
  • cee018960e Merge branch 'master' into concedo_experimental Concedo 2023-04-22 00:19:50 +08:00
  • 63d8dff4fb AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring Stephan Walter 2023-04-21 11:20:48 +02:00
  • 456aedc461 fix comment xaedes 2023-04-21 17:49:52 +02:00
  • 1c51e1f324 remove trailing whitespace xaedes 2023-04-21 17:32:39 +02:00
  • 8687c1f258 llama : remember and restore kv cache data pointers (#1104) master-8687c1f xaedes 2023-04-21 17:25:21 +02:00
  • f555db44ec adding the libraries for cublas first. but i cannot get the kernel to work yet Concedo 2023-04-21 23:24:09 +08:00
  • 9d26580b84 remove unused variables xaedes 2023-04-21 17:23:53 +02:00
  • 1bfc153e2f ggml : a faster version for Q4_1 x Q8_0 dot products (#1083) master-1bfc153 Kawrakow 2023-04-21 17:18:26 +02:00
  • 283156c1fd remember and restore kv cache data pointers xaedes 2023-04-14 02:38:47 +02:00
  • 8288b36749 add functions to get and set the whole llama state: xaedes 2023-04-14 03:51:34 +02:00
  • 8ed3c3fe2a reserve correct size for logits xaedes 2023-04-14 02:38:10 +02:00
  • b99a678d5b Make AVX512 test on Windows to build the shared libs Howard Su 2023-04-21 21:30:10 +08:00
  • 0626417687 Fix build under Windows when enable BUILD_SHARED_LIBS Howard Su 2023-04-21 21:19:02 +08:00
  • 794a38a2e8 Revert "cublas is not feasible at this time. removed for now" Concedo 2023-04-21 21:02:40 +08:00
  • 3d59769c3b Show perplexity ETA in hours and minutes (#1096) master-3d59769 slaren 2023-04-21 14:57:57 +02:00
  • c542d5a7a4 Cleaning up Iwan Kawrakow 2023-04-21 14:45:17 +02:00
  • 66a865b80d A faster version for Q4_1 x Q8_0 dot products Iwan Kawrakow 2023-04-20 17:45:27 +02:00
  • 5160053e51 merged llama adapter into the rest of the gpt adapters Concedo 2023-04-21 17:47:48 +08:00
  • 4808efefac readme : update gpt4all instructions Pavol Rusnak 2023-04-21 11:11:00 +02:00
  • 82d74ca1a6 Merge branch 'master' into concedo Concedo 2023-04-21 16:24:30 +08:00
  • 3687db7cf7 cublas is not feasible at this time. removed for now Concedo 2023-04-21 16:14:23 +08:00
  • d40fded93e llama : fix comment for "output.weight" tensor master-d40fded Georgi Gerganov 2023-04-21 10:23:36 +03:00
  • 7c36e03dfb too many hooved animals! Barton Rhodes 2023-04-21 05:07:57 +00:00
  • d1d76e24f2 Show perplexity ETA in hours and minutes Slaren 2023-04-21 03:50:20 +02:00
  • c832e7c793 Add CXX flags to nvcc Slaren 2023-04-21 03:39:04 +02:00
  • 94cb00a3cf alternate implementation of setting different n_batch for BLAS eiery 2023-04-20 20:57:16 -04:00
  • d3e1984ce0 add rpath Henri Vasserman 2023-04-21 03:32:06 +03:00
  • 0e005f7793 Build file changes Henri Vasserman 2023-04-21 02:13:00 +03:00
  • 641e9a0c52 Move cuda specific definitions to ggml-cuda.h/cu Slaren 2023-04-21 00:58:26 +02:00
  • 6ffe4680ca fiat nexus ▦ Barton Rhodes 2023-04-20 22:38:56 +00:00
  • 041627284d Merge branch 'ggerganov:master' into main barton ⊛ 2023-04-20 22:21:31 +00:00
  • e8797a9aed Improve cuBLAS performance by using a memory pool Slaren 2023-04-21 00:09:14 +02:00
  • 2510c1831f Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088) Stephan Walter 2023-04-20 21:56:44 +00:00
  • d58c304a45 Add ggml-model-*.bin checksums for 65B Pavol Rusnak 2023-04-20 23:55:04 +02:00
  • c6dfc44a37 spacing eiery 2023-04-20 17:06:34 -04:00