Commit graph

  • 4b781c2055 set default n_batch to 512 when using BLAS eiery 2023-04-20 17:04:31 -04:00
  • 12b5900dbc ggml : sync ggml (add GPT-NeoX RoPE implementation) master-12b5900 Georgi Gerganov 2023-04-20 23:32:59 +03:00
  • 7aa501cd1c Faster q3_0 implementation, using two planes, by @pubby pubby 2023-04-17 10:38:45 -05:00
  • 8c90a860cc More AVX2 optimizations Stephan Walter 2023-04-16 15:36:36 +02:00
  • c29ab90e06 Q2 AVX2: do two blocks at a time, by @slaren Stephan Walter 2023-04-16 09:55:39 +02:00
  • 6fc51a8c05 Q2 and Q3 quantization Stephan Walter 2023-03-24 17:32:35 +01:00
  • d54dcbcc3b Add ggml-model-*.bin checksums for 7B, 13B, 30B Stephan Walter 2023-04-20 21:50:25 +02:00
  • 54a63c10e8 Update Makefile for the Cuda kernels Henri Vasserman 2023-04-20 22:19:22 +03:00
  • 9ff334f3c9 ggml : fix bug in ggml_compute_forward_dup_f32() master-9ff334f Georgi Gerganov 2023-04-20 21:58:05 +03:00
  • 0fd8363adc use hipblas based on cublas Henri Vasserman 2023-04-20 02:04:00 +03:00
  • 2005469ea1 Add Q4_3 support to cuBLAS (#1086) master-2005469 slaren 2023-04-20 20:49:53 +02:00
  • 8a1756abdf ggml : do not break cuBLAS build (Q4_3 is not yet implemented) master-8a1756a Georgi Gerganov 2023-04-20 21:43:50 +03:00
  • 7aba7cae29 Add Q4_3 support to cuBLAS Slaren 2023-04-20 20:34:35 +02:00
  • 66aab46079 ggml : fix Q4_3 quantization master-66aab46 Georgi Gerganov 2023-04-20 20:44:05 +03:00
  • 38de86a711 llama : multi-threaded quantization (#1075) master-38de86a Kawrakow 2023-04-20 19:42:27 +02:00
  • b3545d9a2a Merge branch 'master' into multi-thread-quantize Georgi Gerganov 2023-04-20 20:41:29 +03:00
  • e0305ead3a ggml : add Q4_3 quantization (#1082) master-e0305ea Georgi Gerganov 2023-04-20 20:35:53 +03:00
  • 515ccfd2b6 ggml : add Q4_3 quantization Georgi Gerganov 2023-04-20 18:58:35 +03:00
  • 0ae02ebb40 Still fighting with lambda captures in MSVC Iwan Kawrakow 2023-04-20 18:50:27 +02:00
  • 7fae1c4ee2 Avoiding compiler confusion Iwan Kawrakow 2023-04-20 18:40:33 +02:00
  • 07bb31b034 wip dont use Concedo 2023-04-21 00:35:54 +08:00
  • b65e559a68 Reviewer comments Iwan Kawrakow 2023-04-20 18:17:56 +02:00
  • 6a9661ea5a ci : remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI (#1074) master-6a9661e Ivan Komarov 2023-04-20 17:15:18 +02:00
  • 7ba36c2c6c trying to put out penguin based fires. sorry for inconvenience Concedo 2023-04-20 23:15:07 +08:00
  • 5addcb120c fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080) master-5addcb1 源文雨 2023-04-20 21:28:43 +08:00
  • 113de2bb06 fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' 源文雨 2023-04-20 19:09:44 +08:00
  • 49697d86d8 adjusted down the buf memory allocation now that realloc seems to work Concedo 2023-04-20 17:51:13 +08:00
  • 4605074245 Merge branch 'master' into concedo_experimental Concedo 2023-04-20 17:30:54 +08:00
  • 3e88616439 fixed WONKY CODE Concedo 2023-04-20 16:41:32 +08:00
  • 0b08ec7c5d forgot to remove this Concedo 2023-04-20 16:28:47 +08:00
  • 346cd68903 make linux and OSX build process equal to windows. Now it will build all applicable libraries, for a full build do make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 Concedo 2023-04-20 15:53:55 +08:00
  • c8c2c52482 AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) master-c8c2c52 Stephan Walter 2023-04-20 06:45:41 +00:00
  • ce05fc0a67 Multi-threading for quantize-stats Iwan Kawrakow 2023-04-20 07:25:13 +02:00
  • 2732a6b84a Merge remote-tracking branch 'upstream/master' into eval-thread-count ml6 2023-04-19 21:43:40 -07:00
  • 93761e7baf slightly clarified the library replacement steps - replacing the dll is necessary in addition to replacing the library imports Concedo 2023-04-20 12:23:54 +08:00
  • 5ca2d774cc doc - explanation of how to use a custom version of the windows libraries at the lib folder. (#92) Gustavo Rocha Dias 2023-04-20 01:20:11 -03:00
  • e488db9fd9 Remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI Ivan Komarov 2023-04-20 04:23:19 +02:00
  • 02d6988121 Improve cuBLAS performance by dequantizing on the GPU (#1065) master-02d6988 slaren 2023-04-20 03:14:14 +02:00
  • 18337719e0 Fix windows build Slaren 2023-04-20 01:03:44 +02:00
  • 95cf9597aa Fix possible synchronization issue Slaren 2023-04-19 23:01:53 +02:00
  • 834695fe3a Minor: Readme fixed grammar, spelling, and misc updates (#1071) CRD716 2023-04-19 14:52:14 -05:00
  • 5b7ff8234f editorconfig check CRD716 2023-04-19 14:35:58 -05:00
  • 48f6664589 trailing CRD716 2023-04-19 14:31:01 -05:00
  • 0731d4147e Update README.md CRD716 2023-04-19 14:29:40 -05:00
  • 72028641ca AVX2 optimization for vec_dot_q4_2_q8_0 Stephan Walter 2023-04-19 20:41:55 +02:00
  • d2f9266200 Multi-threading quantization. Iwan Kawrakow 2023-04-19 20:20:44 +02:00
  • f7d05095b4 Q4_2 quantization with rmse-optimized scale and quants (#1062) master-f7d0509 Kawrakow 2023-04-19 20:20:14 +02:00
  • fe14e7c522 Re-add dropped Darwin-only flag. Corbin 2023-04-19 10:53:42 -07:00
  • 35b0bf0585 Merge remote-tracking branch 'upstream/master' into more_responsive Jeffersoncgo 2023-04-19 13:44:25 -04:00
  • 14a4fc874b Nix flake: Use Makefile instead of CMake Corbin 2023-04-19 10:23:47 -07:00
  • 884e7d7a2b ggml : use 8-bit precision for Q4_1 intermediate results (#1047) master-884e7d7 Georgi Gerganov 2023-04-19 20:10:08 +03:00
  • 96d84438bc Fixed type as per reviewer comment Iwan Kawrakow 2023-04-19 18:57:07 +02:00
  • 49beb2cdb8 Better follow ggml conventions for function names Iwan Kawrakow 2023-04-19 18:46:44 +02:00
  • e582f2ad60 gitignore : ignore ppl-*.txt files Georgi Gerganov 2023-04-19 19:31:44 +03:00
  • ad7007aa21 ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051) slaren 2023-04-19 18:29:02 +02:00
  • 426230525c ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmalq_n_f32 Georgi Gerganov 2023-04-18 23:33:03 +03:00
  • e9c07f72cb ggml : use 8-bit precision for Q4_1 intermediate results (ARM) Georgi Gerganov 2023-04-18 22:12:19 +03:00
  • 6d36a51fa5 ggml : satisfy the sanitizer builds Georgi Gerganov 2023-04-19 19:18:28 +03:00
  • 891af05e7d Remove unused parameters Slaren 2023-04-19 18:11:54 +02:00
  • 7cd5c4a3e9 readme : add warning about Q4_2 and Q4_3 Georgi Gerganov 2023-04-19 19:07:54 +03:00
  • f3d4edf504 ggml : Q4 cleanup - remove 4-bit dot product code (#1061) master-f3d4edf Stephan Walter 2023-04-19 16:06:37 +00:00
  • 359b056034 Improve cuBLAS performance with quantized models by dequantizing on the GPU Slaren 2023-04-19 18:01:39 +02:00
  • e9657b20e8 Remove unused AVX512 Q4_0 code Stephan Walter 2023-04-19 17:31:02 +02:00
  • 6eec06081b Q4_2 quantization with rmse-optimized scale and quants Iwan Kawrakow 2023-04-19 17:10:58 +02:00
  • 21ee6d97cc Q4 cleanup Stephan Walter 2023-04-19 16:15:24 +02:00
  • 275f1bdf13 Added tokens to identify if is loading or ready Jeffersoncgo 2023-04-19 09:08:32 -04:00
  • be1222c36e Merged the upstream cublas feature, Concedo 2023-04-19 20:45:37 +08:00
  • cc407f283a messing around with memory allocation to bandaid the random ooms with various gpt2 and gptj models Concedo 2023-04-19 20:18:55 +08:00
  • 99eafe908f more_responsive Jeffersoncgo 2023-04-19 08:01:35 -04:00
  • 8944a13296 Add NVIDIA cuBLAS support (#1044) master-8944a13 slaren 2023-04-19 11:22:45 +02:00
  • f662a9a230 Merge branch 'master' into concedo Concedo 2023-04-19 16:34:51 +08:00
  • 65bfcdb1cc Merge branch 'concedo_experimental' into concedo Concedo 2023-04-19 15:35:48 +08:00
  • 45ec09d31b fast forwarding for rwkv for unmodified contexts Concedo 2023-04-19 15:09:35 +08:00
  • 116488af66 Create make_pyinstaller.sh (#89) AlpinDale 2023-04-19 07:27:07 +04:30
  • 142c38a4f3 AVX2 implementation of ggml_vec_dot_q4_1_q8_0 Slaren 2023-04-19 03:13:20 +02:00
  • 6667401238 Multi-threaded ggml_cpy (#1035) master-6667401 slaren 2023-04-19 00:53:24 +02:00
  • b9e99cd1fd Also fix wdata offset in ggml_compute_forward_add_q_f32 Slaren 2023-04-18 22:27:50 +02:00
  • 8bd47a8bda Update ggml.c slaren 2023-04-18 19:19:01 +02:00
  • 0f8b1df18f Multi-threaded ggml_cpy Slaren 2023-04-18 01:47:34 +02:00
  • 40846bd28d Cleanup cublas comments Slaren 2023-04-19 00:37:33 +02:00
  • 5fc6799f05 Add support to cmake Slaren 2023-04-18 23:20:11 +02:00
  • 77a73403ca ggml : add new Q4_2 quantization (ARM only) (#1046) master-77a7340 Georgi Gerganov 2023-04-18 23:54:57 +03:00
  • ed24225917 ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmalq_n_f32 Georgi Gerganov 2023-04-18 23:33:03 +03:00
  • 3ceb0733a6 Merge branch 'master' into q4_1xq8_0 Georgi Gerganov 2023-04-18 23:13:21 +03:00
  • 50a8a2af97 ggml : scratch that - vmlaq_n_f32 is always better master-50a8a2a Georgi Gerganov 2023-04-18 23:11:23 +03:00
  • 5843b45b6b ggml : optimize q4_2 using vmlaq_n_f32 + vmulq_n_f32 Georgi Gerganov 2023-04-18 23:09:18 +03:00
  • 3a7908940f ggml : speed-up q4_2 Georgi Gerganov 2023-04-18 22:39:35 +03:00
  • 5e6b62ce77 llama : update llama_type_name() with Q4_2 entry Georgi Gerganov 2023-04-18 22:21:20 +03:00
  • fe859297f3 ggml : add ggml_is_quantized() Georgi Gerganov 2023-04-18 21:32:27 +03:00
  • e435b81454 ggml : Q4_2 ARM Georgi Gerganov 2023-04-18 21:11:56 +03:00
  • 4caebf6d40 gitignore : vdot Georgi Gerganov 2023-04-18 23:00:08 +03:00
  • dcdd65e296 ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators master-dcdd65e Georgi Gerganov 2023-04-18 22:59:17 +03:00
  • 7840f6637c ggml : use 8-bit precision for Q4_1 intermediate results (ARM) Georgi Gerganov 2023-04-18 22:12:19 +03:00
  • 5ecff35151 Adding a simple program to measure speed of dot products (#1041) master-5ecff35 Kawrakow 2023-04-18 21:00:14 +02:00
  • e8061e6990 Merge 3dc5243b1b into 7faa7460f0 Jan Bielak 2023-04-18 20:35:14 +02:00
  • 5725eec429 Update CMakeLists.txt 源文雨 2023-04-19 01:10:49 +08:00
  • 7faa7460f0 readme : update hot topics about new LoRA functionality Georgi Gerganov 2023-04-18 20:10:26 +03:00
  • 5af8e32238 ci : do not run on drafts master-5af8e32 Georgi Gerganov 2023-04-17 18:00:10 +03:00
  • d7c53a084e Update CMakeLists.txt 源文雨 2023-04-19 00:45:50 +08:00
  • fdb55c9a01 Update CMakeLists.txt 源文雨 2023-04-19 00:45:29 +08:00