Commit graph

  • 36bfb3c158 Fix typos, use GGML_TYPE defines, improve code 0cc4m 2023-04-25 18:43:31 +02:00
  • 0aa3d839fb free old ctx on retry Concedo 2023-04-25 23:42:57 +08:00
  • 6454855ae3 Update convert-lora-to-ggml.py ostix360 2023-04-25 17:33:52 +02:00
  • a696b0a16c missed another thing Concedo 2023-04-25 23:16:04 +08:00
  • 8c9c218609 missed a thing Concedo 2023-04-25 23:02:08 +08:00
  • 235daf4016 Merge branch 'master' into concedo Concedo 2023-04-25 20:44:22 +08:00
  • 72b2331ad6 edge cases with mem crash? need verify Concedo 2023-04-25 20:42:30 +08:00
  • 5eec5d6ed9 Added backwards compatibility to an earlier version of NeoX. Concedo 2023-04-25 20:34:18 +08:00
  • bff998f871 Slight refactor of the python code: credits to @LuxF3rre Concedo 2023-04-25 19:20:14 +08:00
  • 9bfc54373c force int caste .0 in the config file for the lora_alpha param ostix360 2023-04-25 07:19:46 +00:00
  • 9143ccefa0 introduction to give more consistent results CRD716 2023-04-24 22:55:43 -05:00
  • e3159c018f editorcheck CRD716 2023-04-24 22:01:05 -05:00
  • e82439a36e Prefixes, Line separators, etc CRD716 2023-04-24 21:59:06 -05:00
  • 7f58f2cca0 llama : add session file format and saved sessions in main Evan Jones 2023-04-24 20:56:45 -04:00
  • 7fd88f445b Prevent Results.txt from coming up CRD716 2023-04-24 19:13:08 -05:00
  • ccf900240d Basic Setup CRD716 2023-04-24 19:12:46 -05:00
  • 54bb60e268 ggml : fix bug in ggml_compute_forward_sum_f32 (#1162) master-54bb60e xaedes 2023-04-24 23:02:02 +02:00
  • daa5df51f7 Replace buffer pool with static buffers a, b, qb, c 0cc4m 2023-04-24 22:08:51 +02:00
  • ae73887fb9 Add CLBlast to CMakeLists.txt 0cc4m 2023-04-24 21:22:41 +02:00
  • 18cc05bde4 Fix cast in opencl kernels 0cc4m 2023-04-24 16:13:43 +02:00
  • 8603c25e3c Fix device selection env variable names 0cc4m 2023-04-24 15:53:48 +02:00
  • f469d9afa0 Double CLBlast speed by disabling OpenBLAS thread workaround 0cc4m 2023-04-24 15:15:23 +02:00
  • 309af7fce9 Add q4_2 and q4_3 CLBlast support, improve code 0cc4m 2023-04-24 07:16:43 +02:00
  • 1b16b8c90d Move CLBlast implementation to separate file 0cc4m 2023-04-23 09:59:45 +02:00
  • 6f66870726 Finish merge of ClBlast support 0cc4m 2023-04-15 12:03:11 +02:00
  • b7143c1a2e Improve ClBlast implementation, avoid recreating buffers, remove redundant transfers 0cc4m 2023-04-11 21:53:50 +02:00
  • a908c37ce9 Allow use of OpenCL GPU-based BLAS using ClBlast instead of OpenBLAS for context processing 0cc4m 2023-04-10 09:49:40 +02:00
  • e24ecd2cc9 fix bug in ggml_compute_forward_sum_f32 xaedes 2023-04-24 21:13:22 +02:00
  • 8a0f8673ba ggml : export symbols (#1155) master-8a0f867 Georgi Gerganov 2023-04-24 22:18:25 +03:00
  • 09ae3044f4 small update on the readme KASR 2023-04-24 20:56:15 +02:00
  • 5808fcf7ac Use full range for q4_2 quantization Håkon H. Hitland 2023-04-24 20:54:51 +02:00
  • 735c77acf1 move powershell script & update readme KASR 2023-04-24 20:49:43 +02:00
  • e5bbecaf2d Merge branch 'ggerganov:master' into master KASR 2023-04-24 20:32:39 +02:00
  • d09f97e28f Update quantize_row_q4_0 for PowerPC Håkon H. Hitland 2023-04-05 02:48:51 +02:00
  • fea8d10107 Update quantize_row_q4_0 for Arm NEON Håkon H. Hitland 2023-04-05 02:37:20 +02:00
  • 73a92d2d3c Update quantize_row_q4_0 for WASM Håkon H. Hitland 2023-04-05 01:18:42 +02:00
  • 84aa7d83c4 Update quantize_row_q4_0 for AVX/AVX2 Håkon H. Hitland 2023-04-05 01:02:43 +02:00
  • f57433c44f Use full range for q4_0 quantization Håkon H. Hitland 2023-04-03 03:02:26 +02:00
  • 0c5692345d examples : add save_load_state example (#1150) master-0c56923 xaedes 2023-04-24 18:23:31 +02:00
  • 00ef34dea1 renamed save-load-state example files replacing underscores by dashes xaedes 2023-04-24 18:20:10 +02:00
  • e8a156ab50 ggml : export symbols Georgi Gerganov 2023-04-24 18:55:18 +03:00
  • 957c8ae21d llama : increase scratch buffer size for 65B (ref #1152) master-957c8ae Georgi Gerganov 2023-04-24 18:47:03 +03:00
  • 9b0a4d4214 examples/main README improvements and some light refactoring (#1131) master-9b0a4d4 mgroeber9110 2023-04-24 17:45:32 +02:00
  • 2ec83428de Fix build for gcc 8 and test in CI (#1154) master-2ec8342 Stephan Walter 2023-04-24 15:38:26 +00:00
  • 0f40b0adb9 use <cstdio> instead of <iostream> and fprintf / printf instead of cout xaedes 2023-04-24 17:33:06 +02:00
  • e4cf982e0d Fix cuda compilation (#1128) master-e4cf982 slaren 2023-04-24 17:29:58 +02:00
  • 890f536287 Fix build for gcc 8 and test in CI Stephan Walter 2023-04-24 17:08:27 +02:00
  • 59fb174678 fixed compile errors, made mmap automatic when lora is selected, added updated quantizers and quantization handling for gpt neox gpt 2 and gptj Concedo 2023-04-24 23:20:06 +08:00
  • 2e335c4c4e Update check_SHA256_windows.ps1 KASR 2023-04-24 16:41:27 +02:00
  • 3962eb39c7 added token unbanning Concedo 2023-04-24 21:50:20 +08:00
  • d2c2630307 Merge 35b0bf0585 into c4fe84fb0d jeffersoncgo 2023-04-24 08:33:12 -05:00
  • 1b9b9068b1 merged q4_2 and q4_3 dequants and FIXED CLBLAST SLOWNESS! Concedo 2023-04-24 21:33:01 +08:00
  • e58f1d1336 Merge branch 'master' into concedo_experimental Concedo 2023-04-24 19:43:17 +08:00
  • f30e41e0ef Merge branch 'master' into example-readme mgroeber9110 2023-04-24 11:01:14 +02:00
  • 86fc970126 verify sha256 checksums for windows users KASR 2023-04-24 10:43:52 +02:00
  • c4fe84fb0d llama : refactor get / set state + remove redundant kv cache API (#1143) master-c4fe84f Georgi Gerganov 2023-04-24 07:40:02 +03:00
  • 8e615c8245 Merge branch 'master' into concedo_experimental Concedo 2023-04-24 12:20:08 +08:00
  • 19a2ca08e5 add save_load_state example xaedes 2023-04-22 21:11:40 +02:00
  • 6e292d70a5 cuBLAS memory management routines John 2023-04-24 04:26:40 +02:00
  • 7fcfba2e28 cuBLAS memory management John 2023-04-24 03:37:55 +02:00
  • 3a004b2a01 add rpath Henri Vasserman 2023-04-24 02:24:54 +03:00
  • 1d78fecdab Fix LoRA acronym (#1145) slaren 2023-04-23 23:03:44 +02:00
  • 4fd4c2d474 Merge some information from previous README draft mgroeber9110 2023-04-23 21:44:16 +02:00
  • a75c52696a Fix LoRA acronym Slaren 2023-04-23 21:33:17 +02:00
  • db7a01297e Merge 'origin/master' into hipblas Henri Vasserman 2023-04-23 21:49:28 +03:00
  • e0814f1224 merge mgroeber9110 2023-04-23 20:38:01 +02:00
  • 284685f169 scripts : add helper scripts to synch ggml repo Georgi Gerganov 2023-04-23 19:57:09 +03:00
  • a136a93085 llama : refactor get / set state + remove redundant kv cache API Georgi Gerganov 2023-04-23 18:55:34 +03:00
  • edce63baa9 Added README.md for main with examples and explanations (#1139) DannyDaemonic 2023-04-23 08:37:02 -07:00
  • ec9cdb6752 ggml : do not print perf ops that have not been used at all master-ec9cdb6 Georgi Gerganov 2023-04-23 18:32:52 +03:00
  • e4422e299c ggml : better PERF prints + support "LLAMA_PERF=1 make" master-e4422e2 Georgi Gerganov 2023-04-23 18:15:39 +03:00
  • 507207e976 Fixed typo and added longer section on n_predict Danny Daemonic 2023-04-23 08:13:19 -07:00
  • befd8755ca Added --help and --seed Danny Daemonic 2023-04-23 05:46:41 -07:00
  • b8cf6b69db Added README.md for main with examples and explanations Danny Daemonic 2023-04-23 03:03:20 -07:00
  • 102cd98074 ggml : Q4_3c using 2x "Full range" approach q4_3-range-fix Georgi Gerganov 2023-04-23 14:44:36 +03:00
  • 1195577355 Apply review comments mgroeber9110 2023-04-23 13:55:28 +02:00
  • 53c8434398 Improve AVX2 for vec_dot_q4_3_q8_0 (#1138) master-53c8434 Stephan Walter 2023-04-23 11:01:03 +00:00
  • e1fcd0fe21 Improve AVX2 for vec_dot_q4_3_q8_0 Stephan Walter 2023-04-23 11:52:12 +02:00
  • c6524f46eb readme : update gpt4all instructions (#980) Pavol Rusnak 2023-04-23 10:21:26 +02:00
  • 9129e937f9 only llama can use batch sizes above 256 to prevent unacceptably high memory usage Concedo 2023-04-23 15:57:06 +08:00
  • c9e2c26f41 A better packNibbles and mul_sum_i8_pairs_float implementation using AVX512 (#1119) master-c9e2c26 Yishuo Wang 2023-04-23 15:57:05 +08:00
  • 432cc91649 still needs to be a bit higher for very small contexts Concedo 2023-04-23 15:01:38 +08:00
  • 4e1ea2ac61 hopefully fixed the ooms for good Concedo 2023-04-23 13:49:50 +08:00
  • 3f21bd81f3 doc - Better explanation of how to build the libraries at Windows. (#107) Gustavo Rocha Dias 2023-04-23 02:40:09 -03:00
  • d771b8152d change mul_sum_i8_pairs_float to use AVX_VNNI MeouSker77 2023-04-23 11:44:56 +08:00
  • b1a8c244ce main: add pledge call on OpenBSD codesoap 2023-04-22 22:59:46 +02:00
  • 367723544c More build file changes Henri Vasserman 2023-04-22 23:28:00 +03:00
  • 9521390ce4 Rename structure member for --interactive-first to be consistent with external parameter mgroeber9110 2023-04-22 20:29:05 +02:00
  • bc290f6cde First draft of README for main mgroeber9110 2023-04-22 20:26:07 +02:00
  • 6fd49ed050 Minor, plus rebase on master Iwan Kawrakow 2023-04-21 18:09:43 +02:00
  • 4f4f90c92e Fix test-quantize Iwan Kawrakow 2023-04-21 17:39:17 +02:00
  • 3c69f93d6c RMSE-optimized quants for all quantization types Iwan Kawrakow 2023-04-21 10:26:49 +02:00
  • d41490c27b just revert back to the working commit Concedo 2023-04-23 00:35:42 +08:00
  • c60fb5ef4b fixed rwkv build errors on ARM devices Concedo 2023-04-23 00:18:38 +08:00
  • b5d6284190 increase initial buffer too Concedo 2023-04-23 00:07:33 +08:00
  • d2f14b2b1f add an extra buffer to mem allocations Concedo 2023-04-23 00:04:32 +08:00
  • 71e6ae3779 ggml : continue from #729 (wip) q4_0-q4_2-range-fix Georgi Gerganov 2023-04-22 18:49:07 +03:00
  • 7c60441d71 Merge branch 'master' into concedo Concedo 2023-04-22 23:46:14 +08:00
  • eb73b4c261 remove writing to cl_buffer_c and change it to a writeonly buffer - should work since beta is always zero. Concedo 2023-04-22 23:19:17 +08:00
  • bd166f7ffc Fix type error in quantize_row_q4_1 for Arm NEON Håkon H. Hitland 2023-04-05 22:59:54 +02:00