Commit graph

  • 4487694d79 updates readme per feedback TheNotary 2023-04-27 12:49:05 -05:00
  • 6ddce362ef remove the part of the code that reads the file at once if enough ram is available KASR 2023-04-27 16:55:39 +02:00
  • 24317a510e minor improvment KASR 2023-04-27 16:51:52 +02:00
  • 0a6d364c5a update the verification script KASR 2023-04-27 16:44:12 +02:00
  • bbfba5f740 Fix import cl file name 0cc4m 2023-04-27 15:30:30 +02:00
  • 96346fb2a4 Rename dequant kernels file to conform with other file names 0cc4m 2023-04-27 15:27:22 +02:00
  • fafebff53c Make globals static, fix indentation 0cc4m 2023-04-27 15:25:09 +02:00
  • 846ee2c850 Fix editorconfig Alisamar Husain 2023-04-27 17:14:18 +05:30
  • d2af46e371 Fix editorconfig Alisamar Husain 2023-04-27 16:56:43 +05:30
  • 2b50d21423 Run LLaMA function Alisamar Husain 2023-04-26 13:06:44 +05:30
  • db921e4880 Added Boost to tests Alisamar Husain 2023-04-26 12:53:36 +05:30
  • 98de1c349d Created a Server example Alisamar Husain 2023-04-17 17:15:57 +05:30
  • 2499632cdc up version Concedo 2023-04-27 17:27:10 +08:00
  • 137efe2b8f updated embedded kobold lite, force streaming mode if stream flag is used Concedo 2023-04-27 17:16:55 +08:00
  • 95bbd46019 Merge branch 'master' into concedo_experimental Concedo 2023-04-27 16:12:00 +08:00
  • 5070815dcf fixing discussion #121 and issue #122 Concedo 2023-04-27 16:10:01 +08:00
  • 78434dd8e2 Update README.md KASR 2023-04-27 09:54:39 +02:00
  • b7fb31ea76 python script to verify the checksum of the llama models KASR 2023-04-27 09:47:50 +02:00
  • da58bab021 missed a few CRD716 2023-04-27 01:18:59 -05:00
  • c74ceed3ae fix column removal stuff CRD716 2023-04-27 01:16:50 -05:00
  • 6ee4d4c1f2 Create run.py Jeff Price 2023-04-26 23:12:53 -07:00
  • d08356700e Y'all ready to get funky? CRD716 2023-04-27 00:49:47 -05:00
  • 024e31abae Grading, ready for testing! CRD716 2023-04-27 00:26:09 -05:00
  • 4277270908 Basic graph thing CRD716 2023-04-26 23:41:04 -05:00
  • 42b63c575e fixes some typos TheNotary 2023-04-26 17:55:56 -05:00
  • e9c3a82bd6 adds missing .gitkeep TheNotary 2023-04-26 17:43:50 -05:00
  • 5b36ab5afd documentation: reflow the readme to make following the setup more clear TheNotary 2023-04-26 16:19:06 -05:00
  • 0e41441fa1 moves ggml-vocab.bin into test folder where it's used. TheNotary 2023-04-26 12:55:45 -05:00
  • 0b2da20538 ggml : slightly faster AVX2 implementation for Q5 (#1197) master-0b2da20 Stephan Walter 2023-04-26 20:26:42 +00:00
  • f9be42add0 readme : add quantization info Georgi Gerganov 2023-04-26 23:24:42 +03:00
  • 8569b35b42 Q5: Slightly faster AVX2 implementation Stephan Walter 2023-04-26 22:11:01 +02:00
  • 574406dc7e ggml : add Q5_0 and Q5_1 quantization (#1187) master-574406d Georgi Gerganov 2023-04-26 23:14:13 +03:00
  • 87a6f846d3 Allow setting the rng seed after initialization. (#1184) master-87a6f84 Ásgeir Bjarni Ingvarsson 2023-04-26 20:08:43 +00:00
  • ea3ad7eb60 Updating build instructions to include BLAS support (#1183) DaniAndTheWeb 2023-04-26 22:03:03 +02:00
  • 2bfa1fe8e7 ggml : AVX2 optimizations for Q5_0, Q5_1 (#1195) Stephan Walter 2023-04-26 19:38:15 +00:00
  • 33e50f7247 AVX2 optimizations for Q5_0, Q5_1 Stephan Walter 2023-04-26 17:46:22 +02:00
  • 5a51160a89 Fix trailing whitespace DaniAndTheWeb 2023-04-26 20:16:29 +02:00
  • 4a35ec9df5 First check error, then release event 0cc4m 2023-04-26 19:56:58 +02:00
  • 982bfce678 quantize : add Q5_0 and Q5_1 to map Georgi Gerganov 2023-04-26 20:45:03 +03:00
  • 8e936ad0cd ggml : adding Q5_0 mode Georgi Gerganov 2023-04-26 18:30:56 +03:00
  • b9c43584f6 ggml : rename Q5_0 -> Q5_1 Georgi Gerganov 2023-04-26 17:57:26 +03:00
  • d390f4f7dd ggml : q5_0 more efficient ARM NEON using uint64_t masks Georgi Gerganov 2023-04-26 16:32:33 +03:00
  • b294b7fdc0 ggml : q5_0 ARM NEON dot Georgi Gerganov 2023-04-26 16:24:27 +03:00
  • ef8e3ee6f5 ggml : q5_0 scalar dot product Georgi Gerganov 2023-04-26 13:58:47 +03:00
  • 99238e4c28 ggml : fix q5_0 histogram stats Georgi Gerganov 2023-04-26 13:37:57 +03:00
  • 2576c16f00 ggml : fix Q5_0 qh -> uint32_t Georgi Gerganov 2023-04-26 10:43:26 +03:00
  • 5bebc0a6e2 ggml : add Q5_0 quantization (cuBLAS only) Georgi Gerganov 2023-04-26 10:33:57 +03:00
  • 859fee6dfb quantize : use map to assign quantization type from string (#1191) master-859fee6 Pavol Rusnak 2023-04-26 18:43:27 +02:00
  • 6383bbfa5f fix jon-chuang 2023-04-27 00:42:41 +08:00
  • 9eda98d14b fix jon-chuang 2023-04-27 00:41:12 +08:00
  • ce97a807cb Simplify code, fix include 0cc4m 2023-04-26 18:39:04 +02:00
  • b746458281 Use c compiler for opencl files 0cc4m 2023-04-26 18:38:31 +02:00
  • d3e9a5c415 quantize : use map to assign quantization type from string Pavol Rusnak 2023-04-26 18:06:10 +02:00
  • 101f7a6e73 updated readme Concedo 2023-04-26 23:50:00 +08:00
  • b80bc36ab0 minor jon-chuang 2023-04-26 23:33:24 +08:00
  • 7ffbcbdfa3 fix jon-chuang 2023-04-26 23:29:26 +08:00
  • 8cead20746 done jon-chuang 2023-04-26 23:03:54 +08:00
  • 8ead56c03a fix jon-chuang 2023-04-26 22:58:20 +08:00
  • 5bb5327833 minor jon-chuang 2023-04-26 22:48:15 +08:00
  • afe94e878b Merge branch 'jon/tall-and-skinny-matmul' of https://github.com/jon-chuang/llama.cpp into jon/tall-and-skinny-matmul jon-chuang 2023-04-26 22:46:48 +08:00
  • 0a320ed274 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into jon/tall-and-skinny-matmul jon-chuang 2023-04-26 22:45:58 +08:00
  • 4a98a0f21a fix jon-chuang 2023-04-26 22:37:52 +08:00
  • 42c297b926 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into jon/use-hardware-cores jon-chuang 2023-04-26 22:21:52 +08:00
  • 0bf8b5bdeb Fixing typo DaniAndTheWeb 2023-04-26 15:51:55 +02:00
  • 8136918974 Windows Make instructions DaniAndTheWeb 2023-04-26 15:45:01 +02:00
  • 93a8e00dfa Merge branch 'master' into concedo Concedo 2023-04-26 18:01:35 +08:00
  • ef51e9ecac Merge branch 'ggerganov:master' into hipblas Henri Vasserman 2023-04-26 12:46:26 +03:00
  • 27bc29128e Update README.md (#120) Disty0 2023-04-26 12:33:34 +03:00
  • 2b0c6a56f9 Improve code quality 0cc4m 2023-04-26 07:48:04 +02:00
  • 741bb67445 Allow setting the rng seed after initialization. Asgeir Bjarni Ingvarsson 2023-04-25 23:23:15 +00:00
  • 2ca73cb6ea Clarify the effect of BLAS DaniAndTheWeb 2023-04-26 01:40:00 +02:00
  • b6904cc79f BLAS for Mac DaniAndTheWeb 2023-04-26 01:33:25 +02:00
  • 2ff156d463 Better BLAS explanation DaniAndTheWeb 2023-04-26 01:27:04 +02:00
  • 5ac9074a7c Better BLAS explanation DaniAndTheWeb 2023-04-26 01:07:34 +02:00
  • e1b704b44c Update information about BLAS DaniAndTheWeb 2023-04-26 00:38:40 +02:00
  • ab07da07c1 Update README.md DaniAndTheWeb 2023-04-26 00:17:51 +02:00
  • e2bb127fd8 Updated build information DaniAndTheWeb 2023-04-26 00:15:56 +02:00
  • 4afcc37869 Update SHA256SUMS after quantization change (#1181) Stephan Walter 2023-04-25 21:41:56 +00:00
  • e4e868e2e5 Update SHA256SUMS after quantization change (65B) Pavol Rusnak 2023-04-25 23:40:57 +02:00
  • 667c501334 py : cast lora_alpha to int in convert-lora-to-ggml (#1170) ostix360 2023-04-25 23:33:08 +02:00
  • bb98e77be7 nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py (#981) Pavol Rusnak 2023-04-25 23:19:57 +02:00
  • 7a32fcb3b2 ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179) master-7a32fcb Georgi Gerganov 2023-04-25 23:40:51 +03:00
  • e8c3731764 ggml : fix assert using wrong QK4_2 instead of QK4_3 Georgi Gerganov 2023-04-25 23:25:03 +03:00
  • 4ddb983a02 ggml : fix Q8_0 to use 255 values out of 256 Georgi Gerganov 2023-04-25 23:23:05 +03:00
  • 91bfa51dca ggml : extend quantize_fns_t with "vec_dot_type" Georgi Gerganov 2023-04-25 22:47:50 +03:00
  • 46fc696dea ggml : fix bug - using wrong block type Georgi Gerganov 2023-04-25 22:28:26 +03:00
  • 6e0f0b6ff1 ggml : Q8_0 unroll x2 Georgi Gerganov 2023-04-25 22:21:57 +03:00
  • 88618ab7f5 ggml : fix Q8_0 dot product bug (ARM) Georgi Gerganov 2023-04-25 22:14:25 +03:00
  • 6496b79e8e ggml : use q4_0_q8_0 and q4_2_q8_0 Georgi Gerganov 2023-04-25 22:08:44 +03:00
  • d8bf7207f1 ggml : finalize Q8_0 implementation Georgi Gerganov 2023-04-25 22:03:08 +03:00
  • 79cfdf5e23 tests : fix test-quantize-fns Georgi Gerganov 2023-04-25 21:55:15 +03:00
  • 95c6f85ae3 Update SHA256SUMS after quantization change Stephan Walter 2023-04-25 20:51:41 +02:00
  • f83c321c47 ggml : add Q8_0 quantization format (rename the old one to Q8_1) Georgi Gerganov 2023-04-25 21:39:06 +03:00
  • d571d1629f Merge 'origin/master' into hipblas Henri Vasserman 2023-04-25 21:15:33 +03:00
  • 608aa33d9f change default GPU arch to match CMake Henri Vasserman 2023-04-25 21:15:04 +03:00
  • b73c19201f Merge branch 'ggerganov:master' into master CRD716 2023-04-25 12:56:06 -05:00
  • 618fda5009 examples : switch input_noecho to input_echo to remove negation deadprogram 2023-04-25 19:55:25 +02:00
  • 137071003c Improve btype dequant kernel selection code, add error if type is unsupported 0cc4m 2023-04-25 19:40:54 +02:00
  • ecff6723d1 Update convert-lora-to-ggml.py Pavol Rusnak 2023-04-25 19:29:11 +02:00
  • dd0eabc049 ggml : use full range for Q4_0 and Q4_2 quantization (#729) master-dd0eabc unbounded 2023-04-25 19:20:46 +02:00