Commit graph

  • 14ac9dadc4 metal: fix-test lshzh-ww 2023-08-21 01:27:37 -04:00
  • 1e9fe8a954 always send partial response for get correct probs of last to_send Jhen 2023-08-21 13:26:23 +08:00
  • 371cc14815 remove unused function Jhen 2023-08-21 13:06:59 +08:00
  • b7ddf04a26 correct probabilites usage Jhen 2023-08-21 12:47:39 +08:00
  • a7042c187f revert unnecessary change Jhen 2023-08-21 11:49:48 +08:00
  • 54f9f3c107 use final response to show probabilities on stop Jhen 2023-08-21 11:49:04 +08:00
  • e4c04c242d fix incorrect prob convert if the str is already a known token Jhen 2023-08-21 11:48:51 +08:00
  • c818c405e0 convert-llama-hf-to-gguf.py : fix attn_q permute klosax 2023-08-21 04:42:09 +02:00
  • 58bde5c5c1 Delete convert-permute-debug.py klosax 2023-08-21 04:35:06 +02:00
  • 287db51015 Delete convert-permute-debug-master.py klosax 2023-08-21 04:34:39 +02:00
  • d5c8fcfd8a convert.py : 70b model working (change attn_q permute) klosax 2023-08-21 04:33:33 +02:00
  • 7de7cb4bd8 convert-permute-debug.py : change permute type of attn_q klosax 2023-08-21 04:06:59 +02:00
  • 4f92488dd6 convert-permute-debug-master.py : permute debug for master klosax 2023-08-21 03:44:16 +02:00
  • 5a02b9625a convert-permute-debug.py : permute debug print klosax 2023-08-21 03:24:29 +02:00
  • 8b4106ae33 also save latest finetune output with ITERATION="LATEST" and print where files are saved xaedes 2023-08-21 02:24:25 +02:00
  • 11e863651d generate index.html.hpp Jhen 2023-08-21 06:39:21 +08:00
  • 25e6747a56 Merge branch 'master' into server-probs Jhen 2023-08-21 06:39:00 +08:00
  • 7ec9c22249 skip empty array or byte pair (> 1) in Probabilites Jhen 2023-08-21 06:38:22 +08:00
  • 44dd9ed287 Improve commentary goerch 2023-08-21 00:28:31 +02:00
  • 6586487e62 Restored accidentally removed comment goerch 2023-08-21 00:13:25 +02:00
  • dea1e4c03e Merge branch 'gguf' of https://github.com/ggerganov/llama.cpp into gguf goerch 2023-08-21 00:12:47 +02:00
  • 9e232f0234 ggml : move all type info to ggml_type_traits (#2663) master-9e232f0 slaren 2023-08-20 22:17:53 +02:00
  • 27c24ffa1b add option to save finetune output every N iterations xaedes 2023-08-20 20:16:46 +02:00
  • d61ed6b431 mixing multiple LORA adapters is now possible xaedes 2023-08-20 18:36:20 +02:00
  • f838faa874 convert-llama-7b-pth-to-gguf.py : special tokens klosax 2023-08-20 16:56:48 +02:00
  • 76b46627e2 convert-llama-hf-to-gguf.py : special tokens klosax 2023-08-20 16:54:42 +02:00
  • c9d9b05281 HellaSwag: split token evaluation into batches if needed Iwan Kawrakow 2023-08-20 17:41:13 +03:00
  • 5e9ff54a67 More efficient Hellaswag implementation (#2677) master-5e9ff54 Kawrakow 2023-08-20 16:44:46 +03:00
  • 05ef02aec3 More efficient Hellaswag implementation Iwan Kawrakow 2023-08-20 10:05:29 +03:00
  • 3d8e255514 make scripts executable Cebtenzzre 2023-08-18 17:49:25 -04:00
  • 01046648cf ggml: create thread pool lazily JohannesGaessler 2023-08-19 19:54:56 +02:00
  • 16ab5f1b18 ggml: use __CUDACC__ to recognise nvcc compiler Kylin 2023-08-20 01:57:24 +08:00
  • 28b8c265eb cmpnct_gpt2bpe.hpp : cleanup gguf-28b8c26 klosax 2023-08-19 18:26:51 +02:00
  • 8a25bd41b3 ggml: support CUDA's half type for aarch64 (#1455) support CUDA's half type for aarch64 in ggml_fp16_t definition Kylin 2023-08-20 00:07:50 +08:00
  • 5ae5d2bd5b Remove unnecessary scalar layout extension 0cc4m 2023-08-19 17:53:48 +02:00
  • 2faad208ae CUDA: fix __builtin_assume for CUDA < 11.2 JohannesGaessler 2023-08-19 17:17:55 +02:00
  • aea173f5af More sentencepiece compatibility by eliminating magic numbers goerch 2023-08-19 16:50:29 +02:00
  • c0a1269b7f Update examples/server/README.md klosax 2023-08-19 15:27:37 +02:00
  • da837401cd Exclude platform dependent tests goerch 2023-08-19 14:50:32 +02:00
  • dc65fb3044 Merge branch 'gguf' of https://github.com/goerch/llama.cpp into gguf goerch 2023-08-19 14:40:21 +02:00
  • 370a95f524 Improve token type support goerch 2023-08-19 14:39:33 +02:00
  • 21d88645fc Merge branch 'gguf' of https://github.com/ggerganov/llama.cpp into gguf goerch 2023-08-19 13:37:04 +02:00
  • c16ea8e193 Merge branch 'ggerganov:gguf' into gguf goerch 2023-08-19 13:36:05 +02:00
  • 6a2e520095 cmpnct_gpt2bpe.hpp : remove non-general stuff klosax 2023-08-19 13:19:02 +02:00
  • 8945d47f52 gptneox-main.cpp : fixes klosax 2023-08-19 12:09:24 +02:00
  • 781bf2481f falcon-main.cpp : fixes klosax 2023-08-19 12:08:17 +02:00
  • dadf098b5a cmpnct_gpt2bpe.hpp : fixes klosax 2023-08-19 12:06:22 +02:00
  • b3a7a2b486 convert-falcon-hf-to-gguf.py : add tensor data layout klosax 2023-08-19 12:05:11 +02:00
  • 12e4284c31 Fix CUDA softmax by subtracting max value before exp lijiahao 2023-08-19 11:55:01 +08:00
  • 946e3138a4 ggml : move all type info to ggml_type_traits slaren 2023-08-19 02:54:25 +02:00
  • 2c8055b65b convert-falcon-hf-to-gguf.py : update ref klosax 2023-08-19 01:08:39 +02:00
  • 1d80eea574 falcon-main.cpp : fix for falcon 40b klosax 2023-08-19 01:03:37 +02:00
  • bd5a57901b gguf.py : fix for falcon 40b klosax 2023-08-19 01:01:52 +02:00
  • 281d6d1105 convert-llama-hf-to-gguf.py : remove extra kv klosax 2023-08-19 00:32:56 +02:00
  • 593b04fdcd convert-llama-7b-pth-to-gguf.py : remove extra kv klosax 2023-08-19 00:32:27 +02:00
  • c0e4ca630b convert-gptneox-hf-to-gguf.py : remove extra kv klosax 2023-08-19 00:31:56 +02:00
  • 16ab9ba3b3 convert-falcon-hf-to-gguf.py : remove extra kv klosax 2023-08-19 00:31:28 +02:00
  • d5e976c12b falcon-main.cpp : falcon inference example klosax 2023-08-19 00:02:18 +02:00
  • 95f2c5d475 Merge 1c154e9ea5 into 1f0bccb279 Eve 2023-08-18 21:50:59 +00:00
  • 1c154e9ea5 lazy fix for llama-bench (runs without pp_threads support) netrunnereve 2023-08-18 17:49:04 -04:00
  • 1f0bccb279 server : better default prompt (#2646) Georgi Gerganov 2023-08-19 00:45:36 +03:00
  • f63564adfa server : update xxd usage for older versions compatibility (#2649) Jhen-Jie Hong 2023-08-19 05:41:32 +08:00
  • a129a31457 Merge branch 'ggerganov:master' into master Eve 2023-08-18 21:17:06 +00:00
  • a217151444 add gqa parameter (Llama 2 70b support) Colin Calvert 2023-08-18 15:41:51 -05:00
  • 2d8b76a110 Add link to clojure bindings to Readme. (#2659) Adrian 2023-08-18 12:39:22 -07:00
  • 37dfb544aa resolve todo xaedes 2023-08-18 21:22:41 +02:00
  • 3e47890760 remove unnecessary src tensor from ggml_repeat & ggml_repeat_back xaedes 2023-08-18 20:51:00 +02:00
  • 65b0561637 remove unnecessary src tensor from ggml_get_rows_back xaedes 2023-08-18 20:25:03 +02:00
  • fb7c883cd3 convert-falcon-hf-to-gguf.py : falcon HF --> gguf conversion, not tested klosax 2023-08-18 20:14:01 +02:00
  • 6c98640035 bug fix: make sure finetune input gradient is allocated at begin and kept until end xaedes 2023-08-18 20:10:04 +02:00
  • 210b07f980 Add link to clojure bindings to Readme. Adrian 2023-08-18 11:08:44 -07:00
  • 63cb374a99 change default finetune params lora_r and lora_alpha to match the n_rank parameters of 4 xaedes 2023-08-18 19:08:15 +02:00
  • 25b8a8922d llama : introduce enum llama_vocab_type + remove hardcoded string constants Georgi Gerganov 2023-08-18 18:46:38 +03:00
  • 7a63d429af adjust maximal values to support finetuning 3B models xaedes 2023-08-18 17:32:31 +02:00
  • 7af633aec3 readme : incoming BREAKING CHANGE Georgi Gerganov 2023-08-18 17:48:31 +03:00
  • a4ad2bf35c llama : fix MPI build Georgi Gerganov 2023-08-18 17:34:27 +03:00
  • 5d2656d670 llama : avoid hardcoded special tokens Georgi Gerganov 2023-08-18 17:29:20 +03:00
  • 113c90f1cc improve optimization iteration prints xaedes 2023-08-18 16:24:42 +02:00
  • a0c2752ba7 remove debug prints and function to compute tensor data hash xaedes 2023-08-18 16:24:13 +02:00
  • 035d511457 llama : minor API updates Georgi Gerganov 2023-08-18 17:06:34 +03:00
  • 011f47f972 remove trailing whitespace xaedes 2023-08-18 16:02:46 +02:00
  • f358204a5f avoid keeping in memory ALL of the gradients xaedes 2023-08-18 16:01:43 +02:00
  • 2d6c2c757c llama : remove C++ API + reorganize common source in /common dir Georgi Gerganov 2023-08-18 16:22:48 +03:00
  • a252111b45 fix bug in ggml_out_prod which resulted in wrong n_dims of result tensors xaedes 2023-08-18 15:03:57 +02:00
  • 44526cb261 make sure base model tensors data cannot be used in viewable operations xaedes 2023-08-18 15:03:17 +02:00
  • 38016ed9ec Merge branch 'master' into gguf Georgi Gerganov 2023-08-18 15:21:48 +03:00
  • 660ca9bbca llama : re-order functions Georgi Gerganov 2023-08-18 14:56:36 +03:00
  • 097e121e2f llama : add benchmark example (#2626) master-097e121 slaren 2023-08-18 12:44:58 +02:00
  • eaf98c2649 readme : add link to Rust bindings (#2656) mdrokz 2023-08-18 15:47:58 +05:30
  • 80e1ca4853 chore: add rust bindings to readme mdrokz 2023-08-18 15:37:06 +05:30
  • e9b12c332e perplexity : more meaningful ETA number - 2 decimal points master-e9b12c3 Georgi Gerganov 2023-08-18 12:48:55 +03:00
  • dea5be61d7 editorconfig : fix whitespaces Georgi Gerganov 2023-08-18 12:42:38 +03:00
  • e35f8c744e tests : update vocab file with new magic Georgi Gerganov 2023-08-18 12:39:09 +03:00
  • 856afff746 Merge branch 'master' into gguf Georgi Gerganov 2023-08-18 12:38:05 +03:00
  • aa3efe87c8 llama : print number of tensors per type + print arch + style Georgi Gerganov 2023-08-18 10:36:45 +03:00
  • 06a883f7b1 remove unused $func Jhen 2023-08-18 15:35:20 +08:00
  • a7871acced Merge remote-tracking branch 'origin/master' into prompt-array Xiao-Yong Jin 2023-08-17 21:49:35 -05:00
  • b275de745d llama.cpp : get special token kv and linefeed token id klosax 2023-08-18 03:34:30 +02:00
  • 604b8bdfa6 Fix unicode in grammars (fixes #2501) (#2553) master-604b8bd Evan Jones 2023-08-17 19:54:44 -04:00
  • 5c6aee64de Merge branch 'master' into server-probs jhen 2023-08-18 07:38:35 +08:00