Commit graph

  • 09b6da741e
    gguf.py : string len uint64_t and n_dims uint32_t klosax 2023-08-26 21:53:56 +02:00
  • 6d369a1558
    gguf : update all counts to 64-bit Georgi Gerganov 2023-08-26 22:41:55 +03:00 (see the GGUF v2 header sketch after this graph)
  • bc3eaf262e
    gguf.py : string lengths uint32_t klosax 2023-08-26 21:29:36 +02:00
  • be726c57ee
    gguf.py : uint64_t on all lengths, sizes and counts, enums still uint32_t klosax 2023-08-26 21:23:12 +02:00
  • ba335ff5b2
    gguf.py : bump GGUF version Georgi Gerganov 2023-08-26 22:13:05 +03:00
  • 3656b3ce81
    gguf : v1 backwards comp Georgi Gerganov 2023-08-26 22:11:42 +03:00
  • 4f0547e4a3
    gguf : add support for 64-bit (no backwards comp yet) Georgi Gerganov 2023-08-26 22:05:14 +03:00
  • 2978e03086
    update readme with gguf filenames xaedes 2023-08-26 21:04:14 +02:00
  • 167dd2dcec
    add checkpoint file version for future compatibility xaedes 2023-08-26 21:04:01 +02:00
  • ff83017428
    improve ggml_vec_dot_q4_K_q8_K AVX2 10% by reducing instruction dependency Ronny Brendel 2023-08-22 10:39:36 +02:00 (see the accumulator-splitting sketch after this graph)
  • c88362d194
    llama2.c: support copying vocab from a llama gguf model file ochafik 2023-08-26 19:47:11 +01:00
  • 5f1fffd2d4
    gguf : bump version to 2 Georgi Gerganov 2023-08-26 21:52:27 +03:00
  • c7d92e6dfe
    llama : use Unicode Escape Sequence to replace encoded characters (#2814) b1081 Tim Miller 2023-08-27 03:27:07 +09:00
  • 61d1a2895e
    flake.nix : add rocm support and cleanup (#2808) Tungsten842 2023-08-26 20:19:44 +02:00
  • 741ca7dd1c
    llama : move #includes out of _GNU_SOURCE conditional (#2817) b1079 Cebtenzzre 2023-08-26 14:17:51 -04:00
  • 72f895c923
    main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (#1528) b1078 Dr. Tom Murphy VII Ph.D 2023-08-26 14:12:56 -04:00
  • 1bf050c2a8
    main : pass ctx to llama_token_nl() Georgi Gerganov 2023-08-26 21:06:41 +03:00
  • 0d29c8aaef
    Merge branch 'master' into master Georgi Gerganov 2023-08-26 21:03:38 +03:00
  • f5d4b48297
    main : fix indentation Georgi Gerganov 2023-08-26 21:02:41 +03:00
  • 8a136017f0
    remove trailing white-space Bruce MacDonald 2023-08-26 13:27:27 -04:00
  • 46ec18406f
    llama : move #includes out of _GNU_SOURCE conditional Cebtenzzre 2023-08-26 13:08:50 -04:00
  • 50526f37eb
    llama : use std::abs in llama_sample_tail_free (#2800) b1077 Cebtenzzre 2023-08-26 12:53:52 -04:00 (see the std::abs sketch after this graph)
  • e4324cbd4d
    tests : add option to tokenize text files Georgi Gerganov 2023-08-26 19:21:22 +03:00
  • 5cea869275
    fix stray whitespace after master sync staviq 2023-08-26 18:08:55 +02:00
  • 5031c50e48
    Merge branch 'master' into betterlogs staviq 2023-08-26 17:57:10 +02:00
  • e99f039c9e
    cleanup main.cpp:273 staviq 2023-08-26 17:52:25 +02:00
  • 9be7e2b0bd
    llama : use std::abs in llama_sample_tail_free Cebtenzzre 2023-08-25 18:16:58 -04:00
  • 5f23d41faa
    Refactor types KerfuffleV2 2023-08-26 09:23:06 -06:00
  • 70005bd5c9
    tests : use Python to generate tokenizer tests for C++ Georgi Gerganov 2023-08-26 18:05:59 +03:00
  • dfa058ef73
    examples : no longer manually add leading space when tokenizing Georgi Gerganov 2023-08-26 17:51:35 +03:00
  • 1e7a033f10
    common : add comments Georgi Gerganov 2023-08-26 17:42:33 +03:00
  • 04f4b1eb10
    k-quants : remove unnecessary tensor shape restrictions (#2811) b1076 Georgi Gerganov 2023-08-26 17:37:35 +03:00
  • 9668aa115c
    llama : distinguish pieces from decoded text + fix detokenization Georgi Gerganov 2023-08-26 17:35:45 +03:00
  • 7592375403
    Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (#2807) b1075 Kawrakow 2023-08-26 17:27:49 +03:00
  • 9ee138e62f
    Use Unicode Escape Sequence to replace encoded characters Tim Miller 2023-08-26 23:17:48 +09:00
  • 5d0ffb69f5
    llama : prefix input text for tokenization with whitespace Georgi Gerganov 2023-08-26 17:08:59 +03:00
  • 771551a793
    Fix HellaSwag (#2805) b1074 Kawrakow 2023-08-26 16:48:53 +03:00
  • 3979af1e58
    PR comment Iwan Kawrakow 2023-08-26 16:44:22 +03:00
  • b398d885a1
    flake.nix: add rocm support and cleanup Tungsten842 2023-08-26 14:49:15 +02:00
  • f305bad11e
    flake : build llama.cpp on Intel with nix (#2795) Volodymyr Vitvitskyi 2023-08-26 14:25:39 +01:00
  • 6a20f7a2f0
    bug fixes xaedes 2023-08-25 22:32:39 +02:00
  • d01f52409f
    Added const if possible lijiahao 2023-08-26 21:14:09 +08:00
  • eff86d4f13
    k-quants : remove unnecessary tensor shape restrictions Georgi Gerganov 2023-08-26 16:05:18 +03:00
  • 5cad62bce4
    tests : write a Python tokenizer test (wip) Georgi Gerganov 2023-08-26 15:55:23 +03:00
  • 4c93e55996
    cuda: 1.2x faster dequantization kernel lijiahao 2023-08-26 20:38:32 +08:00
  • a2ca4e9de9
    Handle null rope scaling value (#2793) Nigel Bosch 2023-08-26 07:11:17 -05:00
  • 6544756895
    Better perplexity for 2- and 3-bit quantization for the 70B model Iwan Kawrakow 2023-08-26 14:52:54 +03:00
  • 2ba83c8685
    Fix spm whitespaces (#2806) b1071 klosax 2023-08-26 13:45:53 +02:00
  • c50b1ae6b8
    test-tokenizer-0.cpp : spm - add whitespace in front of prompt klosax 2023-08-26 13:13:05 +02:00
  • 63174b8073
    llama2.c gguf conversion: fix token types in converter ochafik 2023-08-26 11:17:40 +01:00
  • 43f7c16ad0
    main.cpp : spm - add whitespace in front of prompt klosax 2023-08-26 12:11:09 +02:00
  • d52894602d
    llama.cpp : fix spm whitespace escaping + clean up klosax 2023-08-26 12:08:34 +02:00
  • bae5c5f679
    examples : skip unnecessary external lib in server README.md how-to (#2804) lon 2023-08-26 10:07:43 +02:00
  • d34472c124
    Fix HellaSwag ik/fix_hellaswag Iwan Kawrakow 2023-08-26 10:55:39 +03:00
  • e664c0cf54
    examples: nodejs do not require external lib for fetch Lon 2023-08-26 08:49:36 +02:00
  • d793897a03
    Add a /detokenize endpoint to the example server Bruce MacDonald 2023-08-25 18:16:53 -07:00
  • 0b367255c1
    fix msvc staviq 2023-08-26 02:57:50 +02:00
  • 5562e3e6fa
    temporarily disable broken 512 build ci_cublas_linux-b1071-5562e3e Green Sky 2023-08-26 01:54:14 +02:00
  • 20f7f4c8de
    ci: add linux binaries to release build Green Sky 2023-05-05 00:01:30 +02:00
  • 540798132b
    implement loading/saving of checkpointing files using GGUF xaedes 2023-08-24 21:57:16 +02:00
  • ef6dc874dc
    Merge pull request #1 from arcrank/arcrank-patch-1 arcrank 2023-08-25 14:11:27 -04:00
  • c248ddfe79
    Update api_like_OAI.py simplify function arcrank 2023-08-25 14:10:32 -04:00
  • 7930a5af0e
    Build llama.cpp on Intel with nix Volodymyr Vitvitskyi 2023-08-25 18:22:30 +01:00
  • a14a033fb7
    Handle null rope scaling value Nigel Bosch 2023-08-25 12:00:50 -05:00
  • 9bace227c5
    stub LOG_DUMP_CMDLINE for WIN32 for now staviq 2023-08-25 18:24:55 +02:00
  • e23fa92401
    Merge branch 'master' into skip-unused-2 Olivier Chafik 2023-08-25 17:24:41 +01:00
  • 232caf3c15
    llama : fix struct decl (#2790) b1069 Marcus Dunn 2023-08-25 09:17:15 -07:00
  • d046dcee08
    Faster perplexity computation (#2786) b1068 Kawrakow 2023-08-25 19:05:02 +03:00
  • 7051c50200
    Update llama.h Marcus Dunn 2023-08-25 08:43:14 -07:00
  • c8a1118308
    fix LOG_TEELN and configchecker staviq 2023-08-25 17:41:15 +02:00
  • c82742ac9c
    llama : add llama_beam_search() (#2267) b1067 Matt Pulver 2023-08-25 11:18:48 -04:00
  • ce45974afa
    Faster perplexity computation Iwan Kawrakow 2023-08-25 18:06:27 +03:00
  • 5fa1ea2c38
    Change eos to eob in llama_beam and llama_beam_view structs. Matt Pulver 2023-08-25 09:47:52 -04:00
  • b619cfc059
    Delete obsolete comment from an earlier revision. Matt Pulver 2023-08-25 09:38:15 -04:00
  • fa33614b4d
    Use llama_ prefix for structs in global namespace. Matt Pulver 2023-08-25 09:36:34 -04:00
  • 28b2c996ca
    convert.py : Get rope scale from HuggingFace models (#2772) Nigel Bosch 2023-08-25 09:41:52 -05:00
  • 93daad763d
    Prefer west const. Matt Pulver 2023-08-25 09:34:13 -04:00
  • e46a8b517f
    Add spaces around comparison and assignment operators. Matt Pulver 2023-08-25 09:31:19 -04:00
  • 9bedaf4c71
    Add space around * pointers and & references. Matt Pulver 2023-08-25 09:22:14 -04:00
  • 4950b2dbbc
    Merge branch 'master' into convert_rope_scale Nigel Bosch 2023-08-25 09:16:50 -05:00
  • 4fdcede9ec
    LOG_DISABLE_LOGS compile flag, wrapped f in macros staviq 2023-08-25 15:51:27 +02:00
  • abe0829984
    Add '// Beam search' heading to llama.{h,cpp} after llama_grammar_accept_token(). Matt Pulver 2023-08-25 09:18:24 -04:00
  • 154725c543
    llama-bench : add model sizes (#2771) b1065 slaren 2023-08-25 15:16:19 +02:00
  • 3247687d8c
    adjust column sizes slaren 2023-08-25 15:15:03 +02:00
  • cc544b2057
    back to GiB slaren 2023-08-25 15:11:57 +02:00
  • 3be6e8d36f
    Tweak GPU offload when skipping unused logits computations Olivier Chafik 2023-08-25 14:02:04 +01:00
  • 5553820d90
    Allow disabling unused logit skipping code w/ cmake / make options Olivier Chafik 2023-08-25 14:00:24 +01:00
  • bc0dc16c93
    more compact markdown output slaren 2023-08-25 14:32:20 +02:00
  • c4269e0200
    Add llama_beam_search(). Matt Pulver 2023-07-18 14:33:34 -04:00
  • 12e2e33a97
    convert.py : export rope freq_base when converting CodeLlama from an HF model (#2773) slaren 2023-08-25 14:08:53 +02:00
  • 29674ab4e8
    server : display token probabilities in the UI (#2489) b1063 Jhen-Jie Hong 2023-08-25 18:32:45 +08:00
  • 5439a0ab57
    ci : pip install gguf in editable mode (#2782) Georgi Gerganov 2023-08-25 13:03:25 +03:00
  • e38f78476a
    ci : pip install gguf in editable mode Georgi Gerganov 2023-08-25 12:46:20 +03:00
  • 8194cd8772
    gguf : export objects to user code (#2780) M. Yusuf Sarıgöz 2023-08-25 12:43:41 +03:00
  • 058fbdd899
    gguf : bump version M. Yusuf Sarıgöz 2023-08-25 12:18:22 +03:00
  • 37e045830c
    gguf export all objects to user code for now M. Yusuf Sarıgöz 2023-08-25 12:13:52 +03:00
  • 79c9f7ff22
    gguf export more objects to user code M. Yusuf Sarıgöz 2023-08-25 12:11:08 +03:00
  • 6bbc598a63
    ROCm Port (#1087) b1060 Henri Vasserman 2023-08-25 12:09:42 +03:00
  • 3f460a2b72
    cuda : add RoPE kernel for mode == 2 (NeoX) (#2760) b1059 Georgi Gerganov 2023-08-25 11:55:59 +03:00
  • 333e27b31f
    falcon : do not offload the embeddings layer Georgi Gerganov 2023-08-25 11:54:57 +03:00
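
The cluster of gguf commits at the top of this graph (uint64_t on all lengths, sizes, and counts; version bump to 2; a v1 backwards-compat path) widens every count and string-length field in the GGUF format from uint32_t to uint64_t while enums stay 32-bit. A minimal sketch of what a v2 header then looks like on disk, assuming the field layout from the GGUF spec (little-endian, fields at offsets 0/4/8/16 so no implicit padding on common ABIs); the struct and its field names here are illustrative, not the ggml reader:

```cpp
#include <cstdint>
#include <cstdio>

// GGUF v2 header as the 64-bit commits above leave it: the two count
// fields widened from uint32_t to uint64_t. Naive read assumes a
// little-endian host, which matches GGUF's default byte order.
struct gguf_header_v2 {
    uint32_t magic;      // the bytes 'G','G','U','F'
    uint32_t version;    // 2 after the bump; version and enums stay 32-bit
    uint64_t n_tensors;  // v1 stored this as uint32_t
    uint64_t n_kv;       // v1 stored this as uint32_t
};

int main(int argc, char **argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }
    std::FILE *f = std::fopen(argv[1], "rb");
    if (!f) { std::perror("fopen"); return 1; }
    gguf_header_v2 h;
    if (std::fread(&h, sizeof h, 1, f) == 1) {
        std::printf("version %u: %llu tensors, %llu kv pairs\n", h.version,
                    (unsigned long long) h.n_tensors, (unsigned long long) h.n_kv);
    }
    std::fclose(f);
    return 0;
}
```

Strings in v2 are likewise a uint64_t length followed by the raw bytes, which is what the gguf.py commits above track on the Python side; readers expecting the old 32-bit counts need the backwards-compat path added in 3656b3ce81.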
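The 10% speedup in ff83017428 comes from reducing instruction dependency, a general technique rather than anything specific to that kernel: splitting one serial accumulator into independent ones so consecutive FMAs overlap instead of each waiting on the previous result. A generic dot-product sketch of the idea, assuming AVX2+FMA hardware and n divisible by 16; the real ggml_vec_dot_q4_K_q8_K operates on quantized blocks and looks quite different:

```cpp
#include <immintrin.h>  // compile with -mavx2 -mfma

// Two independent accumulators: the two FMAs per iteration have no data
// dependency on each other, so their multi-cycle latencies overlap instead
// of forming one long serial chain through a single accumulator.
static float dot(const float *a, const float *b, int n) {
    __m256 acc0 = _mm256_setzero_ps();
    __m256 acc1 = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 16) {
        acc0 = _mm256_fmadd_ps(_mm256_loadu_ps(a + i),     _mm256_loadu_ps(b + i),     acc0);
        acc1 = _mm256_fmadd_ps(_mm256_loadu_ps(a + i + 8), _mm256_loadu_ps(b + i + 8), acc1);
    }
    __m256 acc = _mm256_add_ps(acc0, acc1);  // merge once at the end
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc), _mm256_extractf128_ps(acc, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);                   // horizontal sum of 8 lanes
    return _mm_cvtss_f32(s);
}

int main() {
    float a[16], b[16];
    for (int i = 0; i < 16; ++i) { a[i] = 1.0f; b[i] = 0.5f; }
    return dot(a, b, 16) == 8.0f ? 0 : 1;    // 16 * 1.0 * 0.5
}
```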
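The std::abs fix (9be7e2b0bd, merged as 50526f37eb) addresses a classic C++ pitfall: an unqualified abs() can resolve to the C library's int abs(int), silently truncating a float argument. llama_sample_tail_free takes absolute values of second derivatives, which are small floats, so truncation would collapse them all to 0. A sketch of the failure mode, not the llama.cpp code itself:

```cpp
#include <cmath>    // std::abs overloads for float/double/long double
#include <cstdio>
#include <cstdlib>  // int abs(int) -- the overload an unqualified call may pick

int main() {
    float d2 = -0.37f;  // e.g. a second derivative of the sorted probabilities

    // Unqualified `abs(d2)` may convert d2 to int and call abs(int),
    // yielding 0 (some compilers warn, others do not). Kept commented
    // out so this sketch builds cleanly everywhere:
    // std::printf("%f\n", (float) abs(d2));

    std::printf("%f\n", std::abs(d2));  // portable: prints 0.370000
    return 0;
}
```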