Commit graph

  • 3c025a6d07
    gguf : add KV constant maps Georgi Gerganov 2023-08-22 20:06:15 +03:00
  • deb7dfca4b
    gguf : add ftype meta info to the model (#2710) master-deb7dfc Georgi Gerganov 2023-08-22 20:05:59 +03:00
  • 3057d6a687
    llama : refactor llama_model_load_internal() Georgi Gerganov 2023-08-22 19:30:02 +03:00
  • 5d3e7b25e0
    use "ROCm" instead of "CUDA" Henri Vasserman 2023-08-22 19:24:35 +03:00
  • bac66994cf
    Quantization improvements for k_quants (#2707) master-bac6699 Kawrakow 2023-08-22 19:14:09 +03:00
  • 8bd7f06b58
    llama : check if model architecture is known Georgi Gerganov 2023-08-22 19:03:08 +03:00
  • 4ed3469c68
    llama : refactor GGUF constants into static maps Georgi Gerganov 2023-08-22 18:35:28 +03:00
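The two refactoring commits above (4ed3469c68 and 3c025a6d07) both move scattered GGUF string constants into centralized static lookup maps. A minimal sketch of the idea, using illustrative names rather than the actual llama.cpp identifiers:

```python
# Hypothetical sketch: GGUF metadata keys are derived from a symbolic name
# plus the model architecture, so the constants can live in static lookup
# maps instead of string literals scattered through the code.

LLM_ARCH_NAMES = {
    "llama": "llama",
    "falcon": "falcon",
}

LLM_KV_NAMES = {
    "context_length": "%s.context_length",
    "embedding_length": "%s.embedding_length",
}

def kv_key(arch: str, key: str) -> str:
    """Resolve a symbolic key to the concrete GGUF metadata key string."""
    return LLM_KV_NAMES[key] % LLM_ARCH_NAMES[arch]
```

Centralizing the maps means an unknown architecture or key fails in one place (a dict lookup) instead of silently producing a malformed key string.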
  • 331a647834
    Update lamma-cpp-cublas.srpm.spec JohnnyB 2023-08-22 16:59:04 +01:00
  • 2e832a941c
    Create lamma-cpp-clblast.srpm.spec JohnnyB 2023-08-22 16:56:48 +01:00
  • 6f8c94d61b
    Create lamma-cpp-cublas.srpm.spec JohnnyB 2023-08-22 16:54:17 +01:00
  • 391dd9a0e2
    Merge 'origin/master' into hipblas Henri Vasserman 2023-08-22 18:19:52 +03:00
  • 39cc83e8c9
    incomplete merge, compiles but generates rubbish Concedo 2023-08-22 23:12:47 +08:00
  • ce956385cb
    Update llama-cpp.srpm.spec JohnnyB 2023-08-22 15:48:02 +01:00
  • 489177caf6
    Tested spec success. JohnnyB 2023-08-22 15:16:09 +01:00
  • 519c981f8b
    embedding : evaluate prompt in batches (#2713) master-519c981 slaren 2023-08-22 16:03:12 +02:00
  • 5cb4658c22
    embedding : evaluate prompt in batches slaren 2023-08-22 15:35:27 +02:00
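The pair of embedding commits above (5cb4658c22, merged as 519c981f8b) switch the example to feeding the prompt to the model in fixed-size chunks instead of one oversized call. A hedged sketch of such a batching loop, where the eval callback stands in for the real llama_eval-style call and the names are illustrative:

```python
def eval_prompt_in_batches(eval_fn, tokens, n_batch):
    """Feed `tokens` to `eval_fn` in chunks of at most `n_batch` tokens,
    passing the count of already-evaluated tokens as the past length."""
    n_past = 0
    while n_past < len(tokens):
        batch = tokens[n_past:n_past + n_batch]
        eval_fn(batch, n_past)  # stand-in for the model's eval call
        n_past += len(batch)
    return n_past
```

The key detail is threading `n_past` through each call so the model sees the chunks as one continuous prompt.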
  • 34151c9d2b
    convert.py : fix Enum to IntEnum Georgi Gerganov 2023-08-22 16:28:30 +03:00
  • 1123f7fbdf
    ggml-cuda : use graph allocator (#2684) master-1123f7f slaren 2023-08-22 15:25:19 +02:00
  • 32fc925ea8
    convert.py : add ftype when converting (does not work) Georgi Gerganov 2023-08-22 16:19:15 +03:00
  • efdfc41e49
    Alternative way to output PPL results Iwan Kawrakow 2023-08-22 16:12:45 +03:00
  • b791d1f489
    Implementing strided computation of perplexity Iwan Kawrakow 2023-08-22 15:55:52 +03:00
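Commit b791d1f489 above computes perplexity over overlapping, strided windows rather than disjoint chunks. A toy sketch of the window layout only (illustrative, not the actual perplexity code):

```python
def strided_windows(n_tokens, n_ctx, stride):
    """Yield (begin, end) spans of n_ctx-token evaluation windows advanced
    by `stride`; consecutive windows overlap by n_ctx - stride tokens, so
    scored tokens keep more left context than with disjoint chunks."""
    begin = 0
    while begin + n_ctx <= n_tokens:
        yield (begin, begin + n_ctx)
        begin += stride
```

With `stride < n_ctx` every window after the first re-evaluates some overlap, trading compute for more faithful per-token context.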
  • 4cc6943d56
    Rename llama-cpp.srpm to llama-cpp.srpm.spec JohnnyB 2023-08-22 13:34:04 +01:00
  • d93383cb86
    Create llama-cpp.srpm JohnnyB 2023-08-22 13:21:34 +01:00
  • bee1f0e441
    llama : add ftype meta info to the model Georgi Gerganov 2023-08-22 14:37:35 +03:00
  • ef3f333d37
    ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709) master-ef3f333 Georgi Gerganov 2023-08-22 14:22:08 +03:00
  • 47b4b7f51f
    ggml : fix tabs Georgi Gerganov 2023-08-22 14:10:37 +03:00
  • 645bbec255
    ggml : sync latest (SAM + SD operators, CUDA alibi) Georgi Gerganov 2023-08-22 14:00:52 +03:00
  • 2d17c22437
    functional commit before gguf merge Concedo 2023-08-22 18:20:06 +08:00
  • 8e4364f2af
    llama-bench : minor fixes (#2695) master-8e4364f slaren 2023-08-22 09:56:03 +02:00
  • fdf73db54d
    Fix for changed tensor names Iwan Kawrakow 2023-08-22 10:46:22 +03:00
  • 1e3bc523d8
    ggml : support CUDA's half type for aarch64 (#1455) (#2670) master-1e3bc52 Kylin 2023-08-22 15:14:23 +08:00
  • 14b1d7e6f7
    metal : add missing barriers for mul-mat (#2699) Shouzheng Liu 2023-08-22 02:18:40 -04:00
  • 35a0b974e3
    Fix after rebasing on master Iwan Kawrakow 2023-08-22 08:51:13 +03:00
  • b7063393d8
    make_qkx2_quants is better for Q5_K after all Iwan Kawrakow 2023-08-22 08:45:28 +03:00
  • e2af308cc7
    Better Q6_K Iwan Kawrakow 2023-08-21 17:57:26 +03:00
  • 9f78d4cdf9
    Revert Q5_K back to make_qkx1_quants Iwan Kawrakow 2023-08-20 09:02:43 +03:00
  • 404e43cc3b
    Iterating Iwan Kawrakow 2023-08-16 10:52:54 +03:00
  • 1c1f985b27
    Q2_K improvement Iwan Kawrakow 2023-08-14 20:03:14 +03:00
  • e9f1340c20
    Another minor improvement Iwan Kawrakow 2023-08-14 17:20:02 +03:00
  • 4f8dcb1653
    Adding make_qkx2_quants Iwan Kawrakow 2023-08-14 16:06:00 +03:00
  • ec9cb753a6
    Some more fine tuning Iwan Kawrakow 2023-08-13 18:02:19 +03:00
  • 77aea7214f
    Minor 4-bit quantization improvement Iwan Kawrakow 2023-08-13 13:06:07 +03:00
  • f26f9ef42c
    Improve LLaMA-2 2-, 3- and 4-bit quantization Iwan Kawrakow 2023-08-13 11:41:20 +03:00
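The k-quants run above (from f26f9ef42c through the make_qkx1_quants/make_qkx2_quants experiments, landing as bac66994cf) iterates on the scale and minimum search inside the quantization routines. As background only, a deliberately simplified toy sketch of the blockwise scale-plus-min quantization these routines refine — not the llama.cpp implementation:

```python
def quantize_block(xs, nbits=4):
    """Toy asymmetric block quantization: store one float scale and one
    float minimum per block, plus an nbits integer per value."""
    vmin, vmax = min(xs), max(xs)
    levels = (1 << nbits) - 1
    scale = (vmax - vmin) / levels if vmax > vmin else 1.0
    qs = [round((x - vmin) / scale) for x in xs]
    return scale, vmin, qs

def dequantize_block(scale, vmin, qs):
    """Reconstruct approximate values from the block parameters."""
    return [vmin + scale * q for q in qs]
```

The commits above are essentially about choosing `scale` and `vmin` more cleverly than this naive min/max fit, to minimize reconstruction error per block.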
  • fbea8db29c
    tweaks to grammar guide Evan Jones 2023-08-21 23:09:59 -04:00
  • be6faa45a5
    docs : add grammar docs Evan Jones 2023-08-21 23:03:35 -04:00
  • a2e814cf4f
    Merge branch 'ggerganov:master' into master William Behrens 2023-08-21 20:46:49 -05:00
  • e1911ac47c
    send empty string when got stop_pos in partial Jhen 2023-08-22 09:18:46 +08:00
  • de8cd113c6
    Merge branch 'master' into server-probs Jhen 2023-08-22 09:18:40 +08:00
  • 0f7cb95352
    Fix import of llama2.c models that don't share weights between embedding layers ochafik 2023-08-22 01:56:58 +01:00
  • 226255b44e
    server : fallback to default if client param is null (#2688) master-226255b Jhen-Jie Hong 2023-08-22 08:32:00 +08:00
  • 930523c8e1
    Fix convert-llama-ggmlv3-to-gguf.py vocab conversion (#2698) Kerfuffle 2023-08-21 18:01:34 -06:00
  • 5a43e729be
    Fix convert-llama-ggmlv3-to-gguf.py vocab conversion KerfuffleV2 2023-08-21 17:32:36 -06:00
  • aad8ef4668
    Merge branch 'master' into server-probs Jhen 2023-08-22 06:50:45 +08:00
  • 0bf8b41459
    Merge remote-tracking branch 'origin/master' into prompt-array Xiao-Yong Jin 2023-08-21 17:42:12 -05:00
  • 91b8be0877
    refactor probs render & make pColor transparent if not found Jhen 2023-08-22 06:41:42 +08:00
  • 423db742e7
    Merge 'origin/master' into hipblas Henri Vasserman 2023-08-22 01:03:44 +03:00
  • 5bc418fa18
    llama-bench : minor fixes slaren 2023-08-22 00:00:24 +02:00
  • 76515b7574
    Merge remote-tracking branch 'origin/master' into cuda-graph-allocr slaren 2023-08-21 23:50:24 +02:00
  • 2d86b2e219
    Add --config argument Pontus Mårdnäs 2023-08-21 23:46:56 +02:00
  • 2bfb39ac1d
    ggml-cuda : use graph allocator slaren 2023-08-20 21:31:53 +02:00
  • 2932a5516a
    metal: add missing barriers for mul-mat lshzh-ww 2023-08-21 16:43:47 -04:00
  • c8dba409e6
    py : remove obsolete script Georgi Gerganov 2023-08-21 23:40:22 +03:00
  • 6381d4e110
    gguf : new file format with flexible meta data (beta) (#2398) master-6381d4e Georgi Gerganov 2023-08-21 23:07:43 +03:00
  • 66a66a05a8
    readme : add notice about new file format gguf Georgi Gerganov 2023-08-21 22:11:00 +03:00
  • 811f653f95
    py : cosmetics Georgi Gerganov 2023-08-21 20:40:08 +03:00
  • 49c25cce19
    tests : use new tokenizer type API (#2692) goerch 2023-08-21 19:11:14 +02:00
  • 11e3806be4
    Use token type API in test-tokenizer-1.cpp goerch 2023-08-21 19:04:03 +02:00
  • d3f5fbef6c
    main : flush stdout Georgi Gerganov 2023-08-21 19:52:51 +03:00
  • a856685648
    Merge branch 'gguf' of https://github.com/goerch/llama.cpp into gguf goerch 2023-08-21 18:48:23 +02:00
  • 0b53b8b08d
    llama : add API for token type Georgi Gerganov 2023-08-21 19:35:31 +03:00
  • 8d177eddeb
    llama : improve token type support (#2668) goerch 2023-08-21 17:56:02 +02:00
  • e06cbcee73
    gguf : add Python script to convert GGMLv3 LLaMA models to GGUF (#2682) Kerfuffle 2023-08-21 08:45:52 -06:00
  • 054776049e
    Set default value for gguf add_tensor raw_shape KW arg KerfuffleV2 2023-08-21 08:27:31 -06:00
  • 6490ff7198
    py : fix whitespace Georgi Gerganov 2023-08-21 16:42:27 +03:00
  • e3da126f2a
    main : inject reverse prompt after EOS + update examples/chat.sh Georgi Gerganov 2023-08-21 16:41:27 +03:00
  • 1e7a0092dd
    Merge branch 'master' into gguf Georgi Gerganov 2023-08-21 16:27:51 +03:00
  • 2177142b49
    Merge e9c17039db into dadbed99e6 Lionel Cheng 2023-08-21 08:07:30 -05:00
  • 8af1991e2a
    main : restore old EOS behavior in interactive mode Georgi Gerganov 2023-08-21 15:40:51 +03:00
  • 7a7d1ba68a
    convert-llama-hf-to-gguf.py : rope scale fix klosax 2023-08-21 14:12:02 +02:00
  • 9070e330ab
    convert-llama-7b-pth-to-gguf.py : rope scale fix klosax 2023-08-21 14:11:22 +02:00
  • c082b9fa0b
    llama.cpp : use rope scale kv klosax 2023-08-21 13:30:03 +02:00
  • dc1f051013
    convert-llama-7b-pth-to-gguf.py : rope scale and added tokens klosax 2023-08-21 13:27:53 +02:00
  • 5f6ff387ca
    convert-llama-hf-to-gguf.py : rope scale and added tokens klosax 2023-08-21 13:25:14 +02:00
  • 6a69a693cb
    gguf.py : fix rope scale kv klosax 2023-08-21 13:23:10 +02:00
  • dadbed99e6
    metal : fix synchronization in new matrix multiplication kernel (#2686) Shouzheng Liu 2023-08-21 06:59:29 -04:00
  • f68aef5473
    Fix wrong type size for Q8_K KerfuffleV2 2023-08-21 04:19:17 -06:00
  • 996aaca1d4
    Use correct params override var name KerfuffleV2 2023-08-20 16:06:23 -06:00
  • e854cd7dc6
    Allow overriding vocab and hyperparams from original model metadata KerfuffleV2 2023-08-20 15:58:02 -06:00
  • f56db2164a
    Allow specifying name and description for output GGUF KerfuffleV2 2023-08-20 14:24:26 -06:00
  • 80912f0741
    Improve help text, expand warning KerfuffleV2 2023-08-20 13:15:01 -06:00
  • ff25134390
    Add description to converted GGUF files KerfuffleV2 2023-08-20 13:03:19 -06:00
  • 8083e20d19
    More vocab conversion fixes KerfuffleV2 2023-08-20 11:23:13 -06:00
  • 08959c88c2
    Fix vocab space conversion logic KerfuffleV2 2023-08-20 10:36:57 -06:00
  • f7e61fd1a9
    Cleanups, better output during conversion KerfuffleV2 2023-08-20 10:26:43 -06:00
  • 8afc1ef312
    First pass at converting GGMLv3 LLaMA models to GGUF KerfuffleV2 2023-08-20 09:34:48 -06:00
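Several commits in the conversion run above (e.g. 08959c88c2 "Fix vocab space conversion logic") deal with vocab handling when converting GGMLv3 LLaMA models to GGUF. One recurring detail in SentencePiece-style vocabs is that spaces are stored as the U+2581 '▁' marker rather than an ASCII space; a minimal sketch of that mapping (illustrative, not the converter's actual code):

```python
SPM_SPACE = "\u2581"  # '▁', the marker SentencePiece uses for a space

def piece_to_text(piece: str) -> str:
    """Turn a SentencePiece-style vocab piece into plain token text."""
    return piece.replace(SPM_SPACE, " ")

def text_to_piece(text: str) -> str:
    """Inverse mapping, for writing pieces back out in SPM form."""
    return text.replace(" ", SPM_SPACE)
```

Getting this translation wrong in either direction corrupts every token that begins a word, which is why it shows up repeatedly as a conversion fix.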
  • cb1c0727bd
    HellaSwag: split token evaluation into batches if needed (#2681) master-cb1c072 Kawrakow 2023-08-21 11:11:31 +03:00
  • 1f373e349e
    server : do not overwrite 404 if status is 500 from exception_handler jhen 2023-08-21 16:02:32 +08:00
  • 3f9fa77fe0
    server : fallback to default if client param is null jhen 2023-08-21 15:37:51 +08:00
  • af1ea58b60
    fix content of format_final_response Jhen 2023-08-21 13:42:06 +08:00
  • 1bef2dcf87
    fix typo Jhen 2023-08-21 13:33:33 +08:00