Commit graph

  • f3947e1e02 scripts : rename to server-llm.sh Georgi Gerganov 2023-10-31 13:58:18 +02:00
  • 2f719c876d scripts : add deploy-server.sh Georgi Gerganov 2023-10-31 11:29:23 +02:00
  • 1cb90e57e4 Add further ops, not yet enabled. Improve semaphore code 0cc4m 2023-10-31 09:49:56 +01:00
  • fc5a26aade llama : enable warning about not offloaded tensors Georgi Gerganov 2023-10-31 08:57:10 +02:00
  • 0bfdcdd0f8 llama : normalize tensor names Georgi Gerganov 2023-10-31 08:46:34 +02:00
  • 6669cd8329 llama : update offload functions for KQ tensors Georgi Gerganov 2023-10-31 08:24:07 +02:00
  • 2926ef63b1 llama : fix input allocation logic Georgi Gerganov 2023-10-31 08:23:43 +02:00
  • dc3115f2a3 Add another alias to n_layers Galunid 2023-10-31 04:20:51 +01:00
  • 08f183c229 convert : restore Falcon vocab padding cebtenzzre 2023-10-30 23:05:05 -04:00
  • 0743f7a900 Fix variable Galunid 2023-10-31 03:52:52 +01:00
  • b9c664ab2f Woops Galunid 2023-10-31 03:42:55 +01:00
  • 6f6856c6ea [Untested] Initial Persimmon support Galunid 2023-10-31 03:27:04 +01:00
  • 94ba1db24a Add Starcoder and Refact Galunid 2023-10-31 03:12:25 +01:00
  • 0afa75a9a2 Add Falcon support Galunid 2023-10-31 02:13:45 +01:00
  • 3bb9844de9 Get rid of dumb print Galunid 2023-10-31 01:54:24 +01:00
  • 08918b700e MPT conversion fix Galunid 2023-10-31 01:52:55 +01:00
  • 653bc1c000 Merge branch 'master' of https://github.com/AndrewGodfrey/llama.cpp into finetune_enableGpu Andrew Godfrey 2023-10-30 14:05:42 -07:00
  • 207b51900e ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861) b1446 Georgi Gerganov 2023-10-30 19:19:15 +02:00
  • 4b3cb98d46 ggml-impl : move extern "C" to start of file ggml-impl Georgi Gerganov 2023-10-30 19:05:58 +02:00
  • d70917f4b2 ggml : prefix lookup tables with ggml_ Georgi Gerganov 2023-10-30 18:38:11 +02:00
  • 1039a16ce2 ggml : remove duplicate static assert macros Georgi Gerganov 2023-10-30 18:35:03 +02:00
  • 9fc823826e fix loading rope.scaling.original_context_length from GGUF (#3) Jeffrey Quesnelle 2023-10-30 08:35:51 -07:00
  • 9eba77c6a0 finally got something workable Concedo 2023-10-30 23:30:21 +08:00
  • 223696c9f9 ggml : add math.h to ggml-impl.h Georgi Gerganov 2023-10-30 17:12:27 +02:00
  • 334984e457 ggml : explicitly initialize deprecated type traits Georgi Gerganov 2023-10-30 17:09:37 +02:00
  • a1c3ff68cd tests : fix ARM build Georgi Gerganov 2023-10-30 16:53:34 +02:00
  • d3e2cedb79 ggml : move FP16 <-> FP32 stuff to ggml-impl.h Georgi Gerganov 2023-10-30 16:35:17 +02:00
  • bc28aaa8c2 make : use -flto=auto to avoid warnings and maintain perf lto Georgi Gerganov 2023-10-30 16:00:53 +02:00
  • 57c4296cf0 ci : fix focal build Georgi Gerganov 2023-10-30 15:58:40 +02:00
  • a6aba2c85c ci : try to fix code coverage build Georgi Gerganov 2023-10-30 15:43:05 +02:00
  • 6f6b0db6d1 build : disable lto for C++ (make) and enable existing LTO flag (cmake) Georgi Gerganov 2023-10-30 15:40:01 +02:00
  • 1206b5f3be build : enable link-time optimizations Georgi Gerganov 2023-10-30 15:12:54 +02:00
  • e19b78038a Add transformers dependency wonjun Jang 2023-10-30 11:56:49 +00:00
  • d54764d0b1 Add VocabLoader and remove *Vocab class wonjun Jang 2023-10-30 11:54:59 +00:00
  • a3f80013ad llama : add LLAMA_OFFLOAD_DEBUG + fix starcoder offloading Georgi Gerganov 2023-10-30 12:14:23 +02:00
  • 792d1a1b16 llama : minor Georgi Gerganov 2023-10-30 11:34:47 +02:00
  • 61c395833d context shifting is still buggy Concedo 2023-10-30 16:25:01 +08:00
  • 998a548a30 tabs to spaces Andrew Godfrey 2023-10-29 19:14:27 -07:00
  • e00b7b3769 flake.nix: fix for rocm 5.7 Tungsten842 2023-10-29 22:23:23 +01:00
  • 87adfad25f Merge branch 'min-p-sampling' of https://github.com/kalomaze/koboldcpp into min-p-sampling kalomaze 2023-10-29 15:50:02 -05:00
  • 18c0aa7c31 Merge remote-tracking branch 'original/cuda-quantum-batch' into min-p-sampling kalomaze 2023-10-29 15:46:50 -05:00
  • f39e6075cf llama : add llm_build_kqv helper Georgi Gerganov 2023-10-29 22:26:36 +02:00
  • c9121fdd0f llama : remove obsolete comments in build graphs Georgi Gerganov 2023-10-29 21:44:19 +02:00
  • a104abea48 llama : simplify falcon Q, K, V computation Georgi Gerganov 2023-10-29 21:24:25 +02:00
  • 31a12f3d03 llama : fix llm_build_k_shift to use n_head_kv instead of n_head Georgi Gerganov 2023-10-29 21:17:46 +02:00
  • 5990861938 llama : remove obsolete offload names Georgi Gerganov 2023-10-29 21:11:20 +02:00
  • 3e0462594b llama : add llm_build_kv_store helper Georgi Gerganov 2023-10-29 20:35:20 +02:00
  • 443f7d586e Call add_tensor before write_* functions Galunid 2023-10-29 20:00:54 +01:00
  • 909d64471b llama : fix offloading after recent changes Georgi Gerganov 2023-10-29 19:45:27 +02:00
  • e71544231c Update convert.py wonjun Jang 2023-10-29 18:29:38 +00:00
  • 97f690ab51 Merge branch 'master' into convert_hf_vocab wonjun Jang 2023-10-30 02:33:14 +09:00
  • 6e08281e58 Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843) b1445 Kerfuffle 2023-10-29 11:31:40 -06:00
  • 38728a0be0 llama : add llm_build_k_shift helper Georgi Gerganov 2023-10-29 19:22:54 +02:00
  • bed3f179b4 Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear KerfuffleV2 2023-10-29 10:59:24 -06:00
  • dbf836bb64 llama : add llm_build_ffn helper function (#3849) Georgi Gerganov 2023-10-29 18:47:46 +02:00
  • 2046eb4345 make : remove unnecessary dependency on build-info.h (#3842) b1444 cebtenzzre 2023-10-29 12:33:47 -04:00
  • 71a09da301 llama : fix kv shift bug (#3835) b1443 Georgi Gerganov 2023-10-29 18:32:51 +02:00
  • d69d777c02 ggml : quantization refactoring (#3833) b1442 Georgi Gerganov 2023-10-29 18:32:28 +02:00
  • 7f5d1b2fc6 slider error Concedo 2023-10-30 00:02:38 +08:00
  • 7f050b5d16 tweak numbers Concedo 2023-10-29 22:46:19 +08:00
  • 3b778a4af9 llama : add llm_build_ffn helper function Georgi Gerganov 2023-10-29 16:24:08 +02:00
  • 7db9c96d8a llama : add llm_build_norm helper function Georgi Gerganov 2023-10-29 15:39:58 +02:00
  • 210e6e5d02 llama : remove obsolete map for layer counting Georgi Gerganov 2023-10-29 13:39:04 +02:00
  • 79ad734417 llama : comment Georgi Gerganov 2023-10-29 13:27:53 +02:00
  • 761087932b llama : add functional header Georgi Gerganov 2023-10-29 13:26:23 +02:00
  • 8925cf9ef8 llama : add layer index to all tensor names Georgi Gerganov 2023-10-29 13:22:15 +02:00
  • 1e9c5443c2 llama : refactor tensor offloading as callback Georgi Gerganov 2023-10-29 12:35:07 +02:00
  • 15267192c0 llama : refactor tensor offloading as callback scratch Georgi Gerganov 2023-10-29 12:35:07 +02:00
  • 7924592a83 context shift feature done Concedo 2023-10-29 18:21:39 +08:00
  • da936188d8 llama : move refact in correct place + optimize graph input Georgi Gerganov 2023-10-29 11:48:24 +02:00
  • 739b85c985 llama : try to fix build Georgi Gerganov 2023-10-29 11:25:32 +02:00
  • 25cfbf6776 llama : fix non-CUDA build Georgi Gerganov 2023-10-29 11:12:03 +02:00
  • b4ad03b3a7 llama : try to optimize offloading code Georgi Gerganov 2023-10-29 10:33:11 +02:00
  • 2d0826247b OpenCL: Pass src0 offset as kernel argument instead of global offset shibe2 2023-10-29 12:18:46 +04:00
  • 79617902ea llama : fix res_norm offloading Georgi Gerganov 2023-10-29 09:20:35 +02:00
  • e14aa46151 llama : do tensor offload only with CUDA Georgi Gerganov 2023-10-29 08:03:46 +02:00
  • 0dc05b8433 llama : factor graph input into a function Georgi Gerganov 2023-10-29 07:52:43 +02:00
  • 4e98897ede llama : support offloading result_norm + comments Georgi Gerganov 2023-10-29 07:36:07 +02:00
  • 3ddfd67d13 permit simultaneous use of top_p and min_p cebtenzzre 2023-10-29 01:31:14 -04:00
  • 69e638e56a cleanup cebtenzzre 2023-10-29 00:23:21 -04:00
  • fcbbfc1666 Even formatting + exclusively 0.0f to disable now kalomaze 2023-10-28 23:52:22 -05:00
  • cb233584cc minor whitespace fix kalomaze 2023-10-28 23:40:23 -05:00
  • 6f7cdec38a Simplified counter by checking candidates size kalomaze 2023-10-28 23:37:18 -05:00
  • 49b68e8226 Standardize 0.0 disabling min_p upon feedback kalomaze 2023-10-28 23:12:14 -05:00
  • 62fc77153b Remove accidentally kept prints + min_keep support kalomaze 2023-10-28 23:04:29 -05:00
  • ebd4b91327 Extend llama_kv_cache_seq_rm to allow matching any sequence KerfuffleV2 2023-10-28 21:06:55 -06:00
  • 833637b703 erring on the side of caution; disable by default kalomaze 2023-10-28 22:05:05 -05:00
  • 338d6c265d fixes to smartcontextpro Concedo 2023-10-29 10:42:37 +08:00
  • 69ef4ca885 Debugging print statements removed kalomaze 2023-10-28 21:14:55 -05:00
  • 838d58dc32 Min P disabled if set to 1.0 or 0, otherwise Top P kalomaze 2023-10-28 21:08:26 -05:00
  • a235a0d226 Transform Min P into a proper CLI option kalomaze 2023-10-28 20:49:17 -05:00
  • b8f1c3cbac make : remove unnecessary dependency on build-info.h cebtenzzre 2023-10-28 21:24:43 -04:00
  • 550b925af2 Missing variable Galunid 2023-10-29 02:06:41 +01:00
  • 989db34149 Missing variable Galunid 2023-10-29 02:05:28 +01:00
  • 8618b4e74c Add [UNTESTED] Baichuan support Galunid 2023-10-29 01:38:35 +02:00
  • 0ff237105d Make gguf_writer member of Model, rework tokenizer export Galunid 2023-10-29 00:33:05 +02:00
  • a9e2b74f1a Super hacky starting implementation of Min P kalomaze 2023-10-28 17:23:06 -05:00
  • 8a86b95e87 quantize : --pure option for disabling k-quant mixtures ggml-quants cebtenzzre 2023-10-28 16:32:49 -04:00
  • 51c4f9ee9f llama : comments Georgi Gerganov 2023-10-28 22:50:08 +03:00
  • 3af8771389 llama : update offload log messages to print node index Georgi Gerganov 2023-10-28 22:36:44 +03:00