Commit graph

  • 83d2c43791
    llama : offload rest of the models Georgi Gerganov 2023-10-28 21:45:03 +03:00
  • 38aca9e1ab
    llama : factor out tensor offloading outside the build call (wip) Georgi Gerganov 2023-10-28 21:22:31 +03:00
  • 5946d98fc8
    metal : disable kernel load log Georgi Gerganov 2023-10-28 21:22:01 +03:00
  • 8b2420d249
    llama : factor out ggml-alloc from graph build functions Georgi Gerganov 2023-10-28 19:54:28 +03:00
  • fb6458340a
    llama : fix kv shift bug Georgi Gerganov 2023-10-28 18:30:55 +03:00
  • ff3bad83e2
    flake : update flake.lock for newer transformers version + provide extra dev shell (#3797) Erik Scholz 2023-10-28 16:41:07 +02:00
  • ee37e35dc5
    ggml-quants : fix Zig and Swift builds + quantize tool Georgi Gerganov 2023-10-28 17:21:36 +03:00
  • 3412be728b
    ggml : factor all quantization code in ggml-quants Georgi Gerganov 2023-10-28 17:05:07 +03:00
  • 82a6646e02
    metal : try cwd for ggml-metal.metal if bundle lookup fails (#3793) b1440 Aarni Koskela 2023-10-28 15:43:01 +03:00
  • 6df45c1730
    Update ggml-metal.m Georgi Gerganov 2023-10-28 15:42:39 +03:00
  • ba231e8a6d
    issues : change label from bug to bug-unconfirmed (#3748) Georgi Gerganov 2023-10-28 15:25:33 +03:00
  • 8a2f2fea29
    convert : ignore tokens if their IDs are within [0, vocab_size) (#3831) Georgi Gerganov 2023-10-28 15:25:15 +03:00
  • de7e0912b6
    convert : ignore tokens if their IDs are within [0, vocab_size) apply-3585 Georgi Gerganov 2023-10-28 15:01:36 +03:00
  • bd6d9e2059
    llama : allow quantizing k-quants to fall back when tensor size incompatible (#3747) b1437 Kerfuffle 2023-10-28 05:54:24 -06:00
  • ee1a0ec9cb
    llama : add option for greedy sampling with probs (#3813) b1436 Georgi Gerganov 2023-10-28 14:23:11 +03:00
  • 20ef442c2a
    fixed for smartcontext Concedo 2023-10-28 19:09:22 +08:00
  • bbfc62ac2f
    sampling : temp == 0.0 -> no probs, temp < 0.0 -> probs sampling-greedy-with-probs Georgi Gerganov 2023-10-28 14:04:57 +03:00
  • c5c54d1057
    train : minor Georgi Gerganov 2023-10-28 13:54:46 +03:00
  • c86cca8061
    llama : add comment about llama_sample_token_greedy() missing probs Georgi Gerganov 2023-10-28 13:21:29 +03:00
  • 177461104b
    common : print that one line of the syntax help *also* to standard output (#3823) b1435 Henk Poley 2023-10-28 12:16:33 +02:00
  • 6cf2b4c73b
    MMQ optimizations (+1 squashed commit) Concedo 2023-10-28 17:35:42 +08:00
  • e374227221
    Revert "cuda : use CUBLAS_COMPUTE_16F for non-attention ops" Georgi Gerganov 2023-10-28 12:20:08 +03:00
  • fdee152e4e
    starcoder : add GPU offloading (#3827) b1434 Georgi Gerganov 2023-10-28 12:06:08 +03:00
  • 731dd98bb5
    starcoder : offload layers to GPU Georgi Gerganov 2023-10-28 11:20:49 +03:00
  • 2ea3b567cf
    Merge: Testing speed of tensor cores vs MMQ Concedo 2023-10-28 16:41:42 +08:00
  • 53ab0535f5
    starcoder : do not GPU split 1D bias tensors Georgi Gerganov 2023-10-28 11:04:41 +03:00
  • 2fa1137890
    updated lite Concedo 2023-10-28 14:43:15 +08:00
  • 09c74ea046
    include content-length Concedo 2023-10-28 14:24:37 +08:00
  • 64f3bc5168
    update model string (+1 squashed commit) Concedo 2023-10-28 14:05:53 +08:00
  • 879d1ba268
    simplify colab dropdowns (+1 squashed commit) Concedo 2023-10-28 13:33:27 +08:00
  • eb9a93097b
    Colab Improvements (#498) Pyroserenus 2023-10-28 01:26:59 -04:00
  • 15f525c580
    revamped smart context for llama models Concedo 2023-10-28 12:59:08 +08:00
  • ecd38b58f2
    Print that one line of the syntax help *also* to standard output Henk Poley 2023-10-28 04:50:52 +02:00
  • 41aee4df82
    speculative : ensure draft and target model vocab matches (#3812) b1433 Kerfuffle 2023-10-27 15:40:07 -06:00
  • 6d459cbfbe
    llama : correctly report GGUFv3 format (#3818) b1432 cebtenzzre 2023-10-27 17:33:53 -04:00
  • 880780080b
    flake : use even smaller version of torch Green Sky 2023-10-27 22:35:05 +02:00
  • cd3e20fb50
    cuda : fix multi-gpu with tensor cores cuda-multi-gpu Georgi Gerganov 2023-10-27 23:11:50 +03:00
  • 706ff4c2e0
    cuda : try to fix main device write Georgi Gerganov 2023-10-27 22:17:47 +03:00
  • 0f2498f25d
    cuda : use CUBLAS_COMPUTE_16F for non-attention ops Georgi Gerganov 2023-10-27 20:15:21 +03:00
  • d055fed8e5
    llama : correctly report GGUFv3 format cebtenzzre 2023-10-27 13:10:18 -04:00
  • 3b9ea655d4
    cuda : use CUBLAS_COMPUTE_32F to speed-up and avoid dst cpy Georgi Gerganov 2023-10-27 18:13:54 +03:00
  • c8d6a1f34a
    simple : fix batch handling (#3803) b1431 Thibault Terrasson 2023-10-27 16:37:41 +02:00
  • 41f5d2acdf
    Tolerate small differences when checking dft vs tgt vocab KerfuffleV2 2023-10-27 08:15:00 -06:00
  • 1a0843c493
    cuda : utilize tensor cores with multiple GPU devices Georgi Gerganov 2023-10-27 13:05:33 +03:00
  • 2f9ec7e271
    cuda : improve text-generation and batched decoding performance (#3776) b1430 Georgi Gerganov 2023-10-27 17:01:23 +03:00
  • 4059df1470
    quantizing: Add warning when tensors were incompatible with k-quants KerfuffleV2 2023-10-27 07:46:33 -06:00
  • 7f20d78e7e
    Allow quantizing k-quants to fall back when tensor size incompatible KerfuffleV2 2023-10-23 09:24:24 -06:00
  • 4aa1fb0d38
    llama : add option for greedy sampling with probs Georgi Gerganov 2023-10-27 16:12:01 +03:00
  • a0897651a2
    speculative: Ensure draft and target model vocab matches KerfuffleV2 2023-10-27 06:41:14 -06:00
  • 49af767fad
    build : add compile option to force use of MMQ kernels cuda-quantum-batch Georgi Gerganov 2023-10-27 13:21:04 +03:00
  • c2f675133d
    support for abort without crash on disconnect Concedo 2023-10-27 15:27:17 +08:00
  • 22201248a0
    Remove comments Galunid 2023-10-27 02:05:27 +02:00
  • 5fc96c3630
    simple : fix batch handling Thibault Terrasson 2023-10-26 23:37:12 +02:00
  • 34b2a5e1ee
    server : do not release slot on image input (#3798) b1429 Georgi Gerganov 2023-10-26 22:53:37 +03:00
  • aed05e5565
    todo: troubleshoot sse with multiuser Concedo 2023-10-27 00:21:52 +08:00
  • f344a99425
    causallm is not working well on clblast, running out of mem with blas. this helps a bit but doesn't fix the problem. Concedo 2023-10-26 23:36:35 +08:00
  • 49b89d1682
    flake : update flake.lock for newer transformers version + provide extra dev shell with torch and transformers (for most convert-xxx.py scripts) Green Sky 2023-10-26 16:08:48 +02:00
  • 0f46534866
    wip Concedo 2023-10-26 21:58:51 +08:00
  • 4823b9bdcb
    Initial generic convert script Galunid 2023-10-26 13:08:41 +02:00
  • a4e15a36e4
    cuda : add CUDA_USE_TENSOR_CORES and GGML_CUDA_FORCE_MMQ macros Georgi Gerganov 2023-10-25 18:48:36 +03:00
  • a00bb06c43
    Make convert script work with pytorch files Galunid 2023-10-26 12:40:15 +02:00
  • 00ae2aa3c0
    Try cwd for ggml-metal if bundle lookup fails Aarni Koskela 2023-10-26 13:34:00 +03:00
  • 1aef336f16
    server: disable llm logs if SERVER_VERBOSE is off Olexiy Buyanskyy 2023-10-25 18:52:39 +03:00
  • c3b7c0e19c
    Apply suggestions from code review staviq 2023-10-25 23:51:49 +00:00
  • 1fcb8139f6
    Update common/log.h staviq 2023-10-25 23:50:16 +00:00
  • 8eda531b63
    Update common/log.h staviq 2023-10-25 23:35:18 +00:00
  • 108b23f748
    impl --log-new, --log-append staviq 2023-10-25 23:36:12 +02:00
  • d130fe6d6b
    Merge remote-tracking branch 'origin/master' into vulkan 0cc4m 2023-10-25 18:27:24 +02:00
  • 0230981649
    Clean up unused functions 0cc4m 2023-10-25 18:27:10 +02:00
  • 5db89b90b7
    Merge branch 'master' into concedo_experimental Concedo 2023-10-25 23:58:15 +08:00
  • 4c6744b526
    cuda : remove duplicated cuBLAS GEMM code Georgi Gerganov 2023-10-25 18:25:13 +03:00
  • 98d1dba256
    tighten timings Concedo 2023-10-25 20:44:20 +08:00
  • e543c420ad
    amx isa abhilash1910 2023-10-25 05:29:17 -07:00
  • a3c28439d3
    cuda : fine-tune >= VOLTA params + use MMQ only for small batches Georgi Gerganov 2023-10-25 15:07:34 +03:00
  • 16b60dd75c
    cuda : add F32 sgemm branch Georgi Gerganov 2023-10-25 14:00:21 +03:00
  • eb78e0a9a7
    fix: parameter changed Jinho Heo 2023-10-25 19:45:26 +09:00
  • 52af782608
    cuda : new cublas gemm branch for multi-batch quantized src0 Georgi Gerganov 2023-10-25 13:14:24 +03:00
  • 59d1232ea7
    cuda : prints wip Georgi Gerganov 2023-10-25 10:26:58 +03:00
  • 6961c4bd0b
    batched-bench : print params at start b1428 Georgi Gerganov 2023-10-25 10:26:27 +03:00
  • c9983a72d6
    prevent lora with clblast Concedo 2023-10-25 15:18:03 +08:00
  • cc44877486
    log : disable pid in log filenames b1427 Georgi Gerganov 2023-10-25 10:09:16 +03:00
  • 30d1017021
    update readme and colab (+1 squashed commit) Concedo 2023-10-25 14:25:54 +08:00
  • fe44ded01a
    finetune.sh: Add an optional LLAMA_TRAINING_DIR variable Andrew Godfrey 2023-10-24 19:31:52 -07:00
  • 6359c15174
    finetune.sh: Add an optional LLAMA_MODEL_DIR variable Andrew Godfrey 2023-10-24 19:14:17 -07:00
  • 4d5ed8349d
    Merge branch 'master' of https://github.com/ggerganov/llama.cpp into ntkv2 cebtenzzre 2023-10-24 16:20:03 -04:00
  • ad93962657
    server : add parameter -tb N, --threads-batch N (#3584) (#3768) b1426 cebtenzzre 2023-10-24 16:10:43 -04:00
  • 1717521cdb
    server : do not block system prompt update (#3767) b1425 Georgi Gerganov 2023-10-24 23:08:20 +03:00
  • 4b32c65d78
    server : minor Georgi Gerganov 2023-10-24 23:03:34 +03:00
  • 54f98316b4
    server : add parameter -tb N, --threads-batch N (#3584) Michael Coppola 2023-10-11 15:42:22 -04:00
  • ee201791a1
    server : update state machine logic to process system prompts Georgi Gerganov 2023-10-24 22:36:47 +03:00
  • b2f7e04bd3
    sync : ggml (conv ops + cuda MSVC fixes) (#3765) b1424 Georgi Gerganov 2023-10-24 21:51:20 +03:00
  • 01be4169bf
    server : do not block system prompt update Georgi Gerganov 2023-10-24 21:45:11 +03:00
  • 58f8ddd0f5
    sync : ggml (conv ops + cuda MSVC fixes) Georgi Gerganov 2023-10-24 21:05:26 +03:00
  • abd21fc99f
    cmake : add missed dependencies (#3763) b1423 John Smith 2023-10-25 01:48:45 +08:00
  • 3d5e42ce15
    Add missed dependencies. John Smith 2023-10-25 01:21:59 +08:00
  • 839fc6dac8
    handle freq_base_train Concedo 2023-10-24 23:44:22 +08:00
  • 81dabd8edd
    Tweak an error message Andrew Godfrey 2023-10-23 19:28:05 -07:00
  • 86ceda4275
    Add "add_f16_f32_f32_cuda" Andrew Godfrey 2023-10-23 19:26:20 -07:00
  • 9587ab4c73
    finetune.sh: Edit comments Andrew Godfrey 2023-10-23 19:15:40 -07:00
  • 7cbf5b282c
    Add an f16 case to ggml_add_cast_impl and llama_build_lora_finetune_graphs Andrew Godfrey 2023-10-23 18:31:06 -07:00