Commit graph

  • 9fe94ccac9 docker : build images only once (#9225) slaren 2024-08-28 17:28:00 +02:00
  • 4de6f2b606 docker : build images only once slaren 2024-08-28 16:41:47 +02:00
  • a7618821d5 py: let users add full base model and dataset to model_card brian khuu 2024-08-06 00:42:27 +10:00
  • d32c74d1f2 py: Detailed datasets metadata in gguf kv store brian khuu 2024-08-28 21:37:38 +10:00
  • 48507397d0 nix: fix: mpi-cuda should include cuda support... Someone Serge 2024-08-28 11:36:32 +00:00
  • 23d8f65ac0 nix: fix regression: asking for deprecated autoAddOpenGLRunpath Someone Serge 2024-08-28 11:29:28 +00:00
  • 66b039a501 docker : update CUDA images (#9213) slaren 2024-08-28 13:20:36 +02:00
  • 7a383dbe65 add comment Xuan Son Nguyen 2024-08-28 11:46:16 +02:00
  • 6aaa183ee1 devops : only build specific targets for full image Xuan Son Nguyen 2024-08-28 11:41:15 +02:00
  • 41ad342b69 add log in cli caitianchi 2024-08-28 17:24:57 +08:00
  • fa0c2bdc45 clarify comment Xuan Son Nguyen 2024-08-28 11:24:09 +02:00
  • 8e8f8ce42d threadpool: improve setprioty error message Max Krasnyansky 2024-08-27 22:36:01 -07:00
  • bead7d47fb threadpool: minor indent fixes Max Krasnyansky 2024-08-27 22:33:03 -07:00
  • 8cad654cbd docker : update CUDA images slaren 2024-08-28 04:51:55 +02:00
  • 7444046c47 llama: rwkv6: Apply code format changes Molly Sophia 2024-08-26 09:52:11 +08:00
  • 5f00c52be0 llama: rwkv6: Remove unused nodes Molly Sophia 2024-08-26 09:50:51 +08:00
  • e0ea51144e llama: rwkv6: Keep `time_mix_w1/w2` as F32 Molly Sophia 2024-08-26 09:32:16 +08:00
  • 601b5920c6 converter: Match `new_name` instead of `name` for float32 explicit tensors Molly Sophia 2024-08-26 09:31:21 +08:00
  • 6d69fd77b1 llama: rwkv6: Add kv `time_mix_extra_dim` and `time_decay_extra_dim` Molly Sophia 2024-08-25 16:26:57 +08:00
  • c414a24a5a llama: rwkv6: Make use of key `feed_forward_length` Molly Sophia 2024-08-25 16:16:29 +08:00
  • 87a29014a4 converter: Use class name `Rwkv6Model` Molly Sophia 2024-08-25 15:56:43 +08:00
  • 7756afd8dd llama: rwkv6: Apply code style and misc changes Molly Sophia 2024-08-25 15:48:35 +08:00
  • e94778ade0 llama: rwkv6: Use `ggml_norm` instead of `ggml_group_norm` Molly Sophia 2024-08-25 12:36:29 +08:00
  • 57decb4a38 Update src/llama.cpp Molly Sophia 2024-08-25 12:10:02 +08:00
  • f5d955d2fe llama: rwkv6: Use the new advanced batch splits Molly Sophia 2024-08-23 10:14:35 +08:00
  • 6da6aa48b0 llama: rwkv6: Add quantization tensor exclusion Molly Sophia 2024-08-13 18:31:25 +08:00
  • c165e34629 llama: rwkv6: Clean up Molly Sophia 2024-08-13 17:46:29 +08:00
  • ee1b78c091 llama: rwkv6: Fix group_norm assertion failure with Metal Molly Sophia 2024-08-13 17:41:34 +08:00
  • 683d70cb68 llama: rwkv6: Fix tensor loading for 7B/14B models Molly Sophia 2024-08-13 17:06:07 +08:00
  • b0f4fe5279 llama: rwkv6: Detect model.type Molly Sophia 2024-08-13 17:01:44 +08:00
  • 276d53b18f build_rwkv6: Simplify graph Molly Sophia 2024-08-12 14:47:26 +08:00
  • 12fbe1ade2 Use MODEL_ARCH.RWKV6 instead of MODEL_ARCH.RWKV Molly Sophia 2024-08-12 14:30:04 +08:00
  • 5afa3eff3a Update convert_hf_to_gguf.py Molly Sophia 2024-08-12 14:16:02 +08:00
  • ae9936a80d Update convert_hf_to_gguf.py Molly Sophia 2024-08-12 14:14:56 +08:00
  • 8aa711ad98 ggml: Add backward computation for unary op `exp` Molly Sophia 2024-08-12 09:29:47 +08:00
  • c6955525b4 Update convert_hf_to_gguf.py Molly Sophia 2024-08-12 09:12:16 +08:00
  • 7f2e370fa2 convert_hf_to_gguf: rwkv tokenizer: Don't escape sequences manually Molly Sophia 2024-08-12 09:08:30 +08:00
  • 18decea3ed convert_hf_to_gguf: rwkv: Avoid using `eval` Molly Sophia 2024-08-11 12:19:45 +08:00
  • 8bc1f9ae80 build_rwkv: Avoid using inplace operations Molly Sophia 2024-08-11 12:06:16 +08:00
  • 6ae2f4866f Remove trailing whitespaces Molly Sophia 2024-08-11 10:13:33 +08:00
  • 01dcf4bb77 Fix parallel inferencing for RWKV Molly Sophia 2024-08-09 20:51:00 +08:00
  • 98ce5f43f0 Fix offloading layers to CUDA Molly Sophia 2024-08-07 16:40:41 +08:00
  • 903089b5eb Add `wkv.head_size` key for RWKV Molly Sophia 2024-08-07 10:35:40 +08:00
  • 8d498c7075 Add `rescale_every_n_layers` parameter Molly Sophia 2024-08-06 18:53:27 +08:00
  • 0784a0cf26 RWKV v6 graph building Molly Sophia 2024-08-02 13:58:34 +08:00
  • 5732de89b7 ggml: Add unary operator Exp Molly Sophia 2024-08-02 16:29:16 +08:00
  • 0e5ac349f8 Fix rwkv tokenizer Molly Sophia 2024-08-02 12:04:36 +08:00
  • a180b63b49 Load more tensors for rwkv v6 Molly Sophia 2024-08-01 21:45:02 +08:00
  • 700dad1b86 Fix build Molly Sophia 2024-08-01 12:51:29 +08:00
  • b3b17e05fe Add placeholder llm_build_time_mix Layl Bongers 2024-05-15 01:19:44 +02:00
  • 3cbeffc50f Add time mix output loading Layl Bongers 2024-05-13 14:39:50 +02:00
  • b409fd8e11 Add remaining time mix parameters Layl Bongers 2024-05-13 13:32:41 +02:00
  • dd3aa3d40e Add time mix KVRG & correct merge mistake Layl Bongers 2024-05-06 15:31:56 +02:00
  • 5479588569 Add rwkv5 layer norms Layl Bongers 2024-04-26 16:48:24 +02:00
  • 4e23d9715b Add logits conversion to rwkv5 Layl Bongers 2024-04-23 11:12:09 +02:00
  • a866789603 Add workaround for kv cache Layl Bongers 2024-04-19 10:06:00 +02:00
  • a0aae8d671 Add (broken) placeholder graph builder for RWKV Layl Bongers 2024-04-17 14:59:18 +02:00
  • e92c74f4a1 Fix model loading Layl Bongers 2024-04-15 12:05:47 +02:00
  • 7cac72a80b Do not use special tokens when matching in RWKV tokenizer Layl Bongers 2024-04-12 16:28:54 +02:00
  • 865167d01a Fix build Molly Sophia 2024-07-31 22:16:22 +08:00
  • dc0767f4b3 Add RWKV tokenization Layl Bongers 2024-04-04 13:59:37 +02:00
  • 8d2eca3507 convert_hf_to_gguf: Add support for RWKV v6 Molly Sophia 2024-07-31 16:05:23 +08:00
  • c6328bc0ad threadpool: futher api cleanup and prep for future refactoring Max Krasnyansky 2024-08-27 18:55:59 -07:00
  • 5999d6d06e fix conflicts pidack 2024-08-28 09:49:17 +08:00
  • e3c2202049 threadpool: move all pause/resume logic into ggml Max Krasnyansky 2024-08-27 13:19:45 -07:00
  • 536086e712 server : fix crash when error handler dumps invalid utf-8 json (#9195) Jan Boon 2024-08-27 18:28:06 +08:00
  • 5d4c0a1327 threadpool: move process priority setting into the apps (bench and cli) Max Krasnyansky 2024-08-27 16:31:34 -07:00
  • 951f1d9053 Merge remote-tracking branch 'origin' into add-support-for-phi3-vision Andrei Betlen 2024-08-27 18:13:54 -04:00
  • dc0625ab8f Add support for Phi3-vision-instruct Andrei Betlen 2024-08-27 18:11:41 -04:00
  • 74342d48c2 Fix for Debian CMake package creation Eugeniusz 2024-08-27 22:46:43 +02:00
  • 20f1789dfb vulkan : fix build (#0) b3639 Georgi Gerganov 2024-08-27 22:10:58 +03:00
  • 231cff5f6f sync : ggml Georgi Gerganov 2024-08-27 22:01:45 +03:00
  • bbbd58c74c vulkan : fix build (#0) Georgi Gerganov 2024-08-27 22:10:58 +03:00
  • e3c2dbbfc9 sync : ggml Georgi Gerganov 2024-08-27 22:01:45 +03:00
  • 3bcc4dee9a llama-bench: add support for cool off between tests --delay Max Krasnyansky 2024-08-27 10:59:07 -07:00
  • 8d5ab9a58e llama-bench: turn threadpool params into vectors, add output headers, etc Max Krasnyansky 2024-08-26 17:07:36 -07:00
  • 658f16c330 threadpool: update calling thread prio and affinity only at start/resume Max Krasnyansky 2024-08-26 13:10:11 -07:00
  • 8186e9615f threadpool: avoid updating process priority on the platforms that do not require it Max Krasnyansky 2024-08-26 12:25:48 -07:00
  • a7496bf7e5 threadpool: don't forget to free workers state when omp is enabled Max Krasnyansky 2024-08-24 19:05:54 -07:00
  • 93f170d868 threadpool: enable openmp by default for now Max Krasnyansky 2024-08-24 18:26:37 -07:00
  • 204377a0a8 threadpool: update threadpool resume/pause function names Max Krasnyansky 2024-08-24 18:03:06 -07:00
  • 49ac51f2a3 threadpool: simplify threadpool init logic and fix main thread affinity application Max Krasnyansky 2024-08-24 17:35:34 -07:00
  • 8008463aee threadpool: replace checks for compute_thread ret code with proper status check Max Krasnyansky 2024-08-24 15:36:02 -07:00
  • c506d7fc46 threadpool: enable --cpu-mask and other threadpool related options only if threadpool is enabled Max Krasnyansky 2024-08-24 15:23:51 -07:00
  • f64c975168 threadpool: fix swift wrapper errors due to n_threads int type cleanup Max Krasnyansky 2024-08-24 15:07:54 -07:00
  • 40648601f1 threadpool: fix apply_priority() function name Max Krasnyansky 2024-08-24 14:15:22 -07:00
  • 31541d7427 threadpool: move typedef into ggml.h Max Krasnyansky 2024-08-24 13:55:58 -07:00
  • c4452edfea threadpool: add support for ggml_threadpool_params_default/init Max Krasnyansky 2024-08-24 12:12:48 -07:00
  • 4a4d71501b threadpool: consistent use of int type for n_threads params Max Krasnyansky 2024-08-24 10:50:06 -07:00
  • 2358bb364b threadpool: better naming for thread/cpumask releated functions Max Krasnyansky 2024-08-24 10:27:53 -07:00
  • 63a0dad83c threadpool: remove abort_callback from threadpool state Max Krasnyansky 2024-08-24 10:09:51 -07:00
  • 307fece5d7 threadpool: use relaxed order for chunk sync Max Krasnyansky 2024-08-20 18:43:39 -07:00
  • db45b6d3a9 threadpool: do not clear barrier counters between graphs computes (fixes race with small graphs) Max Krasnyansky 2024-08-15 16:20:42 -07:00
  • 538bd9f730 threadpool: remove special-casing for disposable threadpools Max Krasnyansky 2024-08-12 22:18:16 -07:00
  • 9d3e78c6b8 threadpool: reduce the number of barrier required Max Krasnyansky 2024-08-12 19:04:01 -07:00
  • b630acdb73 threadpool: add support for hybrid polling Max Krasnyansky 2024-08-11 11:20:32 -07:00
  • 494e27c793 threadpool: reduce pause/resume/wakeup overhead in common cases Max Krasnyansky 2024-08-10 16:12:06 -07:00
  • 48aa8eec07 threadpool: do not create two threadpools if their params are identical Max Krasnyansky 2024-08-08 16:26:49 -07:00
  • 2e18f0d4c9 fix potential race condition in check_for_work fmz 2024-08-08 05:59:20 -07:00
  • dfa63778bd threadpool: do not wakeup threads in already paused threadpool Max Krasnyansky 2024-08-07 23:08:31 -07:00