Commit graph

  • 15305ad143 how to add clblast that is avalible in the fedora repos Mohammadreza Hendiani 2024-04-20 16:40:04 +03:30
  • ae171cc1af added fedora to list of distros that may need the package (the packages have the same name on Fedora) Mohammadreza Hendiani 2024-04-20 16:37:21 +03:30
  • efbcdc1caf Common:ChatOn: ReversePrompts, SingleMsgChatTemplate wrapper HanishKVC 2024-04-20 11:58:15 +05:30
  • cf025dfbdb added numa dependencies as to use numa in numactl Mohammadreza Hendiani 2024-04-20 16:24:19 +03:30
  • 4bd26644bf llama: finally move the string KV override value to the stack Pierrick HYMBERT 2024-04-20 13:24:58 +02:00
  • aed82f6837 common : try to fix Android CI (#6780) b2700 Georgi Gerganov 2024-04-20 13:27:12 +03:00
  • cc4efc5e4f common : another try Georgi Gerganov 2024-04-20 13:01:35 +03:00
  • 0a86385703 common : disable get_math_cpu_count() until Android CI gets fixed Georgi Gerganov 2024-04-20 11:20:07 +03:00
  • 1e915d795f ci: fix job are cancelling each other Pierrick HYMBERT 2024-04-20 10:29:33 +02:00
  • 2606bc97bf Merge remote-tracking branch 'refs/remotes/origin/master' into hp/quantize/imatrix-metadata Pierrick HYMBERT 2024-04-20 10:20:56 +02:00
  • aa0e28f8fc common: add llama_model_kv_override_free common: free kv override if used after model loading Pierrick HYMBERT 2024-04-20 10:17:03 +02:00
  • a2410b6cb2 fix multiple tokens warning Sigbjørn Skjæret 2024-04-20 09:34:01 +02:00
  • 942f023930 Fixes unhandled status ready with default: switch ManniX-ITA 2024-04-20 09:05:57 +02:00
  • c4e6f6f4b9 flake-- Sigbjørn Skjæret 2024-04-20 08:48:39 +02:00
  • 8d36967a2c improve help text Sigbjørn Skjæret 2024-04-20 08:45:33 +02:00
  • 9e4968cf67 Add special token modification capability Sigbjørn Skjæret 2024-04-20 08:33:54 +02:00
  • db6f775c93 Common:ChatOn: Add arguments for chaton HanishKVC 2024-04-20 11:44:15 +05:30
  • 2ecc2ae900 grammars: update performance gotchas w/ repetition advice ochafik 2024-04-20 01:25:58 +01:00
  • 93b754ec5c json: use new GBNF repetitions{m,n} syntax ochafik 2024-04-20 00:44:11 +01:00
  • a06753581c Reverted last change of adding the end_of_text stop word for llama 3 Wouter Tichelaar 2024-04-20 01:41:09 +02:00
  • 15585e0f20 grammars: update reps parsing to bring ? / * / + closer to before ochafik 2024-04-19 22:18:59 +01:00
  • f3105b9eec Accept suggestion Pedro Cuenca 2024-04-19 22:12:20 +02:00
  • 77a1303e2d Added <|end_of_text|> as another stop token Wouter Tichelaar 2024-04-19 22:10:54 +02:00
  • 649b730bc7 Merge remote-tracking branch 'origin/master' into generate-assets ochafik 2024-04-19 20:56:15 +01:00
  • 5c4a94803d build: hex dump assets at cmake build time (not config time) ochafik 2024-04-19 20:51:13 +01:00
  • 6c257f4709 server: include prompt tokens in the EOS limit Pierrick HYMBERT 2024-04-19 21:01:44 +02:00
  • 0d3eca6920 minor: spaces Pierrick HYMBERT 2024-04-19 20:54:39 +02:00
  • 836c97c094 Update tests/test-chat-template.cpp Wouter 2024-04-19 20:43:45 +02:00
  • ac6ae5daca Fix flash-attn for AMD Johannes Gäßler 2024-04-19 19:50:17 +02:00
  • 749cdb9c0f Update tests/test-chat-template.cpp Wouter 2024-04-19 19:15:10 +02:00
  • 0e4802b2ec ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (#6748) b2699 loonerin 2024-04-19 13:03:35 -04:00
  • 61b483d3a8 Update server.cpp after code review ManniX-ITA 2024-04-19 19:03:23 +02:00
  • 871fcb6e10 ggml : fix soft_max with bias on CPU Georgi Gerganov 2024-04-19 18:03:56 +03:00
  • 3badef1fe1 ggml : fix avx512 const correctness Georgi Gerganov 2024-04-19 17:45:08 +03:00
  • 52945429eb tests : remove benchmarks Georgi Gerganov 2024-04-19 17:38:28 +03:00
  • 29f6ad8d95 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-19 17:30:09 +03:00
  • bc346166f9 metal : minor Georgi Gerganov 2024-04-19 17:24:52 +03:00
  • 84158931cf Add comments in Intel GPU linux Anas Ahouzi 2024-04-19 06:40:14 -07:00
  • 330ee579c6 Recommended build instruction Anas Ahouzi 2024-04-19 15:20:10 +02:00
  • 62d80e271b Recommended build instruction Anas Ahouzi 2024-04-19 15:19:56 +02:00
  • 9a426b6a0b Recommended build instruction Anas Ahouzi 2024-04-19 15:19:35 +02:00
  • f1571c96fc Add backdoor to ggml to use DirectStorage to load tensors. Markus Tavenrath 2024-04-19 15:07:32 +02:00
  • 8cf382b51c Fix typo Anas Ahouzi 2024-04-19 05:57:13 -07:00
  • e1f6992d7c Fix FP32/FP16 build instructions Anas Ahouzi 2024-04-19 05:52:58 -07:00
  • 1a88565b44 metal : clean-up kernel code Georgi Gerganov 2024-04-19 15:52:49 +03:00
  • 97eaece7d6 metal : clean-up Georgi Gerganov 2024-04-19 15:30:27 +03:00
  • cec409aa98 DRAFT: Introduction of CUDA Graphs to LLama.cpp Alan Gray 2024-04-19 05:09:03 -07:00
  • 03f458c6f4 minor: spaces Pierrick HYMBERT 2024-04-19 13:59:36 +02:00
  • 1423fcece8 server: infinite loop, move in process_token server: infinite loop: set stop limit to true Pierrick HYMBERT 2024-04-19 13:58:11 +02:00
  • 5d64ffd837 server: fix infinite loop Pierrick HYMBERT 2024-04-19 13:33:16 +02:00
  • 703c6e6528 ggml : fix arm fp16 store on windows Georgi Gerganov 2024-04-19 14:20:41 +03:00
  • 558f69083a Merge remote-tracking branch 'refs/remotes/origin/master' into hp/server/avoid-infinite-loop Pierrick HYMBERT 2024-04-19 13:19:56 +02:00
  • 637e9a86c2 server: static: upstream upgrade (#6765) b2698 Pierrick Hymbert 2024-04-19 13:19:01 +02:00
  • 82e4187f95 llama: add llama_model_kv_override_free Pierrick HYMBERT 2024-04-19 13:16:42 +02:00
  • e32b281743 llama : adapt build_olmo to changes Georgi Gerganov 2024-04-19 14:04:56 +03:00
  • 1db66c1dac Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-19 14:03:55 +03:00
  • 74d57f9513 llama : simplify llama_build_kv_store Georgi Gerganov 2024-04-19 13:49:57 +03:00
  • ea0ad80a4f Merge remote-tracking branch 'refs/remotes/origin/master' into hp/quantize/imatrix-metadata Pierrick HYMBERT 2024-04-19 12:45:08 +02:00
  • 24e2a285fa server: static: upstream upgrade Pierrick HYMBERT 2024-04-19 12:06:04 +02:00
  • 373bab1bd7 Removed bos token from expected output from llama-3 Wouter Tichelaar 2024-04-19 11:45:38 +02:00
  • 9958c81b79 Implement the OLMo architecture (#6741) b2697 nopperl 2024-04-19 09:35:54 +00:00
  • a43eb826eb fix load_params JustinLin610 2024-04-19 17:05:51 +08:00
  • 5622e3aa22 gguf-py: Add IQ1_M to GGML_QUANT_SIZES Piotr Myśliński 2024-04-19 06:59:07 +02:00
  • b79a41ee6a feat : Adding hf_token declaration and user input support Sourabrata Bose 2024-04-19 13:53:19 +05:30
  • a55d8a9348 Removed adding of BOS token before first message Wouter Tichelaar 2024-04-19 09:58:21 +02:00
  • a71963d1a3 remove obsolete parameter name filter nopperl 2024-04-19 09:50:10 +02:00
  • 8b1b1f4982 train : add general name (#6752) b2696 Austin 2024-04-19 03:16:45 -04:00
  • 7370d663a3 Added EOS stop sequence according to https://github.com/ggerganov/llama.cpp/pull/6751#issuecomment-2065602862 Wouter Tichelaar 2024-04-19 08:30:36 +02:00
  • c9373e7efa Merge branch 'ggerganov:master' into master Sourabrata Bose 2024-04-19 11:31:29 +05:30
  • 2854d26646 Merge remote-tracking branch 'upstream/master' into support_codeqwen JustinLin610 2024-04-19 12:23:36 +08:00
  • bca40e9814 fix wrong parameter in cmd in readme-sycl.md (#6755) Neo Zhang 2024-04-19 09:16:31 +08:00
  • 3594c71587 fix wrong parameter in cmd in readme-sycl.md jianyuzh 2024-04-19 08:33:37 +08:00
  • c0c95edc89 rebase, handle sometimes smaller embd & new type Matt Grosso 2024-04-18 17:00:46 -07:00
  • bb252102c9 Merge branch 'master' into add-general-name-to-train teleprint-me 2024-04-18 19:38:58 -04:00
  • 27d6f84fa7 train: Add 'general.name' to model metadata teleprint-me 2024-04-18 19:38:25 -04:00
  • bf63ff5f29 Update tests/test-chat-template.cpp Wouter 2024-04-19 00:28:19 +02:00
  • 70eb88c842 Update llama.cpp Wouter 2024-04-19 00:27:16 +02:00
  • 24874e7323 Update llama.cpp Wouter 2024-04-19 00:27:03 +02:00
  • 1f7945f61f Added llama-3 chat template Wouter Tichelaar 2024-04-18 23:49:01 +02:00
  • f6c99ce011 fix clamp_kqv setting nopperl 2024-04-18 23:39:18 +02:00
  • b9613ef11a Removed leftover ManniX-ITA 2024-04-18 22:14:33 +02:00
  • cc32c73926 clarified comment nopperl 2024-04-18 21:45:22 +02:00
  • 5993a97eb8 remove superfluous moe, bias and rope tensors nopperl 2024-04-18 21:34:46 +02:00
  • 9855bb6d6c remove check for weight nopperl 2024-04-18 20:57:10 +02:00
  • 9ca869876e batched-bench : add fattn arg Georgi Gerganov 2024-04-18 21:41:32 +03:00
  • c16a7c2688 metal : use F32 attention accumulators Georgi Gerganov 2024-04-18 20:08:52 +03:00
  • 7c693493ea ci: fix missing build numbers loonerin 2024-04-18 13:51:53 -04:00
  • fe77f6f70e ci: add ubuntu latest release loonerin 2024-04-18 10:44:12 -04:00
  • 112c4c4e9b style Pedro Cuenca 2024-04-18 18:46:03 +02:00
  • d79ab101c3 Support Llama 3 conversion Pedro Cuenca 2024-04-18 18:38:05 +02:00
  • 52a4d59747 Moved endpoints registration before listener and fixes ManniX-ITA 2024-04-18 18:14:29 +02:00
  • 4de4670c83 Merge branch 'ggerganov:master' into mannix-server-startup ManniX-ITA 2024-04-18 18:11:04 +02:00
  • 79bbf42495 Add test script z5269887 2024-04-18 22:21:05 +08:00
  • 5c82dae689 remove unused moe branch nopperl 2024-04-18 15:41:00 +02:00
  • 0d56246f4b ggml : group all experts in a single ggml_mul_mat_id (#6505) b2694 slaren 2024-04-18 15:18:48 +02:00
  • ba5b5467d1 llama : disable moe offloading with SYCL slaren 2024-04-18 15:16:43 +02:00
  • d30402712b remove unused variable nopperl 2024-04-18 15:07:36 +02:00
  • c13b140264 Merge 48fbf8ca1a into 03c0946d73 Yui 2024-04-18 15:00:11 +02:00
  • bd17f27ce2 test-backend-ops : only run all mul mat tests for base types slaren 2024-04-18 14:49:02 +02:00
  • 29a767048f implement olmo architecture nopperl 2024-04-18 14:48:08 +02:00