llama : support Llama 3 HF conversion (#6745)

* Support Llama 3 conversion
  The tokenizer is BPE.
* style
* Accept suggestion
  Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
* llama : add llama_token_is_eog()
  ggml-ci
* llama : auto-detect more EOT tokens when missing in KV data
* convert : replacing EOS token is a hack
* llama : fix codegemma EOT token + add TODOs
* llama : fix model type string for 8B model

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
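The commit replaces per-template stop strings in the server with a generic end-of-generation (EOG) check and auto-detects end-of-turn (EOT) tokens when the model's KV metadata omits them. The C++ below is a minimal sketch of that idea under stated assumptions: `vocab_sketch`, `auto_detect_eot` and `token_is_eog` are hypothetical names, not llama.cpp's internals, and the candidate list is illustrative (`<|eot_id|>` is Llama 3's end-of-turn token; `<|im_end|>` and `<end_of_turn>` are the ChatML and Gemma markers from the removed server code).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical token id type; -1 means "not set".
using token_id = int32_t;

// Hypothetical, simplified vocabulary; not llama.cpp's real vocab struct.
struct vocab_sketch {
    token_id eos_id = -1;                  // end-of-sequence token, if known
    token_id eot_id = -1;                  // end-of-turn token, if known
    std::vector<std::string> id_to_text;   // token text, indexed by token id
};

// If the metadata did not provide an EOT id, scan the vocabulary for a
// well-known end-of-turn string and use the first token that matches.
void auto_detect_eot(vocab_sketch & v) {
    if (v.eot_id != -1) {
        return;  // already provided by the model metadata
    }
    const std::vector<std::string> candidates = {
        "<|eot_id|>",     // Llama 3
        "<|im_end|>",     // ChatML
        "<end_of_turn>",  // Gemma
    };
    for (token_id id = 0; id < static_cast<token_id>(v.id_to_text.size()); ++id) {
        for (const auto & c : candidates) {
            if (v.id_to_text[id] == c) {
                v.eot_id = id;
                return;
            }
        }
    }
}

// Generation stops on any end-of-generation token, not only EOS.
bool token_is_eog(const vocab_sketch & v, token_id t) {
    if (t < 0) {
        return false;  // never treat the "unset" sentinel as a real token
    }
    return t == v.eos_id || t == v.eot_id;
}
```

With a vocabulary of `{"<unk>", "</s>", "hello", "<|eot_id|>"}` and `eos_id = 1`, `auto_detect_eot` sets `eot_id` to 3, after which both token 1 and token 3 end generation while token 2 does not. Checking token ids this way is what lets the server drop hard-coded stop strings such as `<|im_end|>`.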
2024-04-21 13:50:41 +02:00 · 2024-04-21 13:50:41 +02:00 · b97bc3966e
commit b97bc3966e
parent b8109bc013
20 changed files with 123 additions and 64 deletions
--- a/examples/server/utils.hpp
+++ b/examples/server/utils.hpp
@@ -381,10 +381,6 @@ static json oaicompat_completion_params_parse(
    } else {
        llama_params["stop"] = json_value(body, "stop", json::array());
    }
-    // Some chat templates don't use EOS token to stop generation
-    // We must add their end sequences to list of stop words
-    llama_params["stop"].push_back("<|im_end|>"); // chatml
-    llama_params["stop"].push_back("<end_of_turn>"); // gemma

    // Handle "response_format" field
    if (body.contains("response_format")) {