llama : llama_perf + option to disable timings during decode (#9355)

* llama : llama_perf + option to disable timings during decode

ggml-ci

* common : add llama_arg

* Update src/llama.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* perf : separate functions in the API

ggml-ci

* perf : safer pointer handling + naming update

ggml-ci

* minor : better local var name

* perf : abort on invalid sampler pointer

ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
This commit is contained in:
Georgi Gerganov 2024-09-13 09:53:38 +03:00 committed by GitHub
parent bd35cb0ae3
commit 0abc6a2c25
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
23 changed files with 135 additions and 91 deletions

View file

@ -1630,7 +1630,7 @@ int main(int argc, char ** argv) {
fflush(p_err->fout);
}
llama_perf_print(ctx, LLAMA_PERF_TYPE_CONTEXT);
llama_perf_context_print(ctx);
llama_free(ctx);