metal : concurrently dispatch commands (#2358)

* metal: concurrently dispatch commands

The function `ggml_metal_graph_find_concurrency` runs the first time
`ggml_metal_graph_compute` is called and writes the commands that can be
issued concurrently to the `concur_list` array of the Metal context.

* metal: don't call find_concurrency automatically.

* metal : code style changes
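
To illustrate the idea behind finding concurrently dispatchable commands, here is a minimal sketch, not the code from this commit. It assumes a simplified graph in which every node lists the indices of the earlier nodes it reads, and it groups nodes into batches whose members do not depend on each other. The `struct node` type, the `BARRIER` marker, the fixed limits, and the standalone `find_concurrency` function are all hypothetical names introduced for this example; the sketch also ignores buffer aliasing, which a real backend would need to consider.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64   /* hypothetical limit for this sketch */
#define MAX_SRCS   2   /* hypothetical: each node reads at most two nodes */
#define BARRIER   -1   /* hypothetical marker separating concurrent batches */

/* Hypothetical, simplified graph node: src[] holds indices of earlier
 * nodes this node reads; -1 means "no source" (or an external input). */
struct node {
    int src[MAX_SRCS];
};

/* Writes node indices to `out`, batch by batch, with a BARRIER after each
 * batch. Nodes inside one batch do not depend on each other.
 * Assumes the nodes are topologically ordered (every src index < node index)
 * and that `out` has room for 2 * n entries.
 * Returns the number of entries written. */
static int find_concurrency(const struct node *nodes, int n, int *out) {
    int level[MAX_NODES];
    int max_level = -1;

    assert(n >= 0 && n <= MAX_NODES);

    /* A node's level is one more than the deepest node it reads from. */
    for (int i = 0; i < n; i++) {
        level[i] = 0;
        for (int s = 0; s < MAX_SRCS; s++) {
            int src = nodes[i].src[s];
            if (src >= 0) {
                assert(src < i);
                if (level[src] + 1 > level[i]) {
                    level[i] = level[src] + 1;
                }
            }
        }
        if (level[i] > max_level) {
            max_level = level[i];
        }
    }

    /* Nodes on the same level are independent, so emit them as one batch. */
    int len = 0;
    for (int l = 0; l <= max_level; l++) {
        for (int i = 0; i < n; i++) {
            if (level[i] == l) {
                out[len++] = i;
            }
        }
        out[len++] = BARRIER;
    }
    return len;
}
```

A caller would run this once per graph, keep the resulting list, and then dispatch each batch concurrently, waiting at every `BARRIER` before starting the next batch.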

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Shouzheng Liu 2023-07-25 08:00:19 -04:00 committed by GitHub
parent 9a08eaf3c4
commit 1aa18ef994
3 changed files with 138 additions and 19 deletions


@@ -1720,6 +1720,9 @@ static bool llama_eval_internal(
 #ifdef GGML_USE_METAL
     if (lctx.ctx_metal && N == 1) {
+        if (!ggml_metal_if_optimized(lctx.ctx_metal)) {
+            ggml_metal_graph_find_concurrency(lctx.ctx_metal,&gf);
+        }
         ggml_metal_set_n_cb     (lctx.ctx_metal, n_threads);
         ggml_metal_graph_compute(lctx.ctx_metal, &gf);
         ggml_metal_get_tensor   (lctx.ctx_metal, cur);