metal : concurrently dispatch commands (#2358)

* metal: concurrently dispatch commands Function `ggml_metal_graph_find_concurrency` will run and write commands that can be issued concurrently to metal context `concur_list` array, when `ggml_metal_graph_compute` is called for the first time. * metal: don't call find_concurrency automatically. * metal : code style changes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-25 08:00:19 -04:00 · 2023-07-25 08:00:19 -04:00 · 1aa18ef994
commit 1aa18ef994
parent 9a08eaf3c4
3 changed files with 138 additions and 19 deletions
--- a/ggml-metal.h
+++ b/ggml-metal.h
@ -61,6 +61,13 @@ void ggml_metal_set_tensor(struct ggml_metal_context * ctx, struct ggml_tensor *
 // get data from the device into host memory
 void ggml_metal_get_tensor(struct ggml_metal_context * ctx, struct ggml_tensor * t);

+// try to find operations that can be run concurrently in the graph
+// you should run it again if the topology of your graph changes
+void ggml_metal_graph_find_concurrency(struct ggml_metal_context * ctx, struct ggml_cgraph * gf);
+
+// if the graph has been optimized for concurrently dispatch
+bool ggml_metal_if_optimized(struct ggml_metal_context * ctx);
+
 // same as ggml_graph_compute but uses Metal
 // creates gf->n_threads command buffers in parallel
 void ggml_metal_graph_compute(struct ggml_metal_context * ctx, struct ggml_cgraph * gf);