metal: only encode in one command buffer
It is advised that a program use only one command buffer. This slows inference by ~1 ms on a 33B model, but we may be able to avoid that by reusing the previous command queue.
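For readers without the diff at hand, here is a minimal sketch of what encoding a whole graph into a single command buffer can look like with the Metal API. It is not the code from this commit: `n_nodes` and the empty per-node loop body are hypothetical placeholders, and the build command in the first comment is an assumption about a typical macOS toolchain.

```objc
// Assumed build line (macOS): clang -fobjc-arc sketch.m -framework Metal -framework Foundation
#include <stdio.h>
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

int main(void) {
    @autoreleasepool {
        id<MTLDevice> device = MTLCreateSystemDefaultDevice();
        if (device == nil) {
            fprintf(stderr, "no Metal device available\n");
            return 1;
        }

        // Create the command queue once and keep it around, so later
        // evaluations can reuse it instead of creating a new one.
        id<MTLCommandQueue> queue = [device newCommandQueue];

        const int n_nodes = 4; // hypothetical number of graph nodes

        // One command buffer and one compute encoder for the whole graph,
        // instead of splitting the nodes across several command buffers.
        id<MTLCommandBuffer> cb = [queue commandBuffer];
        id<MTLComputeCommandEncoder> enc = [cb computeCommandEncoder];

        for (int i = 0; i < n_nodes; ++i) {
            // Placeholder: a real backend would set the pipeline state and
            // buffers for node i on `enc` and dispatch its threadgroups here.
            [enc pushDebugGroup:[NSString stringWithFormat:@"node %d", i]];
            [enc popDebugGroup];
        }

        [enc endEncoding];
        [cb commit];
        [cb waitUntilCompleted];

        return cb.status == MTLCommandBufferStatusCompleted ? 0 : 1;
    }
}
```

With a single buffer, all nodes are encoded serially on one thread, which is one plausible source of the small slowdown mentioned above.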
parent d924522a46
commit c8e6ef1846
1 changed file with 517 additions and 548 deletions
ggml-metal.m (1065 lines changed)
File diff suppressed because it is too large