llama : support input embeddings directly (#1910)

* add interface for float input

* fixed inpL shape and type

* add examples of input floats

* add test example for embd input

* fixed sampling

* add free for context

* fix end condition for generation

* add examples for llava.py

* add README for llava.py

* add example of PandaGPT

* refactor the interface and fixed the styles

* add cmake build for embd-input

* Add MiniGPT-4 example

* change the order of the args of llama_eval_internal

* fix ci error
Author: ningshanwutuobang, 2023-06-28 23:53:37 +08:00 (committed by GitHub)
Parent: 9d23589d63
Commit: cfa0750bc9
16 changed files with 811 additions and 22 deletions

llama.h
@@ -226,6 +226,14 @@ extern "C" {
     int n_past,
     int n_threads);
 
+    // Same as llama_eval, but use float matrix input directly.
+    LLAMA_API int llama_eval_embd(
+        struct llama_context * ctx,
+        const float * embd,
+        int n_tokens,
+        int n_past,
+        int n_threads);
+
     // Export a static computation graph for context of 511 and batch size of 1
     // NOTE: since this functionality is mostly for debugging and demonstration purposes, we hardcode these
     // parameters here to keep things simple
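
For reference, a minimal usage sketch of the new entry point. It assumes the llama.cpp C API as of this commit (llama_n_embd, llama_get_logits); the helper name eval_float_embeddings and the thread count are illustrative, not part of the patch.

#include "llama.h"

// Feed precomputed float embeddings to the model instead of token ids.
// embd must hold n_tokens * llama_n_embd(ctx) floats: one row of n_embd
// values per token, in evaluation order.
static int eval_float_embeddings(struct llama_context * ctx,
                                 const float * embd,
                                 int n_tokens,
                                 int n_past) {
    const int n_threads = 4; // illustrative; tune for the host CPU
    if (llama_eval_embd(ctx, embd, n_tokens, n_past, n_threads) != 0) {
        return 1; // evaluation failed
    }
    // On success, logits for the last evaluated token are available via
    // llama_get_logits(ctx), exactly as after llama_eval.
    return 0;
}

This is the pattern the embd-input example and the llava.py, PandaGPT, and MiniGPT-4 wrappers added in this commit build on: an external encoder produces the float matrix, and llama_eval_embd injects it in place of the token embedding lookup.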