Expose generation timings from server & update completions.js (#2116)

* use JavaScript generators as a much cleaner API (see the usage sketch below)

Also add ways to access the completion as a promise or via an EventSource

* export llama_timings as a struct and expose the timings in the server (see the timings sketch below)

* update README, update the baked-in includes

* llama : uniform variable names + struct init

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Authored by Tobias Lütke on 2023-07-05 16:51:13 -04:00, committed by GitHub
parent 983b555e9d
commit 31cfbb1013
9 changed files with 1921 additions and 1363 deletions
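As a quick illustration of the new completion.js access styles, a minimal sketch follows. Only the async-generator form, `llama(prompt)` yielding chunks whose `chunk.data.content` holds the streamed text, is shown in the README diff below; the promise and EventSource helpers are mentioned in the commit message without names, so `llamaPromise` and `llamaEventTarget` here are assumptions and are left commented out.

```html
<script type="module">
  import { llama } from '/completion.js'

  const prompt = '### Instruction:\nTell one short dad joke.\n\n### Response:'

  // Async-generator access (matches the README example below):
  // each streamed chunk carries the generated text in chunk.data.content.
  for await (const chunk of llama(prompt)) {
    document.write(chunk.data.content)
  }

  // Promise-style access -- the helper name is an assumption:
  // const text = await llamaPromise(prompt)

  // EventSource/event-target style access -- the helper name and event shape
  // are assumptions:
  // const target = llamaEventTarget(prompt)
  // target.addEventListener('message', (e) => console.log(e.detail.content))
</script>
```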

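The generation timings now exposed by the server can likewise be inspected from the front end. The sketch below assumes the final streamed message carries a `timings` object mirroring the exported llama_timings struct (token counts and per-stage durations); neither that field name nor its exact shape is confirmed by this section.

```html
<script type="module">
  import { llama } from '/completion.js'

  // Keep the last streamed message; the timing information is assumed to
  // arrive with the final chunk once generation stops.
  let lastMessage = null
  for await (const chunk of llama('Say hello.')) {
    lastMessage = chunk.data
  }

  // Assumed: `timings` holds the server-side generation timings.
  if (lastMessage && lastMessage.timings) {
    console.log('generation timings:', lastMessage.timings)
  }
</script>
```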

@@ -26,20 +26,17 @@ Command line options:
 
 ## Build
 
-Build llama.cpp with server from repository root with either make or CMake.
+The server is built alongside everything else from the root of the project.
 
 - Using `make`:
 
   ```bash
-  LLAMA_BUILD_SERVER=1 make
+  make
   ```
 
 - Using `CMake`:
 
   ```bash
-  mkdir build-server
-  cd build-server
-  cmake -DLLAMA_BUILD_SERVER=ON ..
   cmake --build . --config Release
   ```
 
@@ -208,24 +205,30 @@ openai.api_base = "http://<Your api-server IP>:port"
 
 Then you can utilize llama.cpp as an OpenAI's **chat.completion** or **text_completion** API
 
-### Extending the Web Front End
+### Extending or building an alternative Web Front End
 
-The default location for the static files is `examples/server/public`. You can extend the front end by running the server binary with `--path` set to `./your-directory` and importing `/completion.js` to get access to the llamaComplete() method. A simple example is below:
+The default location for the static files is `examples/server/public`. You can extend the front end by running the server binary with `--path` set to `./your-directory` and importing `/completion.js` to get access to the llamaComplete() method.
+
+Read the documentation in `/completion.js` to see convenient ways to access llama.
+
+A simple example is below:
 
-```
+```html
 <html>
   <body>
     <pre>
       <script type="module">
-        import { llamaComplete } from '/completion.js'
-
-        llamaComplete({
-            prompt: "### Instruction:\nWrite dad jokes, each one paragraph. You can use html formatting if needed.\n\n### Response:",
-            n_predict: 1024,
-          },
-          null,
-          (chunk) => document.write(chunk.data.content)
-        )
+        import { llama } from '/completion.js'
+
+        const prompt = `### Instruction:
+Write dad jokes, each one paragraph.
+You can use html formatting if needed.
+
+### Response:`
+
+        for await (const chunk of llama(prompt)) {
+          document.write(chunk.data.content)
+        }
       </script>
     </pre>
   </body>