server : output embeddings for all tokens when pooling = none (#10861)
* server : add "tokens" output ggml-ci
* server : output embeddings for all tokens when pooling = none ggml-ci
* server : update readme [no ci]
* server : fix spacing [no ci]
  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* server : be explicit about the pooling type in the tests ggml-ci
* server : update /embeddings and /v1/embeddings endpoints ggml-ci
* server : do not normalize embeddings when there is no pooling ggml-ci
* server : update readme ggml-ci
* server : fixes
* tests : update server tests ggml-ci
* server : update readme [no ci]
* server : remove rebase artifact

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
parent 0e70ba686e
commit 152610eda9
8 changed files with 158 additions and 37 deletions
@@ -763,6 +763,8 @@ curl http://localhost:8080/v1/chat/completions \

### POST `/v1/embeddings`: OpenAI-compatible embeddings API

This endpoint requires that the model uses a pooling type other than `none`. The embeddings are normalized using the Euclidean norm.

*Options:*

See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-reference/embeddings).

@@ -795,6 +797,46 @@ See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-r
}'
```

### POST `/embeddings`: non-OpenAI-compatible embeddings API

This endpoint supports all pooling types, including `--pooling none`. When the pooling is `none`, the response contains the *unnormalized* embeddings for *all* input tokens. For all other pooling types, only the pooled embeddings are returned, normalized using the Euclidean norm.

Note that the response format of this endpoint is different from `/v1/embeddings`.
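
As a rough illustration of what "normalized using the Euclidean norm" means for a pooled embedding, here is a minimal Python sketch. The helper name `l2_normalize` and the sample vector are hypothetical and are not part of the server; the code only mirrors the math (dividing each component by the vector's L2 length).

```python
import math


def l2_normalize(vec):
    """Scale `vec` to unit Euclidean (L2) length.

    Hypothetical helper for illustration only. A zero vector is
    returned unchanged to avoid dividing by zero.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


# Made-up pooled embedding with Euclidean length 5.
pooled = [3.0, 4.0]
print(l2_normalize(pooled))  # [0.6, 0.8]
```

With `--pooling none` this step is skipped by the server, so a client that needs unit-length vectors would have to apply something like it per token.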

*Options:*

Same as the `/v1/embeddings` endpoint.

*Examples:*

Same as the `/v1/embeddings` endpoint.

**Response format**

```json
[
  {
    "index": 0,
    "embedding": [
      [ ... embeddings for token 0   ... ],
      [ ... embeddings for token 1   ... ],
      [ ... ]
      [ ... embeddings for token N-1 ... ],
    ]
  },
  ...
  {
    "index": P,
    "embedding": [
      [ ... embeddings for token 0   ... ],
      [ ... embeddings for token 1   ... ],
      [ ... ]
      [ ... embeddings for token N-1 ... ],
    ]
  }
]
```
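
With `--pooling none`, a client receives one vector per input token and has to do any pooling itself. The following is a minimal sketch of client-side mean pooling, assuming the response has the shape shown above. The helper `mean_pool_response` is hypothetical and the sample payload is made up for illustration; neither comes from the server code.

```python
import json


def mean_pool_response(response):
    """Average the per-token vectors of each entry, keyed by its "index".

    Assumes every entry's "embedding" is a non-empty list of
    equal-length token vectors (the `--pooling none` layout).
    """
    pooled = {}
    for item in response:
        token_vectors = item["embedding"]  # one vector per input token
        n_tokens = len(token_vectors)
        dim = len(token_vectors[0])
        pooled[item["index"]] = [
            sum(vec[d] for vec in token_vectors) / n_tokens
            for d in range(dim)
        ]
    return pooled


# Made-up payload: one input with two tokens and 2-dimensional embeddings.
sample = json.loads('[{"index": 0, "embedding": [[1.0, 2.0], [3.0, 4.0]]}]')
print(mean_pool_response(sample))  # {0: [2.0, 3.0]}
```

The result is not normalized; whether to normalize after pooling is left to the caller.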

### GET `/slots`: Returns the current slots processing state

> [!WARNING]