llama : add reranking support (#9510)

* py : add XLMRobertaForSequenceClassification [no ci]

* py : fix scalar-tensor conversion [no ci]

* py : fix position embeddings chop [no ci]

* llama : read new cls tensors [no ci]

* llama : add classification head (wip) [no ci]

* llama : add "rank" pooling type

ggml-ci

* server : add rerank endpoint

ggml-ci

* llama : avoid ggml_repeat during classification

* rerank : cleanup + comments

* server : accept /rerank endpoint in addition to /v1/rerank [no ci]

* embedding : parse special tokens

* jina : support v1 reranker

* vocab : minor style

ggml-ci

* server : initiate tests for later

ggml-ci

* server : add docs

* llama : add comment [no ci]

* llama : fix uninitialized tensors

* ci : add rerank tests

ggml-ci

* add reranking test

* change test data

* Update examples/server/server.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* add `--reranking` argument

* update server docs

* llama : fix comment [no ci]

ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

Georgi Gerganov, 2024-09-28 17:42:03 +03:00 (committed by GitHub)

parent 1b2f992cd2

commit f4d2b8846a

GPG key ID: B5690EEEBB952194

18 changed files with 602 additions and 56 deletions

									
examples/server/tests/features/embeddings.feature (2 lines changed)
@@ -15,7 +15,7 @@ Feature: llama.cpp server
     And   128 as batch size
     And   128 as ubatch size
     And   512 KV cache size
-    And   embeddings extraction
+    And   enable embeddings endpoint
     Then  the server is starting
     Then  the server is healthy
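
The commit message mentions a new rerank endpoint (`/rerank`, also accepted as `/v1/rerank`) enabled by the `--reranking` argument. As a rough illustration of how a client might use such an endpoint, the sketch below builds a request body and orders documents by the scores in a response. The JSON field names (`query`, `documents`, `top_n`, `results`, `index`, `relevance_score`) and both helper functions are assumptions made for this example, not taken from the commit text, and the response values are made up.

```python
import json


def build_rerank_request(query, documents, top_n=None):
    """Serialize a rerank request body (field names are assumed, see note above)."""
    body = {"query": query, "documents": list(documents)}
    if top_n is not None:
        body["top_n"] = top_n
    return json.dumps(body)


def rank_documents(documents, response_text):
    """Pair each scored document with its score, highest score first.

    Assumes the response looks like:
    {"results": [{"index": <int>, "relevance_score": <float>}, ...]}
    """
    results = json.loads(response_text)["results"]
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ordered]


if __name__ == "__main__":
    docs = ["hi", "it is a bear", "Machine learning is a field of study."]
    print(build_rerank_request("Machine learning is", docs, top_n=2))

    # Hypothetical response text, used only to exercise the parsing helper.
    fake_response = json.dumps({
        "results": [
            {"index": 0, "relevance_score": -6.1},
            {"index": 2, "relevance_score": 4.3},
        ]
    })
    for doc, score in rank_documents(docs, fake_response):
        print(f"{score:+.1f}  {doc}")
```

The request body would be sent as an HTTP POST to the rerank endpoint of a server started with `--reranking`; the transport is left out so the sketch runs without a server.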
