Commit graph

2301 commits

Author SHA1 Message Date
Pierrick HYMBERT
530d3ae4c4 server: tests: reducing sleep time during scenario 2024-02-23 02:38:54 +01:00
Pierrick HYMBERT
bedf37c9d1 server: tests: reducing n_ctx and n_predict for // prompts as it is too slow in the CI. 2024-02-23 02:38:37 +01:00
Pierrick HYMBERT
5110de08e3 server: tests: fix coloring console 2024-02-23 02:31:44 +01:00
Pierrick HYMBERT
6bba3be151 server: tests: ci adding psmisc, which provides killall, as it is not present by default in ubuntu base 2024-02-23 02:31:30 +01:00
Pierrick HYMBERT
6e71126c12 server: tests: ci adding curl as it is not present by default in ubuntu base for the hf.sh script 2024-02-23 02:19:47 +01:00
Pierrick HYMBERT
d0e0050843 server: tests: ci adding python3-pip as it is not present by default in ubuntu base 2024-02-23 02:16:56 +01:00
Pierrick HYMBERT
2bb4732c01 server: tests: ci adding cmake as it is not present by default in ubuntu base 2024-02-23 02:13:30 +01:00
Pierrick HYMBERT
6a215e5359 server: tests: ci adding container to specify server port and allow the server to listen to 2024-02-23 02:11:56 +01:00
Pierrick HYMBERT
2f756f84df server: tests: allow to override the server port before launching tests 2024-02-23 01:59:29 +01:00
Pierrick HYMBERT
70e90558ae server: tests: add log in server start to identify why the server does not listen on the CI 2024-02-23 01:46:08 +01:00
Pierrick HYMBERT
b38b9e60a1 server: tests: minor fix server --alias param passed twice 2024-02-23 01:31:56 +01:00
Pierrick HYMBERT
14b6ede152 server: tests: minor color change 2024-02-23 01:29:39 +01:00
Pierrick HYMBERT
1bd07e56c4 server: tests: assert embeddings are actually computed, make the embeddings endpoint configurable.
Add logs to investigate why the CI server test job is not starting
2024-02-23 01:25:08 +01:00
Pierrick HYMBERT
cba6d4ea17 server: tests: minor fix missing param. 2024-02-23 00:54:44 +01:00
Pierrick HYMBERT
51f527440a server: tests: ci triggered on any changes on server example path 2024-02-23 00:37:42 +01:00
Pierrick HYMBERT
26b66c5496 server: tests: Fix some random behavior where the wait for busy status is missing 2024-02-22 23:38:47 +01:00
Pierrick HYMBERT
aa591ef12d server: tests: add Multi users with total number of tokens to predict exceeds the KV Cache size 2024-02-22 23:37:56 +01:00
Pierrick HYMBERT
f820e10fa7 server: tests: ci ensure the server is stopped before scenario, and do not quit while the server is listening 2024-02-22 23:18:42 +01:00
Pierrick HYMBERT
8b96bdaf08 Merge remote-tracking branch 'origin/master' into test/server-add-ci-test 2024-02-22 22:11:36 +01:00
Pierrick HYMBERT
597c181abb server: tests: ci do not take a model anymore, fix trigger patch 2024-02-22 21:58:28 +01:00
Pierrick HYMBERT
e43406e36d server: tests: switch to asyncio for concurrent tests, match result content with regex 2024-02-22 21:55:40 +01:00
Pierrick HYMBERT
016b221549 server: fix health/slots endpoint slot state access available race condition 2024-02-22 21:55:18 +01:00
Someone
201294ae17
nix: init singularity and docker images (#5056)
Exposes a few attributes demonstrating how to build [singularity](https://docs.sylabs.io/guides/latest/user-guide/)/[apptainer](https://apptainer.org/) and Docker images re-using llama.cpp's Nix expression.

Built locally on `x86_64-linux` with `nix build github:someoneserge/llama.cpp/feat/nix/images#llamaPackages.{docker,docker-min,sif,llama-cpp}`.
2024-02-22 11:44:10 -08:00
Georgi Gerganov
5a9e2f60ba
py : minor fixes (#5668) 2024-02-22 20:13:25 +02:00
Xuan Son Nguyen
373ee3fbba
Add Gemma chat template (#5665)
* add gemma chat template

* gemma: only apply system_prompt on non-model message
2024-02-22 19:10:21 +01:00
Someone
4cb4d8b22d
workflows: nix: hardcode cachix ids, build unconditionally (#5663)
GitHub does not expose environment and repository variables to PRs coming from forks, which implies that we've been disabling the Nix CI actions for most PRs.

The `if:` also didn't make much sense, because we can always pull from cachix, and there's no point (albeit no risk either) in pushing cache for the untrusted code.
2024-02-22 08:32:09 -08:00
Georgi Gerganov
3a03541ced
minor : fix trailing whitespace (#5638) 2024-02-22 13:54:03 +02:00
Georgi Gerganov
41676d9920
ci : actually no reason to exclude GPU code from triggers 2024-02-22 13:33:00 +02:00
Georgi Gerganov
a697cd1314
minor : fix missing new line 2024-02-22 13:29:20 +02:00
Georgi Gerganov
56d03d92be
readme : update hot topics 2024-02-22 10:35:54 +02:00
Xuan Son Nguyen
a46f50747b
server : fallback to chatml, add AlphaMonarch chat template (#5628)
* server: fallback to chatml

* add new chat template

* server: add AlphaMonarch to test chat template

* server: only check model template if there is no custom tmpl

* remove TODO
2024-02-22 10:33:24 +02:00
Alexey Parfenov
c5688c6250
server : clarify some params in the docs (#5640) 2024-02-22 10:27:32 +02:00
Dat Quoc Nguyen
4ef245a92a
mpt : add optional bias tensors (#5638)
Update for MPT with optional bias parameters: to work with PhoGPT and SEA-LION models that were pre-trained with 'bias'.
2024-02-22 10:15:13 +02:00
slaren
973053d8b0
llama : fix loading models with shared tok_embd and output (#5651)
ggml-ci
2024-02-22 00:42:09 +01:00
Xuan Son Nguyen
7c8bcc11dc
Add docs for llama_chat_apply_template (#5645)
* add docs for llama_chat_apply_template

* fix typo
2024-02-22 00:31:00 +01:00
Pierrick HYMBERT
534998dbb9 server: tests: ci tests.sh exit code 2024-02-21 23:06:20 +01:00
slaren
7fe4678b02
llama : fix session save/load with quantized KV (#5649) 2024-02-21 22:52:39 +01:00
Pierrick HYMBERT
01cca6625b server: tests: ci fix model download path 2024-02-21 22:43:39 +01:00
slaren
ba2135ccae
gemma : allow offloading the output tensor (#5646) 2024-02-21 22:18:23 +01:00
Pierrick HYMBERT
6406208174 server: tests:
* start the server at each scenario
* split the features as each requires different server config
2024-02-21 22:13:37 +01:00
Pierrick HYMBERT
68b8d4eb55 Merge remote-tracking branch 'origin/master' into test/server-add-ci-test 2024-02-21 18:41:14 +01:00
Pierrick HYMBERT
600cbeb7eb server: test: ci change the GitHub workflow trigger 2024-02-21 18:35:21 +01:00
Jared Van Bortel
89febfed93
examples : do not assume BOS when shifting context (#5622) 2024-02-21 10:33:54 -05:00
Georgi Gerganov
5022cf242d
sync : ggml 2024-02-21 16:52:52 +02:00
Pierrick Hymbert
1ecea255eb
server: health: fix race condition on slots data using tasks queue (#5634)
* server: health: fix race condition on slots data using tasks queue

* server: health:
    * include_slots only if slots_endpoint
    * fix compile warning task.target_id not initialized.
2024-02-21 15:47:48 +01:00
Ettore Di Giacinto
a00a35cef9
readme : add LocalAI to the available UIs (#5629) 2024-02-21 16:39:10 +02:00
Georgi Gerganov
eccd7a26dd
sync : ggml (#5633)
* ggml : fix conv_2d batch mode (ggml/737)

Co-authored-by: bssrdf <bssrdf@gmail.com>

* ggml : compute forward no longer pass src tensors (ggml/729)

* sync : ggml

ggml-ci

---------

Co-authored-by: bssrdf <merlintiger@hotmail.com>
Co-authored-by: bssrdf <bssrdf@gmail.com>
2024-02-21 16:17:10 +02:00
Georgi Gerganov
c14f72db9c
readme : update hot topics 2024-02-21 15:39:54 +02:00
Daniel Bevenius
cc6cac08e3
llava : add --skip-unknown to 1.6 convert.py (#5632)
This commit adds the `--skip-unknown` option to the convert.py script
and removes the saving of the updated checkpoints to avoid updating
possibly checked out files.

The motivation for this change is that this was done for 1.5
in commit fc0c8d286a ("llava : update surgery script to not remove
tensors") and makes the examples more consistent.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-21 15:36:57 +02:00
postmasters
580111d42b
llama : add gemma model (#5631)
There are a couple of things to note in this architecture:

1. Shared input and output embedding parameters.
2. Key length and value length are not derived from `n_embd`.

More information about the models can be found at
https://ai.google.dev/gemma. GGUFs can be downloaded from
https://huggingface.co/google.
2024-02-21 15:08:22 +02:00