Pierrick HYMBERT
5110de08e3
server: tests: fix console coloring
2024-02-23 02:31:44 +01:00
Pierrick HYMBERT
6bba3be151
server: tests: ci add psmisc, as killall is not present by default in the ubuntu base image
2024-02-23 02:31:30 +01:00
Pierrick HYMBERT
6e71126c12
server: tests: ci adding curl as it is not present by default in ubuntu base for the hf.sh script
2024-02-23 02:19:47 +01:00
Pierrick HYMBERT
d0e0050843
server: tests: ci adding python3-pip as it is not present by default in ubuntu base
2024-02-23 02:16:56 +01:00
Pierrick HYMBERT
2bb4732c01
server: tests: ci adding cmake as it is not present by default in ubuntu base
2024-02-23 02:13:30 +01:00
Pierrick HYMBERT
6a215e5359
server: tests: ci add container config to specify the server port and allow the server to listen on it
2024-02-23 02:11:56 +01:00
Pierrick HYMBERT
2f756f84df
server: tests: allow to override the server port before launching tests
2024-02-23 01:59:29 +01:00
Pierrick HYMBERT
70e90558ae
server: tests: add log in server start to identify why the server does not listen on the CI
2024-02-23 01:46:08 +01:00
Pierrick HYMBERT
b38b9e60a1
server: tests: minor fix server --alias param passed twice
2024-02-23 01:31:56 +01:00
Pierrick HYMBERT
14b6ede152
server: tests: minor color change
2024-02-23 01:29:39 +01:00
Pierrick HYMBERT
1bd07e56c4
server: tests: assert embeddings are actually computed, make the embeddings endpoint configurable.
...
Add logs to investigate why the CI server test job is not starting
2024-02-23 01:25:08 +01:00
Pierrick HYMBERT
cba6d4ea17
server: tests: minor fix missing param.
2024-02-23 00:54:44 +01:00
Pierrick HYMBERT
51f527440a
server: tests: ci triggered on any changes on server example path
2024-02-23 00:37:42 +01:00
Pierrick HYMBERT
26b66c5496
server: tests: fix flaky behavior caused by a missing wait for busy status
2024-02-22 23:38:47 +01:00
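The flakiness fix above comes down to polling the server's reported state instead of assuming it is ready. A minimal stand-alone sketch of that pattern (the `get_status` callable here is a stand-in for an HTTP call to the real health endpoint, not llama.cpp's actual test code):

```python
import time

def wait_for_status(get_status, expected, timeout=5.0, interval=0.05):
    """Poll get_status() until it returns `expected` or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == expected:
            return True
        time.sleep(interval)
    return False

# Simulated server status that becomes "busy" after a couple of polls.
states = iter(["idle", "idle", "busy"])
reached = wait_for_status(lambda: next(states, "busy"), "busy", timeout=1.0)
```

Replacing fixed sleeps with a bounded poll like this removes the timing-dependent "random behavior" the commit describes.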
Pierrick HYMBERT
aa591ef12d
server: tests: add scenario where multiple users' total number of tokens to predict exceeds the KV cache size
2024-02-22 23:37:56 +01:00
Pierrick HYMBERT
f820e10fa7
server: tests: ci ensure the server is stopped before scenario, and do not quit while the server is listening
2024-02-22 23:18:42 +01:00
Pierrick HYMBERT
8b96bdaf08
Merge remote-tracking branch 'origin/master' into test/server-add-ci-test
2024-02-22 22:11:36 +01:00
Pierrick HYMBERT
597c181abb
server: tests: ci do not take a model anymore, fix trigger path
2024-02-22 21:58:28 +01:00
Pierrick HYMBERT
e43406e36d
server: tests: switch to asyncio for concurrent tests, match result content with regex
2024-02-22 21:55:40 +01:00
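The asyncio switch above lets the tests fire requests concurrently and match responses by pattern rather than exact text. A sketch of that pattern, with a fake coroutine standing in for the real HTTP request (names here are illustrative, not the actual test harness):

```python
import asyncio
import re

async def fake_completion(prompt: str) -> str:
    # Stand-in for an HTTP request to the server's completion endpoint.
    await asyncio.sleep(0.01)
    return f"Answer to: {prompt}"

async def run_concurrent(prompts):
    # Fire all requests concurrently; gather preserves input order.
    return await asyncio.gather(*(fake_completion(p) for p in prompts))

results = asyncio.run(run_concurrent(["2+2?", "capital of France?"]))
# Match result content with a regex instead of comparing exact strings.
assert all(re.match(r"^Answer to: .+", r) for r in results)
```

Regex matching keeps the assertions stable even when the model's wording varies between runs.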
Pierrick HYMBERT
016b221549
server: fix race condition on slot state access in the health/slots endpoints
2024-02-22 21:55:18 +01:00
Someone
201294ae17
nix: init singularity and docker images ( #5056 )
...
Exposes a few attributes demonstrating how to build [singularity](https://docs.sylabs.io/guides/latest/user-guide/ )/[apptainer](https://apptainer.org/ ) and Docker images re-using llama.cpp's Nix expression.
Built locally on `x86_64-linux` with `nix build github:someoneserge/llama.cpp/feat/nix/images#llamaPackages.{docker,docker-min,sif,llama-cpp}` and it's fast and effective.
2024-02-22 11:44:10 -08:00
Georgi Gerganov
5a9e2f60ba
py : minor fixes ( #5668 )
2024-02-22 20:13:25 +02:00
Xuan Son Nguyen
373ee3fbba
Add Gemma chat template ( #5665 )
...
* add gemma chat template
* gemma: only apply system_prompt on non-model message
2024-02-22 19:10:21 +01:00
Someone
4cb4d8b22d
workflows: nix: hardcode cachix ids, build unconditionally ( #5663 )
...
That GitHub does not expose environment and repository variables to PRs coming from forks implies that we've been disabling the Nix CI actions for most PRs.
The `if:` also didn't make much sense, because we can always pull from cachix, and there's no point (albeit no risk either) in pushing a cache for untrusted code.
2024-02-22 08:32:09 -08:00
Georgi Gerganov
3a03541ced
minor : fix trailing whitespace ( #5638 )
2024-02-22 13:54:03 +02:00
Georgi Gerganov
41676d9920
ci : actually no reason to exclude GPU code from triggers
2024-02-22 13:33:00 +02:00
Georgi Gerganov
a697cd1314
minor : fix missing new line
2024-02-22 13:29:20 +02:00
Georgi Gerganov
56d03d92be
readme : update hot topics
2024-02-22 10:35:54 +02:00
Xuan Son Nguyen
a46f50747b
server : fallback to chatml, add AlphaMonarch chat template ( #5628 )
...
* server: fallback to chatml
* add new chat template
* server: add AlphaMonarch to test chat template
* server: only check model template if there is no custom tmpl
* remove TODO
2024-02-22 10:33:24 +02:00
Alexey Parfenov
c5688c6250
server : clarify some params in the docs ( #5640 )
2024-02-22 10:27:32 +02:00
Dat Quoc Nguyen
4ef245a92a
mpt : add optional bias tensors ( #5638 )
...
Update MPT with optional bias parameters, to support PhoGPT and SEA-LION models that were pre-trained with 'bias'.
2024-02-22 10:15:13 +02:00
slaren
973053d8b0
llama : fix loading models with shared tok_embd and output ( #5651 )
...
ggml-ci
2024-02-22 00:42:09 +01:00
Xuan Son Nguyen
7c8bcc11dc
Add docs for llama_chat_apply_template ( #5645 )
...
* add docs for llama_chat_apply_template
* fix typo
2024-02-22 00:31:00 +01:00
Pierrick HYMBERT
534998dbb9
server: tests: ci tests.sh exit code
2024-02-21 23:06:20 +01:00
slaren
7fe4678b02
llama : fix session save/load with quantized KV ( #5649 )
2024-02-21 22:52:39 +01:00
Pierrick HYMBERT
01cca6625b
server: tests: ci fix model download path
2024-02-21 22:43:39 +01:00
slaren
ba2135ccae
gemma : allow offloading the output tensor ( #5646 )
2024-02-21 22:18:23 +01:00
Pierrick HYMBERT
6406208174
server: tests:
...
* start the server at each scenario
* split the features as each requires different server config
2024-02-21 22:13:37 +01:00
Pierrick HYMBERT
68b8d4eb55
Merge remote-tracking branch 'origin/master' into test/server-add-ci-test
2024-02-21 18:41:14 +01:00
Pierrick HYMBERT
600cbeb7eb
server: test: ci change the GitHub workflow trigger
2024-02-21 18:35:21 +01:00
Jared Van Bortel
89febfed93
examples : do not assume BOS when shifting context ( #5622 )
2024-02-21 10:33:54 -05:00
Georgi Gerganov
5022cf242d
sync : ggml
2024-02-21 16:52:52 +02:00
Pierrick Hymbert
1ecea255eb
server: health: fix race condition on slots data using tasks queue ( #5634 )
...
* server: health: fix race condition on slots data using tasks queue
* server: health:
* include_slots only if slots_endpoint
* fix compile warning task.target_id not initialized.
2024-02-21 15:47:48 +01:00
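The race-condition fix above routes slot-state reads through the tasks queue, so only the worker that owns the slots ever touches them. A minimal Python sketch of that design (llama.cpp's server is C++; this only illustrates the queue-ownership idea):

```python
import queue
import threading

slots = [{"id": 0, "state": "idle"}, {"id": 1, "state": "processing"}]
tasks = queue.Queue()

def worker():
    # The only thread that reads or writes `slots`, so no lock is needed.
    while True:
        task = tasks.get()
        if task is None:
            break
        (reply,) = task
        reply.put([dict(s) for s in slots])  # hand back a snapshot copy

def get_slots_snapshot():
    # Called from an HTTP handler thread: go through the task queue
    # instead of reading `slots` directly, avoiding the data race.
    reply = queue.Queue()
    tasks.put((reply,))
    return reply.get(timeout=1.0)

t = threading.Thread(target=worker)
t.start()
snapshot = get_slots_snapshot()
tasks.put(None)  # shut the worker down
t.join()
```

Serializing all access through one owner thread trades a little latency for freedom from locks and torn reads.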
Ettore Di Giacinto
a00a35cef9
readme : add LocalAI to the available UIs ( #5629 )
2024-02-21 16:39:10 +02:00
Georgi Gerganov
eccd7a26dd
sync : ggml ( #5633 )
...
* ggml : fix conv_2d batch mode (ggml/737)
Co-authored-by: bssrdf <bssrdf@gmail.com>
* ggml : compute forward no longer pass src tensors (ggml/729)
* sync : ggml
ggml-ci
---------
Co-authored-by: bssrdf <merlintiger@hotmail.com>
Co-authored-by: bssrdf <bssrdf@gmail.com>
2024-02-21 16:17:10 +02:00
Georgi Gerganov
c14f72db9c
readme : update hot topics
2024-02-21 15:39:54 +02:00
Daniel Bevenius
cc6cac08e3
llava : add --skip-unknown to 1.6 convert.py ( #5632 )
...
This commit adds the `--skip-unknown` option to the convert.py script
and removes the saving of the updated checkpoints to avoid updating
possibly checked out files.
The motivation for this change is that this was done for 1.5 in commit fc0c8d286a ("llava : update surgery script to not remove tensors") and makes the examples more consistent.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-21 15:36:57 +02:00
postmasters
580111d42b
llama : add gemma model ( #5631 )
...
There are a couple of things in this architecture:
1. Shared input and output embedding parameters.
2. Key length and value length are not derived from `n_embd`.
More information about the models can be found at
https://ai.google.dev/gemma . GGUFs can be downloaded from
https://huggingface.co/google .
2024-02-21 15:08:22 +02:00
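Point 1 above, shared input and output embedding parameters (weight tying), means the same matrix both maps token ids to vectors and scores hidden states into logits. A dependency-free sketch of the idea (toy sizes and values, not Gemma's real weights):

```python
# One matrix serves as both the input embedding table and, transposed,
# the output projection: logit[v] = hidden . embedding[v]
n_vocab, n_embd = 4, 3
embedding = [[0.1 * (i + j) for j in range(n_embd)] for i in range(n_vocab)]

def embed(token_id):
    # Input side: row lookup.
    return embedding[token_id]

def logits(hidden):
    # Output side: dot-product against every row of the same matrix.
    return [sum(h * e for h, e in zip(hidden, row)) for row in embedding]

h = embed(2)        # look up the vector for token 2
scores = logits(h)  # score every vocab entry against the hidden state
```

Tying the two matrices halves the embedding-related parameter count, which is why the loader must handle `tok_embd` and `output` pointing at the same tensor (see the shared tok_embd fix earlier in this log).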
Georgi Gerganov
f1d4138c13
server : fix initialization thread issues
2024-02-21 13:08:57 +02:00
Meng, Hengyu
88c46cbdac
[SYCL] context add name ( #5624 )
...
* [SYCL] context add name
* name should start with SYCL*
2024-02-21 17:52:06 +08:00