llama.cpp

Author	SHA1	Message	Date
Pierrick HYMBERT	530d3ae4c4	server: tests: reducing sleep time during scenario	2024-02-23 02:38:54 +01:00
Pierrick HYMBERT	bedf37c9d1	server: tests: reducing n_ctx and n_predict for // prompts as it is too slow in the CI.	2024-02-23 02:38:37 +01:00
Pierrick HYMBERT	5110de08e3	server: tests: fix coloring console	2024-02-23 02:31:44 +01:00
Pierrick HYMBERT	6bba3be151	server: tests: ci adding psmisc as it is not present by default in ubuntu base killall	2024-02-23 02:31:30 +01:00
Pierrick HYMBERT	6e71126c12	server: tests: ci adding curl as it is not present by default in ubuntu base for the hf.sh script	2024-02-23 02:19:47 +01:00
Pierrick HYMBERT	d0e0050843	server: tests: ci adding python3-pip as it is not present by default in ubuntu base	2024-02-23 02:16:56 +01:00
Pierrick HYMBERT	2bb4732c01	server: tests: ci adding cmake as it is not present by default in ubuntu base	2024-02-23 02:13:30 +01:00
Pierrick HYMBERT	6a215e5359	server: tests: ci adding container to specify server port and allow the server to listen to	2024-02-23 02:11:56 +01:00
Pierrick HYMBERT	2f756f84df	server: tests: allow to override the server port before launching tests	2024-02-23 01:59:29 +01:00
Pierrick HYMBERT	70e90558ae	server: tests: add log in server start to identify why the server does not listen on the CI	2024-02-23 01:46:08 +01:00
Pierrick HYMBERT	b38b9e60a1	server: tests: minor fix server --alias param passed twice	2024-02-23 01:31:56 +01:00
Pierrick HYMBERT	14b6ede152	server: tests: minor color change	2024-02-23 01:29:39 +01:00
Pierrick HYMBERT	1bd07e56c4	server: tests: assert embeddings are actually computed, make the embeddings endpoint configurable. Add logs to investigate why the CI server test job is not starting	2024-02-23 01:25:08 +01:00
Pierrick HYMBERT	cba6d4ea17	server: tests: minor fix missing param.	2024-02-23 00:54:44 +01:00
Pierrick HYMBERT	51f527440a	server: tests: ci triggered on any changes on server example path	2024-02-23 00:37:42 +01:00
Pierrick HYMBERT	26b66c5496	server: tests: Fix some random behavior where the wait for busy status is missing	2024-02-22 23:38:47 +01:00
Pierrick HYMBERT	aa591ef12d	server: tests: add Multi users with total number of tokens to predict exceeds the KV Cache size	2024-02-22 23:37:56 +01:00
Pierrick HYMBERT	f820e10fa7	server: tests: ci ensure the server is stopped before scenario, and do not quit while the server is listening	2024-02-22 23:18:42 +01:00
Pierrick HYMBERT	8b96bdaf08	Merge remote-tracking branch 'origin/master' into test/server-add-ci-test	2024-02-22 22:11:36 +01:00
Pierrick HYMBERT	597c181abb	server: tests: ci do not take a model anymore, fix trigger patch	2024-02-22 21:58:28 +01:00
Pierrick HYMBERT	e43406e36d	server: tests: switch to asyncio for concurrent tests, match result content with regex	2024-02-22 21:55:40 +01:00
Pierrick HYMBERT	016b221549	server: fix health/slots endpoint slot state access available race condition	2024-02-22 21:55:18 +01:00
Someone	201294ae17	nix: init singularity and docker images (#5056) Exposes a few attributes demonstrating how to build [singularity](https://docs.sylabs.io/guides/latest/user-guide/)/[apptainer](https://apptainer.org/) and Docker images re-using llama.cpp's Nix expression. Built locally on `x86_64-linux` with `nix build github:someoneserge/llama.cpp/feat/nix/images#llamaPackages.{docker,docker-min,sif,llama-cpp}` and it's fast and effective.	2024-02-22 11:44:10 -08:00
Georgi Gerganov	5a9e2f60ba	py : minor fixes (#5668)	2024-02-22 20:13:25 +02:00
Xuan Son Nguyen	373ee3fbba	Add Gemma chat template (#5665) * add gemma chat template * gemma: only apply system_prompt on non-model message	2024-02-22 19:10:21 +01:00
Someone	4cb4d8b22d	workflows: nix: hardcode cachix ids, build unconditionally (#5663) GitHub does not expose environment and repository variables to PRs coming from forks, which implies that we've been disabling the Nix CI actions for most PRs. The `if:` also didn't make much sense, because we can always pull from cachix, and there's no point (albeit no risk either) in pushing cache for the untrusted code.	2024-02-22 08:32:09 -08:00
Georgi Gerganov	3a03541ced	minor : fix trailing whitespace (#5638)	2024-02-22 13:54:03 +02:00
Georgi Gerganov	41676d9920	ci : actually no reason to exclude GPU code from triggers	2024-02-22 13:33:00 +02:00
Georgi Gerganov	a697cd1314	minor : fix missing new line	2024-02-22 13:29:20 +02:00
Georgi Gerganov	56d03d92be	readme : update hot topics	2024-02-22 10:35:54 +02:00
Xuan Son Nguyen	a46f50747b	server : fallback to chatml, add AlphaMonarch chat template (#5628) * server: fallback to chatml * add new chat template * server: add AlphaMonarch to test chat template * server: only check model template if there is no custom tmpl * remove TODO	2024-02-22 10:33:24 +02:00
Alexey Parfenov	c5688c6250	server : clarify some params in the docs (#5640)	2024-02-22 10:27:32 +02:00
Dat Quoc Nguyen	4ef245a92a	mpt : add optional bias tensors (#5638) Update for MPT with optional bias parameters: to work with PhoGPT and SEA-LION models that were pre-trained with 'bias'.	2024-02-22 10:15:13 +02:00
slaren	973053d8b0	llama : fix loading models with shared tok_embd and output (#5651) ggml-ci	2024-02-22 00:42:09 +01:00
Xuan Son Nguyen	7c8bcc11dc	Add docs for llama_chat_apply_template (#5645) * add docs for llama_chat_apply_template * fix typo	2024-02-22 00:31:00 +01:00
Pierrick HYMBERT	534998dbb9	server: tests: ci tests.sh exit code	2024-02-21 23:06:20 +01:00
slaren	7fe4678b02	llama : fix session save/load with quantized KV (#5649)	2024-02-21 22:52:39 +01:00
Pierrick HYMBERT	01cca6625b	server: tests: ci fix model download path	2024-02-21 22:43:39 +01:00
slaren	ba2135ccae	gemma : allow offloading the output tensor (#5646)	2024-02-21 22:18:23 +01:00
Pierrick HYMBERT	6406208174	server: tests: * start the server at each scenario * split the features as each requires different server config	2024-02-21 22:13:37 +01:00
Pierrick HYMBERT	68b8d4eb55	Merge remote-tracking branch 'origin/master' into test/server-add-ci-test	2024-02-21 18:41:14 +01:00
Pierrick HYMBERT	600cbeb7eb	server: test: ci change the GitHub workflow trigger	2024-02-21 18:35:21 +01:00
Jared Van Bortel	89febfed93	examples : do not assume BOS when shifting context (#5622)	2024-02-21 10:33:54 -05:00
Georgi Gerganov	5022cf242d	sync : ggml	2024-02-21 16:52:52 +02:00
Pierrick Hymbert	1ecea255eb	server: health: fix race condition on slots data using tasks queue (#5634) * server: health: fix race condition on slots data using tasks queue * server: health: * include_slots only if slots_endpoint * fix compile warning task.target_id not initialized.	2024-02-21 15:47:48 +01:00
Ettore Di Giacinto	a00a35cef9	readme : add LocalAI to the available UIs (#5629)	2024-02-21 16:39:10 +02:00
Georgi Gerganov	eccd7a26dd	sync : ggml (#5633) * ggml : fix conv_2d batch mode (ggml/737) Co-authored-by: bssrdf <bssrdf@gmail.com> * ggml : compute forward no longer pass src tensors (ggml/729) * sync : ggml ggml-ci --------- Co-authored-by: bssrdf <merlintiger@hotmail.com> Co-authored-by: bssrdf <bssrdf@gmail.com>	2024-02-21 16:17:10 +02:00
Georgi Gerganov	c14f72db9c	readme : update hot topics	2024-02-21 15:39:54 +02:00
Daniel Bevenius	cc6cac08e3	llava : add --skip-unknown to 1.6 convert.py (#5632) This commit adds the `--skip-unknown` option to the convert.py script and removes the saving of the updated checkpoints to avoid updating possibly checked out files. The motivation for this change is that this was done for 1.5 in Commit `fc0c8d286a` ("llava : update surgery script to not remove tensors") and makes the examples more consistent. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-02-21 15:36:57 +02:00
postmasters	580111d42b	llama : add `gemma` model (#5631) There are a couple of things to note in this architecture: 1. Shared input and output embedding parameters. 2. Key length and value length are not derived from `n_embd`. More information about the models can be found at https://ai.google.dev/gemma. GGUFs can be downloaded from https://huggingface.co/google.	2024-02-21 15:08:22 +02:00

2301 commits