From bedf37c9d180b3c9203ce3506efaa19c5978c4b3 Mon Sep 17 00:00:00 2001
From: Pierrick HYMBERT
Date: Fri, 23 Feb 2024 02:38:37 +0100
Subject: [PATCH] server: tests: reducing n_ctx and n_predict for // prompts as
 it is too slow in the CI.

---
 examples/server/tests/features/parallel.feature | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/examples/server/tests/features/parallel.feature b/examples/server/tests/features/parallel.feature
index 8fe1befd0..d4d403ead 100644
--- a/examples/server/tests/features/parallel.feature
+++ b/examples/server/tests/features/parallel.feature
@@ -6,7 +6,7 @@ Feature: Parallel
     And a model file stories260K.gguf
     And a model alias tinyllama-2
     And 42 as server seed
-    And 32 KV cache size
+    And 64 KV cache size
     And 2 slots
     And continuous batching
     Then the server is starting
@@ -29,7 +29,7 @@ Feature: Parallel
     Then all prompts are predicted with <n_predict> tokens
     Examples:
       | n_predict |
-      | 512       |
+      | 128       |

   Scenario Outline: Multi users OAI completions compatibility
     Given a system prompt You are a writer.
@@ -50,15 +50,15 @@ Feature: Parallel
     Then all prompts are predicted with <n_predict> tokens
     Examples:
       | streaming | n_predict |
-      | disabled  | 512       |
-      #| enabled  | 512       | FIXME: phymbert: need to investigate why in aiohttp with streaming only one token is generated
+      | disabled  | 64        |
+      #| enabled  | 64        | FIXME: phymbert: need to investigate why in aiohttp with streaming only one token is generated

   Scenario: Multi users with total number of tokens to predict exceeds the KV Cache size #3969
     Given a server listening on localhost:8080
     And a model file stories260K.gguf
     And 42 as server seed
     And 2 slots
-    And 1024 KV cache size
+    And 64 KV cache size
     Then the server is starting
     Then the server is healthy
     Given a prompt:
@@ -77,7 +77,7 @@ Feature: Parallel
       """
       Write a very long joke.
       """
-    And 2048 max tokens to predict
+    And 128 max tokens to predict
     Given concurrent completion requests
     Then the server is busy
     Then the server is idle