server: tests: add infinite loop scenario

Pierrick HYMBERT 2024-02-20 23:11:59 +01:00
parent 6b9dc4f291
commit 68574c6f98
2 changed files with 3 additions and 3 deletions


@@ -42,7 +42,7 @@ Feature: llama.cpp server
     """
     Write another very long music lyrics.
     """
-    And 512 max tokens to predict
+    And 256 max tokens to predict
     Given concurrent completion requests
     Then the server is busy
     And all slots are busy
@@ -62,7 +62,7 @@ Feature: llama.cpp server
     """
     Write another very long music lyrics.
     """
-    And 512 max tokens to predict
+    And 256 max tokens to predict
     And streaming is enabled
     Given concurrent OAI completions requests
     Then the server is busy


@@ -176,7 +176,7 @@ def oai_chat_completions(context, user_prompt):
         model=context.model,
         max_tokens=context.n_predict,
         stream=context.enable_streaming,
-        seed = context.seed
+        seed=context.seed
     )
     if context.enable_streaming:
         completion_response = {
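The last hunk is a pure style fix: PEP 8 recommends no spaces around `=` when passing keyword arguments. A minimal sketch of the corrected call style, using a hypothetical `make_request` helper in place of the real OpenAI client call from the test:

```python
# Hypothetical stand-in for the OpenAI client call in the diff above;
# it simply echoes its arguments back so the call style can be checked.
def make_request(model, max_tokens, stream, seed):
    return {"model": model, "max_tokens": max_tokens, "stream": stream, "seed": seed}

# PEP 8 style, as in the fixed line: no spaces around "=" in keyword arguments.
response = make_request(
    model="test-model",  # placeholder value, not from the diff
    max_tokens=256,
    stream=True,
    seed=42,
)
```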