Commit graph

656 commits

Author SHA1 Message Date
anon
dd30219332 buffer incomplete multi-byte characters 2023-05-31 12:31:27 -03:00
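When tokens are streamed one at a time, a token's bytes can end mid-way through a multi-byte UTF-8 character. A minimal sketch of the buffering idea behind this commit (a hypothetical helper, not the server's actual code): detect how many trailing bytes form an incomplete sequence and hold them back until the next token completes it.

```cpp
#include <cstddef>
#include <string>

// Number of trailing bytes of `s` that start a multi-byte UTF-8 character
// whose remaining bytes have not arrived yet (0 if `s` ends on a complete
// character). Hypothetical helper illustrating the fix above.
static size_t incomplete_utf8_suffix(const std::string & s) {
    for (size_t i = 1; i <= 4 && i <= s.size(); ++i) {
        const unsigned char c = s[s.size() - i];
        if ((c & 0xC0) == 0x80) {
            continue; // continuation byte: keep scanning backwards
        }
        const size_t expected =
            (c & 0x80) == 0x00 ? 1 :  // ASCII
            (c & 0xE0) == 0xC0 ? 2 :
            (c & 0xF0) == 0xE0 ? 3 :
            (c & 0xF8) == 0xF0 ? 4 : 1;
        return expected > i ? i : 0; // fewer bytes than the lead byte promises
    }
    return 0;
}
```

The server would then emit `s.substr(0, s.size() - incomplete_utf8_suffix(s))` and carry the held-back bytes into the next chunk.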
anon
27911d6d68 fix default model alias 2023-05-31 12:31:25 -03:00
anon
aa2bbb2d35 fix parameter type 2023-05-31 11:14:34 -03:00
anon
f1710b90dc add infinite generation when n_predict is -1 2023-05-31 11:14:34 -03:00
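With n_predict set to -1 the token budget is treated as unlimited, so only end-of-stream or a stop condition terminates generation. A hedged sketch of the loop condition this implies (names are illustrative):

```cpp
// n_predict == -1 means "no limit"; only a non-negative value caps generation.
static bool has_budget(int n_predict, int n_generated) {
    return n_predict == -1 || n_generated < n_predict;
}
```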
anon
284bc293b1 reserve memory for generated_text 2023-05-31 11:14:34 -03:00
anon
2c08f29691 make api server use only a single thread 2023-05-31 09:04:33 -03:00
anon
c1cbde82a1 print error when server can't bind to the interface 2023-05-31 09:04:16 -03:00
Randall Fitzgerald
9f2424ac47
Merge pull request #5 from anon998/stop-stream
Stop generating tokens when the stream is closed.
2023-05-30 22:16:32 -04:00
anon
3a079d5cc8 stop generating when the stream is closed 2023-05-30 23:12:00 -03:00
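The pattern behind this change, as a generic sketch (the callbacks stand in for the HTTP library's actual send path and are assumptions): the token loop stops as soon as a write to the client fails, which is how a closed stream typically shows up server-side.

```cpp
#include <functional>
#include <string>

// Stop generating as soon as the client disconnects. `write_chunk` models
// the HTTP library's send call and is assumed to return false once the
// peer has closed the connection.
static void stream_tokens(const std::function<bool()> & has_next,
                          const std::function<std::string()> & next_chunk,
                          const std::function<bool(const std::string &)> & write_chunk) {
    while (has_next()) {
        if (!write_chunk(next_chunk())) {
            break; // stream closed; stop spending compute on unread tokens
        }
    }
}
```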
anon
7a8104fbd2 add missing quote when printing stopping strings 2023-05-30 23:11:32 -03:00
digiwombat
b6f536dfb3 Cull to end of generated_text when encountering a stopping string in case it's a partial token.
Will roll this back if it proves to be a problem.
2023-05-30 21:14:24 -04:00
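A sketch of the culling described here (illustrative): everything from the first occurrence of the stopping string to the end of the buffer is dropped, so a stopping string that arrived glued to a partial token does not leak into the output.

```cpp
#include <string>

// Drop the stopping string and anything after it from `text`.
static void cull_at_stop_string(std::string & text, const std::string & stop) {
    const std::string::size_type pos = text.find(stop);
    if (pos != std::string::npos) {
        text.erase(pos);
    }
}
```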
Randall Fitzgerald
9197674a6b
Merge pull request #4 from anon998/logging
Add the --verbose flag and request logging.
2023-05-30 20:58:18 -04:00
anon
aa0788b650 add --verbose flag and request logging 2023-05-30 21:45:56 -03:00
anon
7a853dc56d prevent the server from swallowing exceptions in debug mode
So it's easier to catch them inside a debugger.
2023-05-30 21:39:30 -03:00
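One common shape for this kind of change (an assumed pattern, not necessarily this commit's exact code): only install the catch-all in release builds, so that in debug builds the exception propagates and the debugger breaks at the original throw site.

```cpp
#include <exception>
#include <functional>
#include <string>

// Release builds (NDEBUG defined) keep the catch-all so one bad request
// cannot kill the server; debug builds let exceptions propagate.
static void run_handler(const std::function<void()> & handle_request,
                        const std::function<void(const std::string &)> & respond_error) {
#ifdef NDEBUG
    try {
        handle_request();
    } catch (const std::exception & e) {
        respond_error(e.what());
    }
#else
    handle_request(); // debug build: let the debugger catch it at the throw site
#endif
}
```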
Randall Fitzgerald
e6de69abfb
Merge pull request #3 from anon998/sse
Add streaming via server-sent events.
Includes some changes that I didn't make, and I decided I prefer "stream" to "streaming".

2023-05-30 20:36:52 -04:00
Randall Fitzgerald
2533878b79
Merge branch 'master' into sse 2023-05-30 20:34:48 -04:00
digiwombat
a25f830fe1 Default streaming to false if it's not set in the request body. 2023-05-30 20:17:18 -04:00
digiwombat
38eaf2b7f7 Removed testing fprintf calls. 2023-05-30 19:48:43 -04:00
digiwombat
3292f057dc Changed to a single API endpoint for streaming and non-streaming.
next-token endpoint removed.
"as_loop" setting changed to "streaming"
2023-05-30 19:44:16 -04:00
anon
d6fff56e22 add streaming via server-sent events
Removes /next-token endpoint and adds a "stream" parameter to the
/completion one.
2023-05-30 19:33:33 -03:00
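For reference, a server-sent event is just a text frame: a `data:` field carrying the payload, terminated by a blank line. A minimal sketch of framing one streamed token (the payload shape is illustrative):

```cpp
#include <string>

// Wrap one JSON payload in the SSE wire format: "data: <payload>\n\n".
static std::string to_sse_event(const std::string & json_payload) {
    return "data: " + json_payload + "\n\n";
}
```

A client sets `"stream": true` in the /completion request body and reads events until the final one signals completion.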
digiwombat
03ea8f013a Fix for the regen issue. 2023-05-30 15:48:55 -04:00
Henri Vasserman
42cf4d8433
Merge branch 'master' into master 2023-05-29 01:05:19 +03:00
digiwombat
33b6957177 Fixed failing to return result on stopping token. 2023-05-28 16:45:05 -04:00
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols (#1617) 2023-05-28 21:01:02 +02:00
digiwombat
6c58f64a3b --ctx_size flag to --ctx-size to match common.cpp 2023-05-28 14:17:36 -04:00
digiwombat
b38d41ef52 --memory_f32 flag to --memory-f32 to match common.cpp 2023-05-28 13:58:25 -04:00
digiwombat
655899db89 Add ignore_eos option to generation settings. 2023-05-28 13:49:45 -04:00
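A common way to implement ignore_eos, shown here as a hedged sketch (this may not be exactly how the server wires it): bias the end-of-stream token's logit to negative infinity so sampling can never select it.

```cpp
#include <limits>
#include <vector>

// Make the EOS token unselectable by pushing its logit to -infinity.
static void apply_ignore_eos(std::vector<float> & logits, int eos_token_id) {
    logits[eos_token_id] = -std::numeric_limits<float>::infinity();
}
```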
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates (#1625)
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Update `main` and `server` examples documentation to use the new style dash separator argument format
5. Update the `server` example to use dash separators for its arguments and add `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
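A sketch of how points 1 and 2 fit together (simplified; the real code lives in `llama.h` and the common example code):

```cpp
#include <cstdio>

// Point 1: the define exists only when a GPU backend is compiled in.
#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST)
#define LLAMA_SUPPORTS_GPU_OFFLOAD
#endif

// Point 2: only advertise -ngl when offload is actually possible.
static void print_gpu_option_help() {
#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD
    std::fprintf(stderr, "  -ngl N, --n-gpu-layers N  number of layers to offload to the GPU\n");
#endif
}
```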
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set a user-friendly model name (#1614) 2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612) 2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
* Use strstr to check if fp16 supported

* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
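Based on the commit description, the check reads the device's extension list and searches it with strstr, after making sure the buffer is null terminated; a hedged sketch:

```cpp
#include <string.h>
#include <CL/cl.h>

// Query CL_DEVICE_EXTENSIONS and look for cl_khr_fp16.
static cl_bool device_supports_fp16(cl_device_id device) {
    char ext_buffer[1024] = {0};
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS,
                    sizeof(ext_buffer), ext_buffer, NULL);
    ext_buffer[sizeof(ext_buffer) - 1] = '\0'; // ensure null termination
    return strstr(ext_buffer, "cl_khr_fp16") != NULL ? CL_TRUE : CL_FALSE;
}
```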
Randall Fitzgerald
2c9ee7a052
Apply suggestions from code review
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-28 09:34:11 -07:00
Henri Vasserman
74c6f36bf1
Editorconfig suggested fixes
delete whitespace
2023-05-28 19:19:34 +03:00
digiwombat
15ddc4903b Merge remote-tracking branch 'slyecho/server_refactor' 2023-05-28 11:09:32 -04:00
Henri Vasserman
7186d655a1
seed and gen params 2023-05-28 17:03:01 +03:00
digiwombat
7740301db9 Set unspecified generation settings back to default. (Notes below)
- If an incoming set of values didn't contain top_k, for example, but an earlier one did, the old value would (I'm pretty sure) have stayed in effect. This fixes that.
- This might be done a bit more prettily by just setting llama.params = gpt_params();, but I'm not sure how the default constructor would react, since there isn't one defined.
2023-05-28 09:18:47 -04:00
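A sketch of the reset-to-default pattern described in the notes above (field names follow the API; the struct is an illustrative subset, and this assumes the nlohmann json.hpp bundled with the server example): rebuild the settings from scratch on every request, so a field omitted this time cannot silently inherit a previous request's value.

```cpp
#include "json.hpp" // nlohmann::json, bundled with the server example

// Illustrative subset of the generation settings.
struct gen_settings {
    int   top_k = 40;
    float temp  = 0.8f;
};

// Start from fresh defaults, then overwrite only the fields the body provides.
static gen_settings parse_settings(const nlohmann::json & body) {
    gen_settings s{};                       // fresh defaults, not the last request's values
    s.top_k = body.value("top_k", s.top_k); // overridden only when present
    s.temp  = body.value("temp",  s.temp);
    return s;
}
```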
digiwombat
dda915cac4 Added capturing the stopping word and sending it along with the final JSON.
Fixed an fprintf warning
Fixed a bug that broke streaming
Properly removed thread changing in json (only grabbed batch_size before)
2023-05-28 08:43:38 -04:00
digiwombat
2e5c5ee224 Changed JSON names to match the parameter name rather than the variable name. 2023-05-28 08:12:48 -04:00
digiwombat
23928f2887 Added generation_settings to final json object. 2023-05-28 08:04:05 -04:00
digiwombat
e8efd75492 Initial timeout code and expanded json return on completion.
Now passing server params to the help printer so their defaults are output.
Bad UTF-8 while streaming now returns a replacement character (\uFFFD).
Changed some error language very slightly.
The JSON now returns extra values, only on `stop` for streaming requests.
New JSON Return Values:
  - tokens_predicted (added to streaming)
  - seed (just pulls it from params, might return -1)
  - prompt (Might be useful)
  - generated_text (Full generated response for streaming requests)
2023-05-28 07:44:31 -04:00
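Putting the list above together, the final streamed event might look like the following (the keys mirror the list; the values are invented for the example, and this again assumes the bundled nlohmann json):

```cpp
#include "json.hpp" // nlohmann::json

// Illustrative final object for a streamed request.
static nlohmann::json make_final_event() {
    return nlohmann::json{
        {"stop",             true},
        {"tokens_predicted", 128},
        {"seed",             -1},              // pulled from params, may be -1
        {"prompt",           "Hello,"},
        {"generated_text",   "Hello, world."},
    };
}
```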
digiwombat
177868e68a Changed to params/args
Seed is now set by the CLI and defaults to -1 if no seed is given.
Threads and batch size are now properly launch parameters.
2023-05-28 06:29:11 -04:00
Henri Vasserman
549291fe61
keep processed tokens from the beginning
this means no limit on the input prompt;
it will just get reset again as normal
2023-05-28 12:11:41 +03:00
Randall Fitzgerald
df0e0d094c
Forgot to remove some testing code. 2023-05-28 12:11:14 +03:00
Randall Fitzgerald
f93fe36c5b
Add all generation parameters to server.cpp and allow resetting context
server.cpp left out a few generation parameters and also seems built to assume un-editable chatting with no regens or swipes. I added a simple "reload_ctx" flag that can be passed on generation that will cause the prompt to be reloaded.
2023-05-28 12:11:10 +03:00
Henri Vasserman
51e09944ce
server rewrite
Remove unnecessary things and radically rewrite server
2023-05-28 12:10:16 +03:00
Randall Fitzgerald
1f40a789e6
Didn't see the already defined top_k var.
lol. Embarrassing. Don't edit code in the github web viewer, kids.
2023-05-27 17:10:09 -07:00
Randall Fitzgerald
e84b802161
Change top_k type.
Is my lack of knowledge of the code base showing? Yes it is.
2023-05-27 17:07:45 -07:00
Randall Fitzgerald
fdce8951ac
Merge branch 'ggerganov:master' into master 2023-05-27 19:57:37 -04:00
Randall Fitzgerald
d20f36b93c
Removed unnecessary last_prompt_token set
Added the one that was supposed to be there.
Apologies for the extra commits; I'm copy-pasting from my editor to preserve the two-space indent formatting.
2023-05-27 16:46:05 -07:00
Randall Fitzgerald
36c86d794d
Automate Context resetting and minor fixes
Fixed top_k still not being set.
Removed an unnecessary loop.
2023-05-27 16:43:08 -07:00