Commit graph

672 commits

Author SHA1 Message Date
Henri Vasserman
bed308c69c
Apply suggestions from code review
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-06-01 01:15:48 +03:00
Randall Fitzgerald
8478e59b08
Merge pull request #8 from SlyEcho/server_refactor
Change how the token buffers work.
2023-05-31 18:03:40 -04:00
Henri Vasserman
9104fe5a7c
Change how the token buffers work.
There is now just embd (and last_n_tokens).

The input can also be of any length in which case it will be truncated
like it normally would.
2023-06-01 00:47:11 +03:00
Randall Fitzgerald
f2e1130901
Merge pull request #7 from anon998/logging-reuse
Reuse format_generation_settings for logging.
2023-05-31 17:08:12 -04:00
anon
497160a60d remove old log function 2023-05-31 18:01:07 -03:00
anon
1bd7cc60a8 reuse format_generation_settings for logging 2023-05-31 18:00:07 -03:00
anon
43d295fddc filter empty stopping strings 2023-05-31 18:00:07 -03:00
digiwombat
276fa99873 Misunderstood the instructions, I think. Back to the raw JSON output only. 2023-05-31 16:45:57 -04:00
digiwombat
1b96df2b5f Spacing fix. Nothing to see here. 2023-05-31 16:42:43 -04:00
digiwombat
86337e3a9b Server console logs now come in one flavor: Verbose. 2023-05-31 16:41:34 -04:00
digiwombat
dda4c10d64 Switch to the CPPHTTPLIB logger. Verbose adds body dump as well as request info. 2023-05-31 16:23:39 -04:00
digiwombat
7332b41f9f Simple single-line server log for requests 2023-05-31 15:56:27 -04:00
Randall Fitzgerald
96fa480147
Merge pull request #6 from anon998/fix-multibyte
Buffer incomplete multibyte characters + other stuff.
2023-05-31 12:14:43 -04:00
anon
3edaf6bd8b print timings by default 2023-05-31 12:55:19 -03:00
anon
d58e48663d default penalize_nl to false + format 2023-05-31 12:44:27 -03:00
anon
40e13805d9 print timings + build info
I don't know if llama_free is needed but it was used in main.cpp.
2023-05-31 12:44:24 -03:00
anon
dd30219332 buffer incomplete multi-byte characters 2023-05-31 12:31:27 -03:00
anon
27911d6d68 fix default model alias 2023-05-31 12:31:25 -03:00
anon
aa2bbb2d35 fix parameter type 2023-05-31 11:14:34 -03:00
anon
f1710b90dc add infinite generation when n_predict is -1 2023-05-31 11:14:34 -03:00
anon
284bc293b1 reserve memory for generated_text 2023-05-31 11:14:34 -03:00
anon
2c08f29691 make api server use only a single thread 2023-05-31 09:04:33 -03:00
anon
c1cbde82a1 print error when server can't bind to the interface 2023-05-31 09:04:16 -03:00
Randall Fitzgerald
9f2424ac47
Merge pull request #5 from anon998/stop-stream
Stop generating tokens when the stream is closed.
2023-05-30 22:16:32 -04:00
anon
3a079d5cc8 stop generating when the stream is closed 2023-05-30 23:12:00 -03:00
anon
7a8104fbd2 add missing quote when printing stopping strings 2023-05-30 23:11:32 -03:00
digiwombat
b6f536dfb3 Cull to end of generated_text when encountering a stopping string in case it's a partial token.
Will roll this back if it proves to be a problem.
2023-05-30 21:14:24 -04:00
Randall Fitzgerald
9197674a6b
Merge pull request #4 from anon998/logging
Add the --verbose flag and request logging.
2023-05-30 20:58:18 -04:00
anon
aa0788b650 add --verbose flag and request logging 2023-05-30 21:45:56 -03:00
anon
7a853dc56d prevent the server from swallowing exceptions in debug mode
So it's easier to catch them inside a debugger.
2023-05-30 21:39:30 -03:00
Randall Fitzgerald
e6de69abfb
Merge pull request #3 from anon998/sse
Add streaming via server-sent events.
Has some changes that I didn't make, and I decided I prefer "stream" to "streaming"
2023-05-30 20:36:52 -04:00
Randall Fitzgerald
2533878b79
Merge branch 'master' into sse 2023-05-30 20:34:48 -04:00
digiwombat
a25f830fe1 Default streaming to false if it's not set in the request body. 2023-05-30 20:17:18 -04:00
digiwombat
38eaf2b7f7 Removed testing fprintf calls. 2023-05-30 19:48:43 -04:00
digiwombat
3292f057dc Changed to single API endpoint for streaming and non.
next-token endpoint removed.
"as_loop" setting changed to "streaming"
2023-05-30 19:44:16 -04:00
anon
d6fff56e22 add streaming via server-sent events
Removes /next-token endpoint and adds a "stream" parameter to the
/completion one.
2023-05-30 19:33:33 -03:00
digiwombat
03ea8f013a Fix for the regen issue. 2023-05-30 15:48:55 -04:00
Henri Vasserman
42cf4d8433
Merge branch 'master' into master 2023-05-29 01:05:19 +03:00
digiwombat
33b6957177 Fixed failing to return result on stopping token. 2023-05-28 16:45:05 -04:00
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols (#1617) 2023-05-28 21:01:02 +02:00
digiwombat
6c58f64a3b --ctx_size flag to --ctx-size to match common.cpp 2023-05-28 14:17:36 -04:00
digiwombat
b38d41ef52 --memory_f32 flag to --memory-f32 to match common.cpp 2023-05-28 13:58:25 -04:00
digiwombat
655899db89 Add ignore_eos option to generation settings. 2023-05-28 13:49:45 -04:00
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates (#1625)
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Update `main` and `server` examples documentation to use the new style dash separator argument format
5. Update the `server` example to use dash separators for its arguments and add `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` to the `--help` text of the `main` and `server` examples, as well as to their documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set a user-friendly model name (#1614) 2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612) 2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
* Use strstr to check if fp16 supported

* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
Randall Fitzgerald
2c9ee7a052
Apply suggestions from code review
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-28 09:34:11 -07:00
Henri Vasserman
74c6f36bf1
Editorconfig suggested fixes
delete whitespace
2023-05-28 19:19:34 +03:00
digiwombat
15ddc4903b Merge remote-tracking branch 'slyecho/server_refactor' 2023-05-28 11:09:32 -04:00