Henri Vasserman
9104fe5a7c
Change how the token buffers work.
...
There is now just embd (and last_n_tokens).
The input can also be of any length in which case it will be truncated
like it normally would.
2023-06-01 00:47:11 +03:00
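The truncation this commit describes can be sketched roughly as below: keep the first `n_keep` tokens of an over-long prompt and the most recent tail that still fits in the context window. `n_ctx` and `n_keep` follow llama.cpp naming conventions, but the helper itself is hypothetical, not the commit's actual code.

```cpp
#include <cstddef>
#include <vector>

// Sketch: fit a prompt of any length into the context window by keeping
// the first n_keep tokens plus the most recent tail (assumes n_keep <= n_ctx).
std::vector<int> truncate_prompt(const std::vector<int> &prompt,
                                 std::size_t n_ctx, std::size_t n_keep) {
    if (prompt.size() <= n_ctx) {
        return prompt; // already fits, nothing to do
    }
    // keep the beginning of the prompt...
    std::vector<int> out(prompt.begin(), prompt.begin() + n_keep);
    // ...and the most recent tokens that still fit after n_keep
    const std::size_t n_tail = n_ctx - n_keep;
    out.insert(out.end(), prompt.end() - n_tail, prompt.end());
    return out;
}
```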
Randall Fitzgerald
f2e1130901
Merge pull request #7 from anon998/logging-reuse
...
Reuse format_generation_settings for logging.
2023-05-31 17:08:12 -04:00
anon
497160a60d
remove old log function
2023-05-31 18:01:07 -03:00
anon
1bd7cc60a8
reuse format_generation_settings for logging
2023-05-31 18:00:07 -03:00
anon
43d295fddc
filter empty stopping strings
2023-05-31 18:00:07 -03:00
digiwombat
276fa99873
Misunderstood the instructions, I think. Back to the raw JSON output only.
2023-05-31 16:45:57 -04:00
digiwombat
1b96df2b5f
Spacing fix. Nothing to see here.
2023-05-31 16:42:43 -04:00
digiwombat
86337e3a9b
Server console logs now come in one flavor: Verbose.
2023-05-31 16:41:34 -04:00
digiwombat
dda4c10d64
Switch to the CPPHTTPLIB logger. Verbose adds body dump as well as request info.
2023-05-31 16:23:39 -04:00
digiwombat
7332b41f9f
Simple single-line server log for requests
2023-05-31 15:56:27 -04:00
Randall Fitzgerald
96fa480147
Merge pull request #6 from anon998/fix-multibyte
...
Buffer incomplete multibyte characters + other stuff.
2023-05-31 12:14:43 -04:00
anon
3edaf6bd8b
print timings by default
2023-05-31 12:55:19 -03:00
anon
d58e48663d
default penalize_nl to false + format
2023-05-31 12:44:27 -03:00
anon
40e13805d9
print timings + build info
...
I don't know if llama_free is needed but it was used in main.cpp.
2023-05-31 12:44:24 -03:00
anon
dd30219332
buffer incomplete multi-byte characters
2023-05-31 12:31:27 -03:00
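The idea behind buffering incomplete multi-byte characters: a token boundary can fall in the middle of a UTF-8 sequence, so the server should hold back trailing bytes that do not yet form a complete character and send them with the next token. A sketch of the detection step, with a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>

// Return how many bytes at the end of `s` form an incomplete UTF-8
// sequence (0 if the string ends on a complete character). Those bytes
// can be buffered and emitted once the rest of the character arrives.
std::size_t incomplete_utf8_tail(const std::string &s) {
    // look back at most 3 bytes for the lead byte of the final character
    for (std::size_t i = 1; i <= 3 && i <= s.size(); ++i) {
        const unsigned char c = s[s.size() - i];
        if ((c & 0xC0) == 0x80) continue;      // continuation byte, keep looking
        std::size_t need = 0;
        if      ((c & 0xE0) == 0xC0) need = 2; // lead byte of a 2-byte sequence
        else if ((c & 0xF0) == 0xE0) need = 3; // lead byte of a 3-byte sequence
        else if ((c & 0xF8) == 0xF0) need = 4; // lead byte of a 4-byte sequence
        else return 0;                         // ASCII byte: string is complete
        return need > i ? i : 0;               // incomplete if bytes are missing
    }
    return 0;
}
```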
anon
27911d6d68
fix default model alias
2023-05-31 12:31:25 -03:00
anon
aa2bbb2d35
fix parameter type
2023-05-31 11:14:34 -03:00
anon
f1710b90dc
add infinite generation when n_predict is -1
2023-05-31 11:14:34 -03:00
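The `n_predict is -1` behavior amounts to a loop condition like the following sketch, where `-1` means "no limit" and generation only stops on end-of-stream conditions. Variable names are illustrative, not the server's exact ones:

```cpp
// Sketch: decide whether the generation loop should continue.
// n_predict == -1 enables infinite generation; otherwise stop after
// n_predict tokens. EOS (or a stop string) always ends generation.
bool keep_generating(int n_predict, int n_generated, bool eos_reached) {
    if (eos_reached) {
        return false;           // model signalled end of sequence
    }
    if (n_predict == -1) {
        return true;            // infinite generation requested
    }
    return n_generated < n_predict;
}
```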
anon
284bc293b1
reserve memory for generated_text
2023-05-31 11:14:34 -03:00
anon
2c08f29691
make api server use only a single thread
2023-05-31 09:04:33 -03:00
anon
c1cbde82a1
print error when server can't bind to the interface
2023-05-31 09:04:16 -03:00
Randall Fitzgerald
9f2424ac47
Merge pull request #5 from anon998/stop-stream
...
Stop generating tokens when the stream is closed.
2023-05-30 22:16:32 -04:00
anon
3a079d5cc8
stop generating when the stream is closed
2023-05-30 23:12:00 -03:00
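Stopping when the stream is closed boils down to checking whether writing a chunk to the client still succeeds. A minimal sketch, where the write callback stands in for the cpp-httplib data sink the server actually uses:

```cpp
#include <functional>
#include <string>

// Sketch: a failed write to the client is treated as "stream closed",
// which flags the generation loop to stop producing tokens.
bool send_or_stop(const std::function<bool(const std::string &)> &write_chunk,
                  const std::string &chunk, bool &stop_generation) {
    if (!write_chunk(chunk)) {
        stop_generation = true; // client disconnected: stop generating
        return false;
    }
    return true;
}
```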
anon
7a8104fbd2
add missing quote when printing stopping strings
2023-05-30 23:11:32 -03:00
digiwombat
b6f536dfb3
Cull to end of generated_text when encountering a stopping string in case it's a partial token.
...
Will roll this back if it proves to be a problem.
2023-05-30 21:14:24 -04:00
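The culling described above can be sketched as: once a stopping string is found in the generated text, erase from its first occurrence to the end, since the match may have landed inside a partially emitted token. The helper name is hypothetical:

```cpp
#include <string>

// Sketch: truncate generated_text at the first occurrence of a stop
// string, dropping everything after it (including any partial token).
void cull_at_stop(std::string &generated_text, const std::string &stop) {
    if (stop.empty()) {
        return; // mirrors the "filter empty stopping strings" fix above
    }
    const std::size_t pos = generated_text.find(stop);
    if (pos != std::string::npos) {
        generated_text.erase(pos); // erase to end of string
    }
}
```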
Randall Fitzgerald
9197674a6b
Merge pull request #4 from anon998/logging
...
Add the --verbose flag and request logging.
2023-05-30 20:58:18 -04:00
anon
aa0788b650
add --verbose flag and request logging
2023-05-30 21:45:56 -03:00
anon
7a853dc56d
prevent the server from swallowing exceptions in debug mode
...
So it's easier to catch them inside a debugger.
2023-05-30 21:39:30 -03:00
Randall Fitzgerald
e6de69abfb
Merge pull request #3 from anon998/sse
...
Add streaming via server-sent events.
Has some changes that I didn't make, and I decided I prefer "stream" to "streaming".
2023-05-30 20:36:52 -04:00
Randall Fitzgerald
2533878b79
Merge branch 'master' into sse
2023-05-30 20:34:48 -04:00
digiwombat
a25f830fe1
Default streaming to false if it's not set in the request body.
2023-05-30 20:17:18 -04:00
digiwombat
38eaf2b7f7
Removed testing fprintf calls.
2023-05-30 19:48:43 -04:00
digiwombat
3292f057dc
Changed to single API endpoint for streaming and non.
...
next-token endpoint removed.
"as_loop" setting changed to "streaming"
2023-05-30 19:44:16 -04:00
anon
d6fff56e22
add streaming via server-sent events
...
Removes /next-token endpoint and adds a "stream" parameter to the
/completion one.
2023-05-30 19:33:33 -03:00
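With streaming enabled via the `stream` parameter on `/completion`, each partial result goes out as a server-sent event: a `data: <payload>` line terminated by a blank line. A sketch of the framing only; the JSON payload shown in the test is illustrative, not the server's exact schema:

```cpp
#include <string>

// Sketch: wrap a JSON payload in server-sent-events framing.
// Each SSE record is "data: <payload>\n\n".
std::string sse_chunk(const std::string &json_payload) {
    return "data: " + json_payload + "\n\n";
}
```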
digiwombat
03ea8f013a
Fix for the regen issue.
2023-05-30 15:48:55 -04:00
Henri Vasserman
42cf4d8433
Merge branch 'master' into master
2023-05-29 01:05:19 +03:00
digiwombat
33b6957177
Fixed failing to return result on stopping token.
2023-05-28 16:45:05 -04:00
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols (#1617)
2023-05-28 21:01:02 +02:00
digiwombat
6c58f64a3b
--ctx_size flag to --ctx-size to match common.cpp
2023-05-28 14:17:36 -04:00
digiwombat
b38d41ef52
--memory_f32 flag to --memory-f32 to match common.cpp
2023-05-28 13:58:25 -04:00
digiwombat
655899db89
Add ignore_eos option to generation settings.
2023-05-28 13:49:45 -04:00
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates (#1625)
...
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Update `main` and `server` examples documentation to use the new style dash separator argument format
5. Update the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set a user-friendly model name (#1614)
2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612)
2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
...
* Use strstr to check if fp16 supported
* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
Randall Fitzgerald
2c9ee7a052
Apply suggestions from code review
...
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-28 09:34:11 -07:00
Henri Vasserman
74c6f36bf1
Editorconfig suggested fixes
...
delete whitespace
2023-05-28 19:19:34 +03:00
digiwombat
15ddc4903b
Merge remote-tracking branch 'slyecho/server_refactor'
2023-05-28 11:09:32 -04:00
Henri Vasserman
7186d655a1
seed and gen params
2023-05-28 17:03:01 +03:00
digiwombat
7740301db9
Set unspecified generation settings back to default. (Notes below)
...
- If an incoming request omits a setting such as top_k that an earlier request had set, the server would have kept the old value, I'm pretty sure. This fixes that.
- I don't know whether this could be done more cleanly by just setting llama.params = gpt_params();, because I'm not sure how default construction would behave given that no default constructor is defined.
2023-05-28 09:18:47 -04:00