Randall Fitzgerald
797155a0d1
Merge pull request #10 from cirk2/master
Add Options endpoints and Access-Control-Allow-Headers to satisfy CORS
2023-06-01 08:10:26 -04:00
Henri Vasserman
9531ae60db
Add logit bias support
2023-06-01 13:57:47 +03:00
Henri Vasserman
8c6a5fc92b
last tokens fixes
2023-06-01 13:18:12 +03:00
Felix Hellmann
5bbc030338
Add Options endpoints and Access-Control-Allow-Headers to satisfy CORS rules
2023-06-01 10:47:53 +02:00
digiwombat
f7882e2d69
Fixed a crash caused by erasing from empty last_n_tokens
2023-05-31 20:35:28 -04:00
Randall Fitzgerald
5f6e16da36
Merge pull request #9 from anon998/stopping-strings
Fix stopping strings.
2023-05-31 20:05:18 -04:00
anon
e9b1f0bf5c
fix stopping strings
2023-05-31 21:00:21 -03:00
digiwombat
342604bb81
Added a super simple CORS header as default for all endpoints.
2023-05-31 19:54:05 -04:00
Henri Vasserman
bed308c69c
Apply suggestions from code review
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-06-01 01:15:48 +03:00
Randall Fitzgerald
8478e59b08
Merge pull request #8 from SlyEcho/server_refactor
Change how the token buffers work.
2023-05-31 18:03:40 -04:00
Henri Vasserman
9104fe5a7c
Change how the token buffers work.
There is now just embd (and last_n_tokens).
The input can also be of any length in which case it will be truncated
like it normally would.
2023-06-01 00:47:11 +03:00
Randall Fitzgerald
f2e1130901
Merge pull request #7 from anon998/logging-reuse
Reuse format_generation_settings for logging.
2023-05-31 17:08:12 -04:00
anon
497160a60d
remove old log function
2023-05-31 18:01:07 -03:00
anon
1bd7cc60a8
reuse format_generation_settings for logging
2023-05-31 18:00:07 -03:00
anon
43d295fddc
filter empty stopping strings
2023-05-31 18:00:07 -03:00
digiwombat
276fa99873
Misunderstood the instructions, I think. Back to the raw JSON output only.
2023-05-31 16:45:57 -04:00
digiwombat
1b96df2b5f
Spacing fix. Nothing to see here.
2023-05-31 16:42:43 -04:00
digiwombat
86337e3a9b
Server console logs now come in one flavor: Verbose.
2023-05-31 16:41:34 -04:00
digiwombat
dda4c10d64
Switch to the CPPHTTPLIB logger. Verbose adds body dump as well as request info.
2023-05-31 16:23:39 -04:00
digiwombat
7332b41f9f
Simple single-line server log for requests
2023-05-31 15:56:27 -04:00
Randall Fitzgerald
96fa480147
Merge pull request #6 from anon998/fix-multibyte
Buffer incomplete multibyte characters + other stuff.
2023-05-31 12:14:43 -04:00
anon
3edaf6bd8b
print timings by default
2023-05-31 12:55:19 -03:00
anon
d58e48663d
default penalize_nl to false + format
2023-05-31 12:44:27 -03:00
anon
40e13805d9
print timings + build info
I don't know if llama_free is needed but it was used in main.cpp.
2023-05-31 12:44:24 -03:00
anon
dd30219332
buffer incomplete multi-byte characters
2023-05-31 12:31:27 -03:00
anon
27911d6d68
fix default model alias
2023-05-31 12:31:25 -03:00
anon
aa2bbb2d35
fix parameter type
2023-05-31 11:14:34 -03:00
anon
f1710b90dc
add infinite generation when n_predict is -1
2023-05-31 11:14:34 -03:00
anon
284bc293b1
reserve memory for generated_text
2023-05-31 11:14:34 -03:00
anon
2c08f29691
make api server use only a single thread
2023-05-31 09:04:33 -03:00
anon
c1cbde82a1
print error when server can't bind to the interface
2023-05-31 09:04:16 -03:00
Randall Fitzgerald
9f2424ac47
Merge pull request #5 from anon998/stop-stream
Stop generating tokens when the stream is closed.
2023-05-30 22:16:32 -04:00
anon
3a079d5cc8
stop generating when the stream is closed
2023-05-30 23:12:00 -03:00
anon
7a8104fbd2
add missing quote when printing stopping strings
2023-05-30 23:11:32 -03:00
digiwombat
b6f536dfb3
Cull to end of generated_text when encountering a stopping string in case it's a partial token.
Will roll this back if it proves to be a problem.
2023-05-30 21:14:24 -04:00
Randall Fitzgerald
9197674a6b
Merge pull request #4 from anon998/logging
Add the --verbose flag and request logging.
2023-05-30 20:58:18 -04:00
anon
aa0788b650
add --verbose flag and request logging
2023-05-30 21:45:56 -03:00
anon
7a853dc56d
prevent the server from swallowing exceptions in debug mode
So it's easier to catch them inside a debugger.
2023-05-30 21:39:30 -03:00
Randall Fitzgerald
e6de69abfb
Merge pull request #3 from anon998/sse
Add streaming via server-sent events.
Has some changes that I didn't make, and I decided I prefer "stream" to "streaming".
2023-05-30 20:36:52 -04:00
Randall Fitzgerald
2533878b79
Merge branch 'master' into sse
2023-05-30 20:34:48 -04:00
digiwombat
a25f830fe1
Default streaming to false if it's not set in the request body.
2023-05-30 20:17:18 -04:00
digiwombat
38eaf2b7f7
Removed testing fprintf calls.
2023-05-30 19:48:43 -04:00
digiwombat
3292f057dc
Changed to single API endpoint for streaming and non.
next-token endpoint removed.
"as_loop" setting changed to "streaming"
2023-05-30 19:44:16 -04:00
anon
d6fff56e22
add streaming via server-sent events
Removes /next-token endpoint and adds a "stream" parameter to the
/completion one.
2023-05-30 19:33:33 -03:00
digiwombat
03ea8f013a
Fix for the regen issue.
2023-05-30 15:48:55 -04:00
Henri Vasserman
ffb06a345e
OpenLLaMA 3B support (#1588)
This adds support to llama.cpp to load the model.
Currently missing are changes that are required from convert.py to convert the model correctly. It needs some changes to start reading the JSON configuration for HF models instead of deriving the values by guessing.
Co-authored-by: FNsi <125447286+FNsi@users.noreply.github.com>
2023-05-30 21:24:22 +03:00
Georgi Gerganov
7552ac5863
ggml : sync cgraph import / export API
2023-05-29 19:31:44 +03:00
Georgi Gerganov
5d1830b99d
ggml : fix bug in ggml_alibi
2023-05-29 19:30:49 +03:00
DannyDaemonic
248367605e
Work around for recalculating logits in cached prompts (Fixes #1585) (#1609)
* Work around for recalculating logits in cached prompts
2023-05-29 05:13:40 -07:00
Jiří Podivín
0e730dd23b
Adding git in container package dependencies (#1621)
Git added to build packages for version information in docker image
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2023-05-28 21:45:50 -07:00