Henri Vasserman
9104fe5a7c
Change how the token buffers work.
...
There is now just embd (and last_n_tokens).
The input can also be of any length in which case it will be truncated
like it normally would.
2023-06-01 00:47:11 +03:00
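The truncation this commit describes can be sketched roughly as below: keep the first `n_keep` tokens of an over-long prompt and the most recent tail that still fits in the context window. `n_ctx` and `n_keep` follow llama.cpp naming conventions, but the helper itself is hypothetical, not the commit's actual code.

```cpp
#include <cstddef>
#include <vector>

// Sketch: fit a prompt of any length into the context window by keeping
// the first n_keep tokens plus the most recent tail (assumes n_keep <= n_ctx).
std::vector<int> truncate_prompt(const std::vector<int> &prompt,
                                 std::size_t n_ctx, std::size_t n_keep) {
    if (prompt.size() <= n_ctx) {
        return prompt; // already fits, nothing to do
    }
    // keep the beginning of the prompt...
    std::vector<int> out(prompt.begin(), prompt.begin() + n_keep);
    // ...and the most recent tokens that still fit after n_keep
    const std::size_t n_tail = n_ctx - n_keep;
    out.insert(out.end(), prompt.end() - n_tail, prompt.end());
    return out;
}
```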
Randall Fitzgerald
f2e1130901
Merge pull request #7 from anon998/logging-reuse
...
Reuse format_generation_settings for logging.
2023-05-31 17:08:12 -04:00
anon
497160a60d
remove old log function
2023-05-31 18:01:07 -03:00
anon
1bd7cc60a8
reuse format_generation_settings for logging
2023-05-31 18:00:07 -03:00
anon
43d295fddc
filter empty stopping strings
2023-05-31 18:00:07 -03:00
digiwombat
276fa99873
Misunderstood the instructions, I think. Back to the raw JSON output only.
2023-05-31 16:45:57 -04:00
digiwombat
1b96df2b5f
Spacing fix. Nothing to see here.
2023-05-31 16:42:43 -04:00
digiwombat
86337e3a9b
Server console logs now come in one flavor: Verbose.
2023-05-31 16:41:34 -04:00
digiwombat
dda4c10d64
Switch to the CPPHTTPLIB logger. Verbose adds body dump as well as request info.
2023-05-31 16:23:39 -04:00
digiwombat
7332b41f9f
Simple single-line server log for requests
2023-05-31 15:56:27 -04:00
Randall Fitzgerald
96fa480147
Merge pull request #6 from anon998/fix-multibyte
...
Buffer incomplete multibyte characters + other stuff.
2023-05-31 12:14:43 -04:00
anon
3edaf6bd8b
print timings by default
2023-05-31 12:55:19 -03:00
anon
d58e48663d
default penalize_nl to false + format
2023-05-31 12:44:27 -03:00
anon
40e13805d9
print timings + build info
...
I don't know if llama_free is needed but it was used in main.cpp.
2023-05-31 12:44:24 -03:00
anon
dd30219332
buffer incomplete multi-byte characters
2023-05-31 12:31:27 -03:00
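The idea behind buffering incomplete multi-byte characters: a token boundary can fall in the middle of a UTF-8 sequence, so the server should hold back trailing bytes that do not yet form a complete character and send them with the next token. A sketch of the detection step, with a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>

// Return how many bytes at the end of `s` form an incomplete UTF-8
// sequence (0 if the string ends on a complete character). Those bytes
// can be buffered and emitted once the rest of the character arrives.
std::size_t incomplete_utf8_tail(const std::string &s) {
    // look back at most 3 bytes for the lead byte of the final character
    for (std::size_t i = 1; i <= 3 && i <= s.size(); ++i) {
        const unsigned char c = s[s.size() - i];
        if ((c & 0xC0) == 0x80) continue;      // continuation byte, keep looking
        std::size_t need = 0;
        if      ((c & 0xE0) == 0xC0) need = 2; // lead byte of a 2-byte sequence
        else if ((c & 0xF0) == 0xE0) need = 3; // lead byte of a 3-byte sequence
        else if ((c & 0xF8) == 0xF0) need = 4; // lead byte of a 4-byte sequence
        else return 0;                         // ASCII byte: string is complete
        return need > i ? i : 0;               // incomplete if bytes are missing
    }
    return 0;
}
```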
anon
27911d6d68
fix default model alias
2023-05-31 12:31:25 -03:00
anon
aa2bbb2d35
fix parameter type
2023-05-31 11:14:34 -03:00
anon
f1710b90dc
add infinite generation when n_predict is -1
2023-05-31 11:14:34 -03:00
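The `n_predict is -1` behavior amounts to a loop condition like the following sketch, where `-1` means "no limit" and generation only stops on end-of-stream conditions. Variable names are illustrative, not the server's exact ones:

```cpp
// Sketch: decide whether the generation loop should continue.
// n_predict == -1 enables infinite generation; otherwise stop after
// n_predict tokens. EOS (or a stop string) always ends generation.
bool keep_generating(int n_predict, int n_generated, bool eos_reached) {
    if (eos_reached) {
        return false;           // model signalled end of sequence
    }
    if (n_predict == -1) {
        return true;            // infinite generation requested
    }
    return n_generated < n_predict;
}
```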
anon
284bc293b1
reserve memory for generated_text
2023-05-31 11:14:34 -03:00
anon
2c08f29691
make api server use only a single thread
2023-05-31 09:04:33 -03:00
anon
c1cbde82a1
print error when server can't bind to the interface
2023-05-31 09:04:16 -03:00
Randall Fitzgerald
9f2424ac47
Merge pull request #5 from anon998/stop-stream
...
Stop generating tokens when the stream is closed.
2023-05-30 22:16:32 -04:00
anon
3a079d5cc8
stop generating when the stream is closed
2023-05-30 23:12:00 -03:00
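Stopping when the stream is closed boils down to checking whether writing a chunk to the client still succeeds. A minimal sketch, where the write callback stands in for the cpp-httplib data sink the server actually uses:

```cpp
#include <functional>
#include <string>

// Sketch: a failed write to the client is treated as "stream closed",
// which flags the generation loop to stop producing tokens.
bool send_or_stop(const std::function<bool(const std::string &)> &write_chunk,
                  const std::string &chunk, bool &stop_generation) {
    if (!write_chunk(chunk)) {
        stop_generation = true; // client disconnected: stop generating
        return false;
    }
    return true;
}
```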
anon
7a8104fbd2
add missing quote when printing stopping strings
2023-05-30 23:11:32 -03:00
digiwombat
b6f536dfb3
Cull to end of generated_text when encountering a stopping string in case it's a partial token.
...
Will roll this back if it proves to be a problem.
2023-05-30 21:14:24 -04:00
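The culling described above can be sketched as: once a stopping string is found in the generated text, erase from its first occurrence to the end, since the match may have landed inside a partially emitted token. The helper name is hypothetical:

```cpp
#include <string>

// Sketch: truncate generated_text at the first occurrence of a stop
// string, dropping everything after it (including any partial token).
void cull_at_stop(std::string &generated_text, const std::string &stop) {
    if (stop.empty()) {
        return; // mirrors the "filter empty stopping strings" fix above
    }
    const std::size_t pos = generated_text.find(stop);
    if (pos != std::string::npos) {
        generated_text.erase(pos); // erase to end of string
    }
}
```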
Randall Fitzgerald
9197674a6b
Merge pull request #4 from anon998/logging
...
Add the --verbose flag and request logging.
2023-05-30 20:58:18 -04:00
anon
aa0788b650
add --verbose flag and request logging
2023-05-30 21:45:56 -03:00
anon
7a853dc56d
prevent the server from swallowing exceptions in debug mode
...
So it's easier to catch them inside a debugger.
2023-05-30 21:39:30 -03:00
Randall Fitzgerald
e6de69abfb
Merge pull request #3 from anon998/sse
...
Add streaming via server-sent events.
Has some changes that I didn't make, and I decided I prefer "stream" to "streaming".
2023-05-30 20:36:52 -04:00
Randall Fitzgerald
2533878b79
Merge branch 'master' into sse
2023-05-30 20:34:48 -04:00
digiwombat
a25f830fe1
Default streaming to false if it's not set in the request body.
2023-05-30 20:17:18 -04:00
digiwombat
38eaf2b7f7
Removed testing fprintf calls.
2023-05-30 19:48:43 -04:00
digiwombat
3292f057dc
Changed to single API endpoint for streaming and non.
...
next-token endpoint removed.
"as_loop" setting changed to "streaming"
2023-05-30 19:44:16 -04:00
anon
d6fff56e22
add streaming via server-sent events
...
Removes /next-token endpoint and adds a "stream" parameter to the
/completion one.
2023-05-30 19:33:33 -03:00
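With streaming enabled via the `stream` parameter on `/completion`, each partial result goes out as a server-sent event: a `data: <payload>` line terminated by a blank line. A sketch of the framing only; the JSON payload shown in the test is illustrative, not the server's exact schema:

```cpp
#include <string>

// Sketch: wrap a JSON payload in server-sent-events framing.
// Each SSE record is "data: <payload>\n\n".
std::string sse_chunk(const std::string &json_payload) {
    return "data: " + json_payload + "\n\n";
}
```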
digiwombat
03ea8f013a
Fix for the regen issue.
2023-05-30 15:48:55 -04:00
Henri Vasserman
42cf4d8433
Merge branch 'master' into master
2023-05-29 01:05:19 +03:00
digiwombat
33b6957177
Fixed failing to return result on stopping token.
2023-05-28 16:45:05 -04:00
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols (#1617)
2023-05-28 21:01:02 +02:00
digiwombat
6c58f64a3b
--ctx_size flag to --ctx-size to match common.cpp
2023-05-28 14:17:36 -04:00
digiwombat
b38d41ef52
--memory_f32 flag to --memory-f32 to match common.cpp
2023-05-28 13:58:25 -04:00
digiwombat
655899db89
Add ignore_eos option to generation settings.
2023-05-28 13:49:45 -04:00
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates (#1625)
...
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Update `main` and `server` examples documentation to use the new style dash separator argument format
5. Update the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set a user-friendly model name (#1614)
2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612)
2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
...
* Use strstr to check if fp16 supported
* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
Randall Fitzgerald
2c9ee7a052
Apply suggestions from code review
...
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-28 09:34:11 -07:00
Henri Vasserman
74c6f36bf1
Editorconfig suggested fixes
...
delete whitespace
2023-05-28 19:19:34 +03:00
digiwombat
15ddc4903b
Merge remote-tracking branch 'slyecho/server_refactor'
2023-05-28 11:09:32 -04:00
Henri Vasserman
7186d655a1
seed and gen params
2023-05-28 17:03:01 +03:00
digiwombat
7740301db9
Set unspecified generation settings back to default. (Notes below)
...
- If an incoming request omits a setting such as top_k that an earlier request had set, the server would have kept the old value, I'm pretty sure. This fixes that.
- I don't know whether this could be done more cleanly by just setting llama.params = gpt_params();, because I'm not sure how default construction would behave given that no default constructor is defined.
2023-05-28 09:18:47 -04:00