llama.cpp

Author	SHA1	Message	Date
Vladimir Zorin	337aea1139	examples : add --alias option to gpt_params to set use friendly model name (#1614)	2023-05-28 20:14:24 +03:00
Howard Su	bb051d9723	opencl : no need to allocate cl_mem on heap (#1612)	2023-05-28 20:13:36 +03:00
Howard Su	ca74884f66	opencl : use strstr to check if fp16 supported (#1611) * Use strstr to check if fp16 supported * Ensure ext_buffer is null terminated	2023-05-28 20:09:56 +03:00
Randall Fitzgerald	2c9ee7a052	Apply suggestions from code review Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Henri Vasserman <henv@hot.ee>	2023-05-28 09:34:11 -07:00
Henri Vasserman	74c6f36bf1	Editorconfig suggested fixes delete whitespace	2023-05-28 19:19:34 +03:00
digiwombat	15ddc4903b	Merge remote-tracking branch 'slyecho/server_refactor'	2023-05-28 11:09:32 -04:00
Henri Vasserman	7186d655a1	seed and gen params	2023-05-28 17:03:01 +03:00
digiwombat	7740301db9	Set unspecified generation settings back to default. (Notes below) - If a given set of values coming along doesn't contain top_k for example, but did before, it would have stayed on the old value, I'm pretty sure. This fixes that. - I don't know if this could be done a bit prettier by just setting llama.params = gpt_params(); since I'm not sure how the default constructor would react since there's not one defined.	2023-05-28 09:18:47 -04:00
digiwombat	dda915cac4	Added capturing the stopping word and sending it along with the final JSON. Fixed an fprintf warning Fixed a bug that broke streaming Properly removed thread changing in json (only grabbed batch_size before)	2023-05-28 08:43:38 -04:00
digiwombat	2e5c5ee224	Changed JSON names to match the parameter name rather than the variable name.	2023-05-28 08:12:48 -04:00
digiwombat	23928f2887	Added generation_settings to final json object.	2023-05-28 08:04:05 -04:00
digiwombat	e8efd75492	Initial timeout code and expanded json return on completion. Now passing server params to the help printer so they defaults are ouput. Bad UTF while streaming now returns a replacement character (\uFFFD) Changed some error language very slightly. The JSON now returns extra values, only on `stop` for streaming requests. New JSON Return Values: - tokens_predicted (added to streaming) - seed (just pulls it from params, might return -1) - prompt (Might be useful) - generated_text (Full generated response for streaming requests)	2023-05-28 07:44:31 -04:00
digiwombat	177868e68a	Changed to params/args Seed is now set by the CLI, defaults to -1 if not seed is set Threads and batch size are now properly launch parameters.	2023-05-28 06:29:11 -04:00
Henri Vasserman	549291fe61	keep processed from the beginning this means no limit to the input prompt, it will just get reset again as normal	2023-05-28 12:11:41 +03:00
Randall Fitzgerald	df0e0d094c	Forgot to remove some testing code.	2023-05-28 12:11:14 +03:00
Randall Fitzgerald	f93fe36c5b	Add all generation parameters to server.cpp and allow resetting context sever.cpp left out a few generation parameters and also seems built to assume un-editable chatting with no regens or swipes. I added a simple "reload_ctx" flag that can be passed on generation that will cause the prompt to be reloaded.	2023-05-28 12:11:10 +03:00
Henri Vasserman	51e09944ce	server rewrite Remove unnecessary things and radically rewrite server	2023-05-28 12:10:16 +03:00
Randall Fitzgerald	1f40a789e6	Didn't see the already defined top_k var. lol. Embarrassing. Don't edit code in the github web viewer, kids.	2023-05-27 17:10:09 -07:00
Randall Fitzgerald	e84b802161	Change top_k type. Is my lack of knowledge of the code base showing? Yes it is.	2023-05-27 17:07:45 -07:00
Randall Fitzgerald	fdce8951ac	Merge branch 'ggerganov:master' into master	2023-05-27 19:57:37 -04:00
Randall Fitzgerald	d20f36b93c	Removed unnecessary last_prompt_token set Added the one that was supposed to be there. Apologies for the extra commits, I'm copy pasting from my editor to preserve the two-space indent formatting.	2023-05-27 16:46:05 -07:00
Randall Fitzgerald	36c86d794d	Automate Context resetting and minor fixes Fixed top_k still not being set. Removed an unnecessary loop.	2023-05-27 16:43:08 -07:00
apcameron	a6704643b6	ggml : add support for the RISCV architecture (#1616)	2023-05-27 23:03:25 +03:00
Randall Fitzgerald	66ed19d01f	Corrected dashes in the help lines. Co-authored-by: Henri Vasserman <henv@hot.ee>	2023-05-27 11:51:21 -07:00
Randall Fitzgerald	48cb16a51a	Merge branch 'ggerganov:master' into master	2023-05-27 13:08:03 -04:00
Kerfuffle	0df7d63e5b	Include server in releases + other build system cleanups (#1610) Set `LLAMA_BUILD_SERVER` in workflow so the `server` example gets build. This currently only applies to Windows builds because it seems like only Windows binary artifacts are included in releases. Add `server` example target to `Makefile` (still uses `LLAMA_BUILD_SERVER` define and does not build by default) Fix issue where `vdot` binary wasn't removed when running `make clean`. Fix compile warnings in `server` example. Add `.hpp` files to trigger workflow (the server example has one).	2023-05-27 11:04:14 -06:00
Henri Vasserman	97c9b77c4f	Add documentation about CLBlast (#1604) Installing, compiling and using.	2023-05-27 18:47:55 +03:00
Henri Vasserman	0ecb1bbbeb	[CI] Fix openblas (#1613) * Fix OpenBLAS build * Fix `LLAMA_BLAS_VENDOR` CMake variable that should be a string and not a boolean.	2023-05-27 17:24:06 +03:00
Georgi Gerganov	93618031c7	ggml : add ggml_tensor_overhead()	2023-05-27 16:19:56 +03:00
Henri Vasserman	83c54e6da5	[CI] CLBlast: Fix directory name (#1606)	2023-05-27 14:18:25 +02:00
Georgi Gerganov	bdbda1b17a	ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name())	2023-05-27 12:23:16 +03:00
Kerfuffle	66874d4fbc	Some improvements to loading the session with --prompt-cache (#1550) Improvements to loading the session with `--prompt-cache` in the `main` example. 1. Fix an issue where the `--seed` parameter was ignored when loading a cached prompt. 2. When loading a cached prompt, you previously had to specify the saved prompt (or a prefix of it) again. This pull changes that behavior to default to the prompt that was cached if a prompt wasn't specified by the user.	2023-05-25 20:18:01 -06:00
Johannes Gäßler	1fcdcc28b1	cuda : performance optimizations (#1530) * xor hack * block y dim * loop unrolling * Fixed cmake LLAMA_CUDA_BY option * Removed hipblas compatibility code * Define GGML_CUDA_DMMV_BLOCK_Y if not defined * Fewer iters, more ops per iter * Renamed DMMV X/Y compilation options	2023-05-26 00:07:29 +03:00
Randall Fitzgerald	c2b55cc917	Added LoRA Loading Someone please test this. I have no LoRAs available to test. The code is direct from the base repo so it should be fine.	2023-05-25 12:53:05 -07:00
Henri Vasserman	ac7876ac20	Update CLBlast to 1.6.0 (#1580) * Update CLBlast to 1.6.0	2023-05-24 10:30:09 +03:00
Evan Jones	c31bbe934b	readme : add docs for chat-persistent.sh (#1568) * readme : add docs for chat-persistent.sh * Update README.md	2023-05-24 09:24:01 +03:00
Senemu	1359b6aba5	chat-persistent.sh : use bracket expressions in grep (#1564)	2023-05-24 09:16:22 +03:00
Randall Fitzgerald	8d7b28c28d	Fixed some types in the params. Quickly copy pasted without fixing them up. Whoopsies.	2023-05-23 13:35:12 -07:00
Randall Fitzgerald	3537ad1821	Merge branch 'ggerganov:master' into master	2023-05-23 13:31:14 -04:00
Maarten ter Huurne	7d873811f3	Fix handling of "invalid property" when creating OpenCL command queue (#1565) The `clCreateCommandQueue()` function will return the code `CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties, not `CL_INVALID_PROPERTY` as the original code was checking for.	2023-05-23 19:01:15 +03:00
Randall Fitzgerald	add5f1bdc9	Update examples/server/server.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-05-23 07:34:41 -07:00
Randall Fitzgerald	421e66b330	Update examples/server/server.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-05-23 07:34:36 -07:00
Randall Fitzgerald	2071d730fa	Forgot to remove some testing code.	2023-05-23 06:22:30 -07:00
Randall Fitzgerald	1c3fdf8cfd	Add all generation parameters to server.cpp and allow resetting context sever.cpp left out a few generation parameters and also seems built to assume un-editable chatting with no regens or swipes. I added a simple "reload_ctx" flag that can be passed on generation that will cause the prompt to be reloaded.	2023-05-23 06:16:54 -07:00
0cc4m	2e6cd4b025	OpenCL Token Generation Acceleration (#1459) * Move back to C++ for OpenCL * Refactor OpenCL code to work more like the CUDA code, add missing functions * Deduplicate dequant kernels * Add OpenCL compile options * Use compile args for preprocessing constants * Restore default platform + device selection by id behavior --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de> Co-authored-by: Henri Vasserman <henv@hot.ee>	2023-05-23 00:33:24 +03:00
Steward Garcia	7e4ea5beff	examples : add server example with REST API (#1443) * Added httplib support * Added readme for server example * fixed some bugs * Fix the build error on Macbook * changed json11 to nlohmann-json * removed some whitespaces * remove trailing whitespace * added support custom prompts and more functions * some corrections and added as cmake option	2023-05-21 20:51:18 +03:00
Stefan Sydow	7780e4f479	make : .PHONY clean (#1553)	2023-05-21 17:03:44 +03:00
Georgi Gerganov	265db9834e	ggml : output 3d sizes in ggml_graph_dump_dot()	2023-05-21 11:56:23 +03:00
Georgi Gerganov	fab49c685e	ggml : update WASM SIMD	2023-05-20 20:00:41 +03:00
Zenix	b8ee340abe	feature : support blis and other blas implementation (#1536) * feature: add blis support * feature: allow all BLA_VENDOR to be assigned in cmake arguments. align with whisper.cpp pr 927 * fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake * Fix typo in INTEGER Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix: blas changes on ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-05-20 17:58:31 +03:00


678 commits