llama.cpp

Author	SHA1	Message	Date
digiwombat	e8efd75492	Initial timeout code and expanded JSON return on completion. Now passing server params to the help printer so the defaults are output. Bad UTF while streaming now returns a replacement character (\uFFFD). Changed some error language very slightly. The JSON now returns extra values, only on `stop` for streaming requests. New JSON return values: - tokens_predicted (added to streaming) - seed (just pulls it from params, might return -1) - prompt (might be useful) - generated_text (full generated response for streaming requests)	2023-05-28 07:44:31 -04:00
digiwombat	177868e68a	Changed to params/args. Seed is now set by the CLI, defaults to -1 if no seed is set. Threads and batch size are now properly launch parameters.	2023-05-28 06:29:11 -04:00
Henri Vasserman	549291fe61	keep processed from the beginning. This means no limit to the input prompt; it will just get reset again as normal.	2023-05-28 12:11:41 +03:00
Randall Fitzgerald	df0e0d094c	Forgot to remove some testing code.	2023-05-28 12:11:14 +03:00
Randall Fitzgerald	f93fe36c5b	Add all generation parameters to server.cpp and allow resetting context. server.cpp left out a few generation parameters and also seems built to assume un-editable chatting with no regens or swipes. I added a simple "reload_ctx" flag that can be passed on generation that will cause the prompt to be reloaded.	2023-05-28 12:11:10 +03:00
Henri Vasserman	51e09944ce	server rewrite Remove unnecessary things and radically rewrite server	2023-05-28 12:10:16 +03:00
Randall Fitzgerald	1f40a789e6	Didn't see the already defined top_k var. lol. Embarrassing. Don't edit code in the github web viewer, kids.	2023-05-27 17:10:09 -07:00
Randall Fitzgerald	e84b802161	Change top_k type. Is my lack of knowledge of the code base showing? Yes it is.	2023-05-27 17:07:45 -07:00
Randall Fitzgerald	fdce8951ac	Merge branch 'ggerganov:master' into master	2023-05-27 19:57:37 -04:00
Randall Fitzgerald	d20f36b93c	Removed unnecessary last_prompt_token set. Added the one that was supposed to be there. Apologies for the extra commits, I'm copy-pasting from my editor to preserve the two-space indent formatting.	2023-05-27 16:46:05 -07:00
Randall Fitzgerald	36c86d794d	Automate context resetting and minor fixes. Fixed top_k still not being set. Removed an unnecessary loop.	2023-05-27 16:43:08 -07:00
apcameron	a6704643b6	ggml : add support for the RISCV architecture (#1616 )	2023-05-27 23:03:25 +03:00
Randall Fitzgerald	66ed19d01f	Corrected dashes in the help lines. Co-authored-by: Henri Vasserman <henv@hot.ee>	2023-05-27 11:51:21 -07:00
Randall Fitzgerald	48cb16a51a	Merge branch 'ggerganov:master' into master	2023-05-27 13:08:03 -04:00
Kerfuffle	0df7d63e5b	Include server in releases + other build system cleanups (#1610). Set `LLAMA_BUILD_SERVER` in workflow so the `server` example gets built. This currently only applies to Windows builds because it seems like only Windows binary artifacts are included in releases. Add `server` example target to `Makefile` (still uses `LLAMA_BUILD_SERVER` define and does not build by default). Fix issue where `vdot` binary wasn't removed when running `make clean`. Fix compile warnings in `server` example. Add `.hpp` files to trigger workflow (the server example has one).	2023-05-27 11:04:14 -06:00
Henri Vasserman	97c9b77c4f	Add documentation about CLBlast (#1604 ) Installing, compiling and using.	2023-05-27 18:47:55 +03:00
Henri Vasserman	0ecb1bbbeb	[CI] Fix openblas (#1613 ) * Fix OpenBLAS build * Fix `LLAMA_BLAS_VENDOR` CMake variable that should be a string and not a boolean.	2023-05-27 17:24:06 +03:00
Georgi Gerganov	93618031c7	ggml : add ggml_tensor_overhead()	2023-05-27 16:19:56 +03:00
Henri Vasserman	83c54e6da5	[CI] CLBlast: Fix directory name (#1606 )	2023-05-27 14:18:25 +02:00
Georgi Gerganov	bdbda1b17a	ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name())	2023-05-27 12:23:16 +03:00
Kerfuffle	66874d4fbc	Some improvements to loading the session with --prompt-cache (#1550 ) Improvements to loading the session with `--prompt-cache` in the `main` example. 1. Fix an issue where the `--seed` parameter was ignored when loading a cached prompt. 2. When loading a cached prompt, you previously had to specify the saved prompt (or a prefix of it) again. This pull changes that behavior to default to the prompt that was cached if a prompt wasn't specified by the user.	2023-05-25 20:18:01 -06:00
Johannes Gäßler	1fcdcc28b1	cuda : performance optimizations (#1530 ) * xor hack * block y dim * loop unrolling * Fixed cmake LLAMA_CUDA_BY option * Removed hipblas compatibility code * Define GGML_CUDA_DMMV_BLOCK_Y if not defined * Fewer iters, more ops per iter * Renamed DMMV X/Y compilation options	2023-05-26 00:07:29 +03:00
Randall Fitzgerald	c2b55cc917	Added LoRA loading. Someone please test this. I have no LoRAs available to test. The code is direct from the base repo so it should be fine.	2023-05-25 12:53:05 -07:00
Henri Vasserman	ac7876ac20	Update CLBlast to 1.6.0 (#1580 ) * Update CLBlast to 1.6.0	2023-05-24 10:30:09 +03:00
Evan Jones	c31bbe934b	readme : add docs for chat-persistent.sh (#1568 ) * readme : add docs for chat-persistent.sh * Update README.md	2023-05-24 09:24:01 +03:00
Senemu	1359b6aba5	chat-persistent.sh : use bracket expressions in grep (#1564 )	2023-05-24 09:16:22 +03:00
Randall Fitzgerald	8d7b28c28d	Fixed some types in the params. Quickly copy pasted without fixing them up. Whoopsies.	2023-05-23 13:35:12 -07:00
Randall Fitzgerald	3537ad1821	Merge branch 'ggerganov:master' into master	2023-05-23 13:31:14 -04:00
Maarten ter Huurne	7d873811f3	Fix handling of "invalid property" when creating OpenCL command queue (#1565 ) The `clCreateCommandQueue()` function will return the code `CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties, not `CL_INVALID_PROPERTY` as the original code was checking for.	2023-05-23 19:01:15 +03:00
Randall Fitzgerald	add5f1bdc9	Update examples/server/server.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-05-23 07:34:41 -07:00
Randall Fitzgerald	421e66b330	Update examples/server/server.cpp Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-05-23 07:34:36 -07:00
Randall Fitzgerald	2071d730fa	Forgot to remove some testing code.	2023-05-23 06:22:30 -07:00
Randall Fitzgerald	1c3fdf8cfd	Add all generation parameters to server.cpp and allow resetting context. server.cpp left out a few generation parameters and also seems built to assume un-editable chatting with no regens or swipes. I added a simple "reload_ctx" flag that can be passed on generation that will cause the prompt to be reloaded.	2023-05-23 06:16:54 -07:00
0cc4m	2e6cd4b025	OpenCL Token Generation Acceleration (#1459 ) * Move back to C++ for OpenCL * Refactor OpenCL code to work more like the CUDA code, add missing functions * Deduplicate dequant kernels * Add OpenCL compile options * Use compile args for preprocessing constants * Restore default platform + device selection by id behavior --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de> Co-authored-by: Henri Vasserman <henv@hot.ee>	2023-05-23 00:33:24 +03:00
Steward Garcia	7e4ea5beff	examples : add server example with REST API (#1443 ) * Added httplib support * Added readme for server example * fixed some bugs * Fix the build error on Macbook * changed json11 to nlohmann-json * removed some whitespaces * remove trailing whitespace * added support custom prompts and more functions * some corrections and added as cmake option	2023-05-21 20:51:18 +03:00
Stefan Sydow	7780e4f479	make : .PHONY clean (#1553 )	2023-05-21 17:03:44 +03:00
Georgi Gerganov	265db9834e	ggml : output 3d sizes in ggml_graph_dump_dot()	2023-05-21 11:56:23 +03:00
Georgi Gerganov	fab49c685e	ggml : update WASM SIMD	2023-05-20 20:00:41 +03:00
Zenix	b8ee340abe	feature : support blis and other blas implementation (#1536 ) * feature: add blis support * feature: allow all BLA_VENDOR to be assigned in cmake arguments. align with whisper.cpp pr 927 * fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake * Fix typo in INTEGER Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix: blas changes on ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-05-20 17:58:31 +03:00
Henri Vasserman	9ecb30f959	OpenCL: Fixes for older devices. (#1435 ) * Remove `constant` * Rewrite platform and device selection * Fix Q8_0	2023-05-20 17:57:39 +03:00
Juuso Alasuutari	29cf5596fe	llama : define magic numbers as integer constants (#1518 ) (#1520 ) The underlying representation of multibyte character literals is implementation-defined. This could, at least in principle, cause cross-build data export/import issues independent of endianness. Define magic numbers as integer literals to be on the safe side. Signed-off-by: Juuso Alasuutari <juuso.alasuutari@gmail.com>	2023-05-20 15:58:15 +03:00
Georgi Gerganov	3de84b2606	ggml : add ggml_clamp() (#1539 ) * ggml : add ggml_clamp() * ggml : indentation	2023-05-20 15:34:45 +03:00
Johannes Gäßler	affc76edfd	cuda : loading models directly into VRAM, norm calculation on GPU, broadcasting for ggml_mul (#1483 ) * Broadcasting for ggml_mul * CUDA kernel for ggml_mul, norms in VRAM * GPU weights not in RAM, direct loading with cuFile * fixup! GPU weights not in RAM, direct loading with cuFile * fixup! GPU weights not in RAM, direct loading with cuFile * define default model path once, sync path with readme (#1366) * ~7% faster Q5_1 AVX2 code (#1477) * convert.py: Support models which are stored in a single pytorch_model.bin (#1469) * Support models in a single pytorch_model.bin * Remove spurious line with typo * benchmark-matmul: Print the average of the test results (#1490) * Remove unused n_parts parameter (#1509) * Fixes #1511 lambda issue for w64devkit (mingw) (#1513) * Fix for w64devkit and mingw * make kv_f16 the default for api users (#1517) * minor : fix compile warnings * readme : adds WizardLM to the list of supported models (#1485) * main : make reverse prompt option act as a stop token in non-interactive mode (#1032) * Make reverse prompt option act as a stop token in non-interactive scenarios * Making requested review changes * Update gpt_params_parse and fix a merge error * Revert "Update gpt_params_parse and fix a merge error" This reverts commit `2bb2ff1748`. 
* Update gpt_params_parse and fix a merge error take 2 * examples : add persistent chat (#1495) * examples : add persistent chat * examples : fix whitespace --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * tests : add missing header * ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508) * ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0 * llama : bump LLAMA_FILE_VERSION to 3 * cuda : update Q4 and Q8 dequantize kernels * ggml : fix AVX dot products * readme : update performance table + hot topics * ggml : fix scalar implementation of Q4_1 dot * llama : fix compile warnings in llama_set_state_data() * llama : fix name shadowing and C4146 (#1526) * Fix name shadowing and C4146 * Fix if macros not using defined when required * Update llama-util.h Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update llama-util.h Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Code style Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix for mingw (#1462) * llama : add llama_init_backend() API (close #1527) * feature : add blis and other BLAS implementation support (#1502) * feature: add blis support * feature: allow all BLA_VENDOR to be assigned in cmake arguments. align with whisper.cpp pr 927 * fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake * Fix typo in INTEGER Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Revert "feature : add blis and other BLAS implementation support (#1502)" This reverts commit `07e9ace0f9`. 
* GPU weights not in RAM, direct loading with cuFile * llama : code style fixes + progress print fix * ggml : ggml_mul better broadcast support * cmake : workarounds for cufile when CMake version < 3.25 * gg rebase fixup * Loop in llama.cpp, fixed progress callback * Attempt clang-tidy fix * llama : fix vram size computation * Add forgotten fclose() --------- Co-authored-by: András Salamon <ott2@users.noreply.github.com> Co-authored-by: Ilya Kurdyukov <59548320+ilyakurdyukov@users.noreply.github.com> Co-authored-by: Tom Jobbins <784313+TheBloke@users.noreply.github.com> Co-authored-by: rankaiyx <rankaiyx@rankaiyx.com> Co-authored-by: Stephan Walter <stephan@walter.name> Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com> Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: David Kennedy <dakennedyd@gmail.com> Co-authored-by: Jason McCartney <jmac@theroot.org> Co-authored-by: Evan Jones <evan.q.jones@gmail.com> Co-authored-by: Maxime <672982+maximegmd@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Zenix <zenixls2@gmail.com>	2023-05-20 15:19:28 +03:00
Georgi Gerganov	ea600071cb	Revert "feature : add blis and other BLAS implementation support (#1502 )" This reverts commit `07e9ace0f9`.	2023-05-20 12:03:48 +03:00
Zenix	07e9ace0f9	feature : add blis and other BLAS implementation support (#1502 ) * feature: add blis support * feature: allow all BLA_VENDOR to be assigned in cmake arguments. align with whisper.cpp pr 927 * fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake * Fix typo in INTEGER Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-05-20 12:02:48 +03:00
Georgi Gerganov	ec2e10c444	llama : add llama_init_backend() API (close #1527 )	2023-05-20 11:06:37 +03:00
DannyDaemonic	d2c59b8ba4	Fix for mingw (#1462 )	2023-05-20 00:40:02 -07:00
Maxime	503db28849	llama : fix name shadowing and C4146 (#1526 ) * Fix name shadowing and C4146 * Fix if macros not using defined when required * Update llama-util.h Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update llama-util.h Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Code style Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-05-20 10:22:37 +03:00
Georgi Gerganov	8a203f9fa1	llama : fix compile warnings in llama_set_state_data()	2023-05-20 10:14:43 +03:00
Georgi Gerganov	4fd3e29297	ggml : fix scalar implementation of Q4_1 dot	2023-05-20 10:13:19 +03:00
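
Commit 29cf5596fe above replaces multibyte character literals with integer constants because the value of a literal such as 'ggjt' is implementation-defined. The sketch below illustrates the idea only; it is not the code from that commit. The names FILE_MAGIC_GGJT, make_magic and is_supported_magic are hypothetical, and the value is simply the four ASCII bytes 'g', 'g', 'j', 't' chosen as an example.

```cpp
#include <cstdint>

// Hypothetical constant for illustration: the ASCII bytes 'g','g','j','t'
// written as a plain integer, so every compiler agrees on the value.
constexpr std::uint32_t FILE_MAGIC_GGJT = 0x67676a74u;

// Builds the same value from four single-byte characters. Unlike a
// multibyte literal ('ggjt'), the byte order here is fixed by the shifts.
constexpr std::uint32_t make_magic(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8)  |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Compile-time check that the hex constant and the spelled-out bytes agree.
static_assert(make_magic('g', 'g', 'j', 't') == FILE_MAGIC_GGJT,
              "magic constant does not match its ASCII spelling");

// Hypothetical helper: accepts a header value read from a file.
constexpr bool is_supported_magic(std::uint32_t magic) {
    return magic == FILE_MAGIC_GGJT;
}
```

A loader would read the first four bytes of a file into a std::uint32_t and pass the result to is_supported_magic; handling of file endianness is left out of this sketch.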


617 commits
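
Commit e8efd75492 mentions that bad UTF while streaming now returns a replacement character (\uFFFD). One way such a check could look is sketched below, assuming each streamed chunk is validated as a whole. The function names is_valid_utf8 and sanitize_chunk are hypothetical, and the validation is deliberately simplified: it checks lead and continuation byte structure only and does not reject overlong encodings or surrogates. The actual server code may work differently.

```cpp
#include <cstddef>
#include <string>

// Simplified structural UTF-8 check (assumption: overlong forms and
// surrogate code points are not rejected in this sketch).
bool is_valid_utf8(const std::string & s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t n = 0;                      // expected sequence length
        if (c < 0x80)                n = 1;     // ASCII
        else if ((c & 0xE0) == 0xC0) n = 2;     // 110xxxxx
        else if ((c & 0xF0) == 0xE0) n = 3;     // 1110xxxx
        else if ((c & 0xF8) == 0xF0) n = 4;     // 11110xxx
        else return false;                      // stray continuation or invalid lead
        if (i + n > s.size()) return false;     // sequence cut off mid-character
        for (std::size_t k = 1; k < n; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a 10xxxxxx byte
        }
        i += n;
    }
    return true;
}

// Hypothetical helper: pass a chunk through unchanged if it is well formed,
// otherwise substitute U+FFFD encoded as UTF-8 (EF BF BD).
std::string sanitize_chunk(const std::string & chunk) {
    return is_valid_utf8(chunk) ? chunk : std::string("\xEF\xBF\xBD");
}
```

A chunk that ends in the middle of a multi-byte character, which can happen when tokens split a character, would be replaced in this sketch; buffering the incomplete bytes until the next chunk arrives is an alternative design.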