Commit graph

685 commits

Author SHA1 Message Date
Henri Vasserman
42cf4d8433
Merge branch 'master' into master 2023-05-29 01:05:19 +03:00
digiwombat
33b6957177 Fixed failing to return a result on a stopping token. 2023-05-28 16:45:05 -04:00
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols (#1617) 2023-05-28 21:01:02 +02:00
digiwombat
6c58f64a3b Renamed --ctx_size flag to --ctx-size to match common.cpp 2023-05-28 14:17:36 -04:00
digiwombat
b38d41ef52 Renamed --memory_f32 flag to --memory-f32 to match common.cpp 2023-05-28 13:58:25 -04:00
digiwombat
655899db89 Add ignore_eos option to generation settings. 2023-05-28 13:49:45 -04:00
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates (#1625)
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Update `main` and `server` examples documentation to use the new style dash separator argument format
5. Update the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
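A minimal sketch of the gating described in points 1–2 above, assuming the define is keyed off the cuBLAS/CLBlast build macros and that the help text roughly matches the other options (the exact wording is not from this commit):

```cpp
// llama.h: advertise GPU offload support only when a GPU backend is compiled in.
#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST)
#define LLAMA_SUPPORTS_GPU_OFFLOAD
#endif

// common example code: only print the option when offload is possible.
#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD
fprintf(stderr, "  -ngl N, --n-gpu-layers N\n");
fprintf(stderr, "                        number of layers to store in VRAM\n");
#endif
```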
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set a user-friendly model name (#1614) 2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612) 2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
* Use strstr to check if fp16 supported

* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
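A sketch of the check these bullets describe, assuming the standard `CL_DEVICE_EXTENSIONS` query and the `cl_khr_fp16` extension name (the buffer handling is illustrative):

```cpp
#include <CL/cl.h>
#include <cstring>
#include <vector>

static bool device_supports_fp16(cl_device_id device) {
    // Query the size of the extension string, then the string itself.
    size_t ext_str_size;
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, NULL, &ext_str_size);

    std::vector<char> ext_buffer(ext_str_size + 1);
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ext_str_size, ext_buffer.data(), NULL);
    ext_buffer[ext_str_size] = '\0'; // ensure ext_buffer is null terminated

    // Use strstr rather than comparing at fixed offsets, per the commit title.
    return strstr(ext_buffer.data(), "cl_khr_fp16") != NULL;
}
```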
Randall Fitzgerald
2c9ee7a052
Apply suggestions from code review
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-28 09:34:11 -07:00
Henri Vasserman
74c6f36bf1
Editorconfig suggested fixes
delete whitespace
2023-05-28 19:19:34 +03:00
digiwombat
15ddc4903b Merge remote-tracking branch 'slyecho/server_refactor' 2023-05-28 11:09:32 -04:00
Henri Vasserman
7186d655a1
seed and gen params 2023-05-28 17:03:01 +03:00
digiwombat
7740301db9 Set unspecified generation settings back to default. (Notes below)
- If an incoming set of values doesn't contain top_k, for example, but an earlier one did, it would have stayed on the old value, I'm pretty sure. This fixes that.
- I don't know if this could be done a bit more prettily by just setting llama.params = gpt_params();, since I'm not sure how the default constructor would react given that one isn't explicitly defined.
2023-05-28 09:18:47 -04:00
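A minimal sketch of what this fix could look like in `server.cpp`, assuming the request body is an nlohmann `json` object with `value()` fallbacks (the field and member names are illustrative, not verbatim):

```cpp
// Construct fresh defaults, then take each setting from the request body
// if present, otherwise fall back to the default instead of a stale value.
gpt_params default_params;

llama.params.top_k  = body.value("top_k",       default_params.top_k);
llama.params.top_p  = body.value("top_p",       default_params.top_p);
llama.params.temp   = body.value("temperature", default_params.temp);
llama.params.n_keep = body.value("n_keep",      default_params.n_keep);
```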
digiwombat
dda915cac4 Added capturing the stopping word and sending it along with the final JSON.
Fixed an fprintf warning
Fixed a bug that broke streaming
Properly removed thread changing via JSON (previously it only grabbed batch_size)
2023-05-28 08:43:38 -04:00
digiwombat
2e5c5ee224 Changed JSON names to match the parameter name rather than the variable name. 2023-05-28 08:12:48 -04:00
digiwombat
23928f2887 Added generation_settings to final json object. 2023-05-28 08:04:05 -04:00
digiwombat
e8efd75492 Initial timeout code and expanded json return on completion.
Now passing server params to the help printer so the defaults are output.
Bad UTF-8 while streaming now returns a replacement character (\uFFFD).
Changed some error language very slightly.
The JSON now returns extra values, only on `stop` for streaming requests.
New JSON Return Values:
  - tokens_predicted (added to streaming)
  - seed (just pulls it from params, might return -1)
  - prompt (Might be useful)
  - generated_text (Full generated response for streaming requests)
2023-05-28 07:44:31 -04:00
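A hedged sketch of the expanded final object for streaming requests, using the nlohmann `json` type the server example builds its responses with (the member names on the right are assumptions):

```cpp
// Sent only on `stop` for streaming requests.
json data = {
    {"stop",             true},
    {"tokens_predicted", llama.num_tokens_predicted}, // now also reported when streaming
    {"seed",             llama.params.seed},          // just pulled from params; may be -1
    {"prompt",           llama.params.prompt},        // might be useful to the client
    {"generated_text",   llama.generated_text}        // full response for streaming requests
};
```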
digiwombat
177868e68a Changed to params/args
Seed is now set by the CLI and defaults to -1 if no seed is set.
Threads and batch size are now proper launch parameters.
2023-05-28 06:29:11 -04:00
Henri Vasserman
549291fe61
keep processed tokens from the beginning
this means no limit to the input prompt;
it will just get reset again as normal
2023-05-28 12:11:41 +03:00
Randall Fitzgerald
df0e0d094c
Forgot to remove some testing code. 2023-05-28 12:11:14 +03:00
Randall Fitzgerald
f93fe36c5b
Add all generation parameters to server.cpp and allow resetting context
server.cpp left out a few generation parameters and also seems built around the assumption of un-editable chatting with no regens or swipes. I added a simple "reload_ctx" flag that can be passed with a generation request to cause the prompt to be reloaded.
2023-05-28 12:11:10 +03:00
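A sketch of how a `reload_ctx` flag like the one described might be consumed, assuming nlohmann `json` parsing and that resetting amounts to clearing the evaluated-token state (both are assumptions, not the commit's exact code):

```cpp
// When the client asks for a context reload, forget the processed prompt so
// the incoming one is evaluated from scratch (allows regens / edited chats).
bool reload_ctx = body.value("reload_ctx", false);
if (reload_ctx) {
    llama.n_past = 0;   // hypothetical member: number of tokens already evaluated
    llama.embd.clear(); // hypothetical member: pending token buffer
}
```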
Henri Vasserman
51e09944ce
server rewrite
Remove unnecessary things and radically rewrite server
2023-05-28 12:10:16 +03:00
Randall Fitzgerald
1f40a789e6
Didn't see the already defined top_k var.
lol. Embarrassing. Don't edit code in the github web viewer, kids.
2023-05-27 17:10:09 -07:00
Randall Fitzgerald
e84b802161
Change top_k type.
Is my lack of knowledge of the code base showing? Yes it is.
2023-05-27 17:07:45 -07:00
Randall Fitzgerald
fdce8951ac
Merge branch 'ggerganov:master' into master 2023-05-27 19:57:37 -04:00
Randall Fitzgerald
d20f36b93c
Removed unnecessary last_prompt_token set
Added the one that was supposed to be there.
Apologies for the extra commits; I'm copy-pasting from my editor to preserve the two-space indent formatting.
2023-05-27 16:46:05 -07:00
Randall Fitzgerald
36c86d794d
Automate Context resetting and minor fixes
Fixed top_k still not being set.
Removed an unnecessary loop.
2023-05-27 16:43:08 -07:00
apcameron
a6704643b6
ggml : add support for the RISCV architecture (#1616) 2023-05-27 23:03:25 +03:00
Randall Fitzgerald
66ed19d01f
Corrected dashes in the help lines.
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-27 11:51:21 -07:00
Randall Fitzgerald
48cb16a51a
Merge branch 'ggerganov:master' into master 2023-05-27 13:08:03 -04:00
Kerfuffle
0df7d63e5b
Include server in releases + other build system cleanups (#1610)
Set `LLAMA_BUILD_SERVER` in workflow so the `server` example gets built. This currently only applies to Windows builds because it seems like only Windows binary artifacts are included in releases.

Add `server` example target to `Makefile` (still uses `LLAMA_BUILD_SERVER` define and does not build by default)

Fix issue where `vdot` binary wasn't removed when running `make clean`.

Fix compile warnings in `server` example.

Add `.hpp` files to trigger workflow (the server example has one).
2023-05-27 11:04:14 -06:00
Henri Vasserman
97c9b77c4f
Add documentation about CLBlast (#1604)
Installing, compiling and using.
2023-05-27 18:47:55 +03:00
Henri Vasserman
0ecb1bbbeb
[CI] Fix openblas (#1613)
* Fix OpenBLAS build

* Fix `LLAMA_BLAS_VENDOR` CMake variable that should be a string and not a boolean.
2023-05-27 17:24:06 +03:00
Georgi Gerganov
93618031c7
ggml : add ggml_tensor_overhead() 2023-05-27 16:19:56 +03:00
Henri Vasserman
83c54e6da5
[CI] CLBlast: Fix directory name (#1606) 2023-05-27 14:18:25 +02:00
Georgi Gerganov
bdbda1b17a
ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name()) 2023-05-27 12:23:16 +03:00
Kerfuffle
66874d4fbc
Some improvements to loading the session with --prompt-cache (#1550)
Improvements to loading the session with `--prompt-cache` in the `main` example.

1. Fix an issue where the `--seed` parameter was ignored when loading a cached prompt.
2. When loading a cached prompt, you previously had to specify the saved prompt (or a prefix of it) again. This pull changes that behavior to default to the prompt that was cached if a prompt wasn't specified by the user.
2023-05-25 20:18:01 -06:00
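A sketch of the defaulting behavior in point 2, assuming the cached tokens are available as `session_tokens` and using `llama_token_to_str()` to rebuild the text (variable names follow the `main` example loosely):

```cpp
// If no prompt was given but a cached session exists, default to the
// prompt that was cached rather than requiring the user to repeat it.
if (params.prompt.empty() && !session_tokens.empty()) {
    for (llama_token token : session_tokens) {
        params.prompt += llama_token_to_str(ctx, token);
    }
}
```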
Johannes Gäßler
1fcdcc28b1
cuda : performance optimizations (#1530)
* xor hack

* block y dim

* loop unrolling

* Fixed cmake LLAMA_CUDA_BY option

* Removed hipblas compatibility code

* Define GGML_CUDA_DMMV_BLOCK_Y if not defined

* Fewer iters, more ops per iter

* Renamed DMMV X/Y compilation options
2023-05-26 00:07:29 +03:00
Randall Fitzgerald
c2b55cc917
Added LoRA Loading
Someone please test this; I have no LoRAs available to test with. The code is taken directly from the base repo, so it should be fine.
2023-05-25 12:53:05 -07:00
Henri Vasserman
ac7876ac20
Update CLBlast to 1.6.0 (#1580)
* Update CLBlast to 1.6.0
2023-05-24 10:30:09 +03:00
Evan Jones
c31bbe934b
readme : add docs for chat-persistent.sh (#1568)
* readme : add docs for chat-persistent.sh

* Update README.md
2023-05-24 09:24:01 +03:00
Senemu
1359b6aba5
chat-persistent.sh : use bracket expressions in grep (#1564) 2023-05-24 09:16:22 +03:00
Randall Fitzgerald
8d7b28c28d
Fixed some types in the params.
Quickly copy-pasted without fixing them up. Whoopsies.
2023-05-23 13:35:12 -07:00
Randall Fitzgerald
3537ad1821
Merge branch 'ggerganov:master' into master 2023-05-23 13:31:14 -04:00
Maarten ter Huurne
7d873811f3
Fix handling of "invalid property" when creating OpenCL command queue (#1565)
The `clCreateCommandQueue()` function will return the code
`CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties,
not `CL_INVALID_PROPERTY` as the original code was checking for.
2023-05-23 19:01:15 +03:00
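The corrected check, per the description above (the retry without properties is an assumption about the surrounding code):

```cpp
cl_int err;
cl_command_queue queue = clCreateCommandQueue(context, device, properties, &err);
if (err == CL_INVALID_QUEUE_PROPERTIES) {
    // The device rejected the requested properties; fall back to none.
    queue = clCreateCommandQueue(context, device, 0, &err);
}
```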
Randall Fitzgerald
add5f1bdc9
Update examples/server/server.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-05-23 07:34:41 -07:00
Randall Fitzgerald
421e66b330
Update examples/server/server.cpp
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-05-23 07:34:36 -07:00
Randall Fitzgerald
2071d730fa
Forgot to remove some testing code. 2023-05-23 06:22:30 -07:00