Commit graph

791 commits

Author SHA1 Message Date
digiwombat
03ea8f013a Fix for the regen issue. 2023-05-30 15:48:55 -04:00
Henri Vasserman
ffb06a345e
OpenLLaMA 3B support (#1588)
This adds support to llama.cpp to load the model.

Still missing are the changes required in convert.py to convert the model correctly. It needs to start reading the JSON configuration for HF models instead of deriving the values by guessing.

Co-authored-by: FNsi <125447286+FNsi@users.noreply.github.com>
2023-05-30 21:24:22 +03:00
Georgi Gerganov
7552ac5863
ggml : sync cgraph import / export API 2023-05-29 19:31:44 +03:00
Georgi Gerganov
5d1830b99d
ggml : fix bug in ggml_alibi 2023-05-29 19:30:49 +03:00
DannyDaemonic
248367605e
Workaround for recalculating logits in cached prompts (Fixes #1585) (#1609)
* Workaround for recalculating logits in cached prompts
2023-05-29 05:13:40 -07:00
Jiří Podivín
0e730dd23b
Adding git in container package dependencies (#1621)
Git added to build packages to provide version information in the Docker image

Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2023-05-28 21:45:50 -07:00
Henri Vasserman
42cf4d8433
Merge branch 'master' into master 2023-05-29 01:05:19 +03:00
digiwombat
33b6957177 Fixed failing to return result on stopping token. 2023-05-28 16:45:05 -04:00
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols (#1617) 2023-05-28 21:01:02 +02:00
digiwombat
6c58f64a3b Renamed --ctx_size flag to --ctx-size to match common.cpp 2023-05-28 14:17:36 -04:00
digiwombat
b38d41ef52 Renamed --memory_f32 flag to --memory-f32 to match common.cpp 2023-05-28 13:58:25 -04:00
digiwombat
655899db89 Add ignore_eos option to generation settings. 2023-05-28 13:49:45 -04:00
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates (#1625)
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS); a sketch follows this commit.
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation.
4. Update the `main` and `server` examples documentation to use the new-style dash-separator argument format.
5. Update the `server` example to use dash separators for its arguments and add `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` to the `--help` text and documentation of the `main` and `server` examples. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
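A minimal sketch of the conditional define and option display described in the commit above, assuming the backend macros `GGML_USE_CUBLAS`/`GGML_USE_CLBLAST`; the exact code in the repo may differ:

```cpp
#include <cstdio>

// llama.h: advertise GPU offload support only when a GPU backend is compiled in
// (a sketch based on the commit description, not the verbatim repo code).
#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST)
#define LLAMA_SUPPORTS_GPU_OFFLOAD
#endif

// common example code: print the -ngl option only when offload is possible
static void print_gpu_options() {
#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD
    fprintf(stderr, "  -ngl N, --n-gpu-layers N\n");
    fprintf(stderr, "                        number of layers to store in VRAM\n");
#endif
}
```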
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set a user-friendly model name (#1614) 2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612) 2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
* Use strstr to check if fp16 is supported (see the sketch after this commit)

* Ensure ext_buffer is null-terminated
2023-05-28 20:09:56 +03:00
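A sketch of the check those two changes describe, using standard OpenCL host calls; the buffer size and omitted error handling are assumptions:

```cpp
#include <CL/cl.h>
#include <cstring>

// Check the device extension string for fp16 support; ext_buffer is explicitly
// null-terminated before strstr, as the second change above requires.
static bool device_supports_fp16(cl_device_id device) {
    char ext_buffer[2048];
    size_t ext_size = 0;
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS,
                    sizeof(ext_buffer) - 1, ext_buffer, &ext_size);
    ext_buffer[ext_size] = '\0';
    return strstr(ext_buffer, "cl_khr_fp16") != nullptr;
}
```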
Randall Fitzgerald
2c9ee7a052
Apply suggestions from code review
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-28 09:34:11 -07:00
Henri Vasserman
74c6f36bf1
EditorConfig-suggested fixes
delete whitespace
2023-05-28 19:19:34 +03:00
digiwombat
15ddc4903b Merge remote-tracking branch 'slyecho/server_refactor' 2023-05-28 11:09:32 -04:00
Henri Vasserman
7186d655a1
seed and gen params 2023-05-28 17:03:01 +03:00
digiwombat
7740301db9 Set unspecified generation settings back to default. (Notes below)
- If an incoming set of values doesn't contain top_k, for example, but a previous one did, I'm pretty sure it would have stayed on the old value. This fixes that (see the sketch after this commit).
- This might be done a bit more prettily by just setting llama.params = gpt_params();, but I'm not sure how the default constructor would react since one isn't explicitly defined.
2023-05-28 09:18:47 -04:00
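A hypothetical sketch of that fix: build each request's settings from the defaults and overwrite only the fields present in the incoming JSON (the struct, field names, and defaults here are assumptions, not the server's actual code):

```cpp
#include "json.hpp"   // nlohmann::json, bundled with the server example
using json = nlohmann::json;

// Assumed stand-in for the relevant gpt_params fields
struct gen_settings {
    int   top_k = 40;
    float top_p = 0.95f;
    float temp  = 0.80f;
};

static gen_settings parse_settings(const json & body) {
    gen_settings s;                              // fresh defaults every request
    s.top_k = body.value("top_k", s.top_k);      // overwritten only when present
    s.top_p = body.value("top_p", s.top_p);
    s.temp  = body.value("temperature", s.temp);
    return s;
}
```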
digiwombat
dda915cac4 Added capturing the stopping word and sending it along with the final JSON.
Fixed an fprintf warning
Fixed a bug that broke streaming
Properly removed thread changing in json (only grabbed batch_size before)
2023-05-28 08:43:38 -04:00
digiwombat
2e5c5ee224 Changed JSON names to match the parameter name rather than the variable name. 2023-05-28 08:12:48 -04:00
digiwombat
23928f2887 Added generation_settings to final json object. 2023-05-28 08:04:05 -04:00
digiwombat
e8efd75492 Initial timeout code and expanded json return on completion.
Now passing server params to the help printer so their defaults are output.
Bad UTF-8 while streaming now returns a replacement character (\uFFFD).
Changed some error language very slightly.
The JSON now returns extra values, only on `stop` for streaming requests.
New JSON return values (sketched after this commit):
  - tokens_predicted (added to streaming)
  - seed (just pulls it from params, might return -1)
  - prompt (Might be useful)
  - generated_text (Full generated response for streaming requests)
2023-05-28 07:44:31 -04:00
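A sketch of the expanded final JSON object; the field names come from the list above, while everything else (the helper's shape and types) is assumed:

```cpp
#include <string>
#include "json.hpp"
using json = nlohmann::json;

static json make_final_response(const std::string & prompt,
                                const std::string & generated_text,
                                int tokens_predicted, int seed) {
    return json {
        { "stop",             true             },
        { "tokens_predicted", tokens_predicted },
        { "seed",             seed             },  // pulled from params, may be -1
        { "prompt",           prompt           },
        { "generated_text",   generated_text   },
    };
}
```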
digiwombat
177868e68a Changed to params/args
Seed is now set by the CLI, defaults to -1 if no seed is set
Threads and batch size are now properly launch parameters.
2023-05-28 06:29:11 -04:00
Henri Vasserman
549291fe61
keep processed tokens from the beginning
this means there is no limit on the input prompt;
it will just get reset again as normal
2023-05-28 12:11:41 +03:00
Randall Fitzgerald
df0e0d094c
Forgot to remove some testing code. 2023-05-28 12:11:14 +03:00
Randall Fitzgerald
f93fe36c5b
Add all generation parameters to server.cpp and allow resetting context
server.cpp left out a few generation parameters and also seems built to assume un-editable chatting with no regens or swipes. I added a simple "reload_ctx" flag that can be passed on generation that will cause the prompt to be reloaded.
2023-05-28 12:11:10 +03:00
Henri Vasserman
51e09944ce
server rewrite
Remove unnecessary things and radically rewrite server
2023-05-28 12:10:16 +03:00
Randall Fitzgerald
1f40a789e6
Didn't see the already defined top_k var.
lol. Embarrassing. Don't edit code in the github web viewer, kids.
2023-05-27 17:10:09 -07:00
Randall Fitzgerald
e84b802161
Change top_k type.
Is my lack of knowledge of the code base showing? Yes it is.
2023-05-27 17:07:45 -07:00
Randall Fitzgerald
fdce8951ac
Merge branch 'ggerganov:master' into master 2023-05-27 19:57:37 -04:00
Randall Fitzgerald
d20f36b93c
Removed unnecessary last_prompt_token set
Added the one that was supposed to be there.
Apologies for the extra commits; I'm copy-pasting from my editor to preserve the two-space indent formatting.
2023-05-27 16:46:05 -07:00
Randall Fitzgerald
36c86d794d
Automate context resetting and minor fixes
Fixed top_k still not being set.
Removed an unnecessary loop.
2023-05-27 16:43:08 -07:00
apcameron
a6704643b6
ggml : add support for the RISCV architecture (#1616) 2023-05-27 23:03:25 +03:00
Randall Fitzgerald
66ed19d01f
Corrected dashes in the help lines.
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-27 11:51:21 -07:00
Randall Fitzgerald
48cb16a51a
Merge branch 'ggerganov:master' into master 2023-05-27 13:08:03 -04:00
Kerfuffle
0df7d63e5b
Include server in releases + other build system cleanups (#1610)
Set `LLAMA_BUILD_SERVER` in workflow so the `server` example gets built. This currently only applies to Windows builds because it seems like only Windows binary artifacts are included in releases.

Add `server` example target to `Makefile` (still uses the `LLAMA_BUILD_SERVER` define and does not build by default).

Fix issue where `vdot` binary wasn't removed when running `make clean`.

Fix compile warnings in `server` example.

Add `.hpp` files to trigger workflow (the server example has one).
2023-05-27 11:04:14 -06:00
Henri Vasserman
97c9b77c4f
Add documentation about CLBlast (#1604)
Installing, compiling and using.
2023-05-27 18:47:55 +03:00
Henri Vasserman
0ecb1bbbeb
[CI] Fix openblas (#1613)
* Fix OpenBLAS build

* Fix `LLAMA_BLAS_VENDOR` CMake variable that should be a string and not a boolean.
2023-05-27 17:24:06 +03:00
Georgi Gerganov
93618031c7
ggml : add ggml_tensor_overhead() 2023-05-27 16:19:56 +03:00
Henri Vasserman
83c54e6da5
[CI] CLBlast: Fix directory name (#1606) 2023-05-27 14:18:25 +02:00
Georgi Gerganov
bdbda1b17a
ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name()) 2023-05-27 12:23:16 +03:00
Kerfuffle
66874d4fbc
Some improvements to loading the session with --prompt-cache (#1550)
Improvements to loading the session with `--prompt-cache` in the `main` example.

1. Fix an issue where the `--seed` parameter was ignored when loading a cached prompt.
2. When loading a cached prompt, you previously had to specify the saved prompt (or a prefix of it) again. This pull changes that behavior to default to the cached prompt when the user doesn't specify one (sketched after this commit).
2023-05-25 20:18:01 -06:00
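A hypothetical sketch of the fallback in point 2; variable names mirror the `main` example, but the exact logic in the repo may differ:

```cpp
#include <vector>

typedef int llama_token; // matches llama.h's token type at the time

// Default to the cached prompt when the user did not specify one.
static std::vector<llama_token> choose_input(
        const std::vector<llama_token> & prompt_tokens,   // from the user's --prompt
        const std::vector<llama_token> & session_tokens)  // loaded via --prompt-cache
{
    return prompt_tokens.empty() ? session_tokens : prompt_tokens;
}
```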
Johannes Gäßler
1fcdcc28b1
cuda : performance optimizations (#1530)
* xor hack

* block y dim

* loop unrolling

* Fixed cmake LLAMA_CUDA_BY option

* Removed hipblas compatibility code

* Define GGML_CUDA_DMMV_BLOCK_Y if not defined

* Fewer iters, more ops per iter

* Renamed DMMV X/Y compilation options
2023-05-26 00:07:29 +03:00
Randall Fitzgerald
c2b55cc917
Added LoRA Loading
Someone please test this. I have no LoRAs available to test. The code comes directly from the base repo, so it should be fine.
2023-05-25 12:53:05 -07:00
Henri Vasserman
ac7876ac20
Update CLBlast to 1.6.0 (#1580)
* Update CLBlast to 1.6.0
2023-05-24 10:30:09 +03:00
Evan Jones
c31bbe934b
readme : add docs for chat-persistent.sh (#1568)
* readme : add docs for chat-persistent.sh

* Update README.md
2023-05-24 09:24:01 +03:00
Senemu
1359b6aba5
chat-persistent.sh : use bracket expressions in grep (#1564) 2023-05-24 09:16:22 +03:00