Concedo
446e42a8c6
change dmmv block size
2023-05-31 21:40:12 +08:00
Concedo
077ee4e989
Revert "Revert "opencl : no need to allocate cl_mem on heap ( #1612 )""
This reverts commit 4afa38e744.
2023-05-31 18:00:52 +08:00
Concedo
50c85bea4c
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
2023-05-31 17:53:14 +08:00
Concedo
32dada5e5f
updated lite
2023-05-31 17:52:09 +08:00
0cc4m
5e1eecfe12
Adapt to #1612 cl_mem malloc changes
2023-05-31 07:07:47 +02:00
0cc4m
49aaf08387
Merge remote-tracking branch 'origin/master' into opencl-dev
2023-05-31 06:58:51 +02:00
Concedo
a5a85d68c6
Merge branch 'master' into concedo_experimental
# Conflicts:
# llama.cpp
2023-05-31 10:51:54 +08:00
Concedo
85c9f7df41
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
2023-05-31 10:20:32 +08:00
Concedo
4afa38e744
Revert "opencl : no need to allocate cl_mem on heap ( #1612 )"
This reverts commit bb051d9723.
2023-05-31 10:20:23 +08:00
Henri Vasserman
ffb06a345e
OpenLLaMA 3B support ( #1588 )
This adds support to llama.cpp to load the model.
Currently missing are changes that are required from convert.py to convert the model correctly. It needs some changes to start reading the JSON configuration for HF models instead of deriving the values by guessing.
Co-authored-by: FNsi <125447286+FNsi@users.noreply.github.com>
2023-05-30 21:24:22 +03:00
0cc4m
ac6b49ed45
Reduce queueing overhead for contiguous tensors by using single mul kernel call
2023-05-30 18:49:53 +02:00
Concedo
56456797f4
Merge branch 'master' into concedo_experimental
2023-05-30 22:15:58 +08:00
Georgi Gerganov
7552ac5863
ggml : sync cgraph import / export API
2023-05-29 19:31:44 +03:00
Georgi Gerganov
5d1830b99d
ggml : fix bug in ggml_alibi
2023-05-29 19:30:49 +03:00
Concedo
ea336bfa33
rwkv eos
2023-05-29 22:40:27 +08:00
Concedo
6b3373cb81
revert bad fix
2023-05-29 22:06:12 +08:00
DannyDaemonic
248367605e
Work around for recalculating logits in cached prompts ( Fixes #1585 ) ( #1609 )
* Work around for recalculating logits in cached prompts
2023-05-29 05:13:40 -07:00
Concedo
ef16d09a51
fix for older gcc, updated lite
2023-05-29 18:54:15 +08:00
Concedo
3a73ebe8d2
Merge branch 'master' into concedo_experimental
# Conflicts:
# .devops/full.Dockerfile
# .devops/main.Dockerfile
# Makefile
2023-05-29 16:47:32 +08:00
Concedo
254a9ff12c
Merge commit 'ebc5d0651a' into concedo_experimental
# Conflicts:
# ggml-opencl.cpp
2023-05-29 16:26:24 +08:00
Concedo
30ff1133f5
allow users to rename models for use in horde
2023-05-29 16:01:05 +08:00
Concedo
97b39f875c
fixed fstat64 build error on mac
2023-05-29 15:50:07 +08:00
Jiří Podivín
0e730dd23b
Adding git in container package dependencies ( #1621 )
Git added to build packages for version information in docker image
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2023-05-28 21:45:50 -07:00
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols ( #1617 )
2023-05-28 21:01:02 +02:00
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates ( #1625 )
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Update `main` and `server` examples documentation to use the new style dash separator argument format
5. Update the `server` example to use dash separators for its arguments and add `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set use friendly model name ( #1614 )
2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap ( #1612 )
2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported ( #1611 )
* Use strstr to check if fp16 supported
* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
Concedo
28f1196f65
adjust default rep pen range
2023-05-28 19:36:21 +08:00
Concedo
7d159bacd7
updated kobold lite
2023-05-28 11:23:20 +08:00
apcameron
a6704643b6
ggml : add support for the RISCV architecture ( #1616 )
2023-05-27 23:03:25 +03:00
Concedo
dcc426e2de
Merge branch 'master' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
2023-05-28 01:08:39 +08:00
Kerfuffle
0df7d63e5b
Include server in releases + other build system cleanups ( #1610 )
Set `LLAMA_BUILD_SERVER` in workflow so the `server` example gets built. This currently only applies to Windows builds because it seems like only Windows binary artifacts are included in releases.
Add `server` example target to `Makefile` (still uses `LLAMA_BUILD_SERVER` define and does not build by default)
Fix issue where `vdot` binary wasn't removed when running `make clean`.
Fix compile warnings in `server` example.
Add `.hpp` files to trigger workflow (the server example has one).
2023-05-27 11:04:14 -06:00
Concedo
5d9f5b28a6
rwkv integration completed
2023-05-28 00:48:56 +08:00
Henri Vasserman
97c9b77c4f
Add documentation about CLBlast ( #1604 )
Installing, compiling and using.
2023-05-27 18:47:55 +03:00
Concedo
55e0fbf024
wip integrating new rwkv
2023-05-27 22:45:28 +08:00
Henri Vasserman
0ecb1bbbeb
[CI] Fix openblas ( #1613 )
* Fix OpenBLAS build
* Fix `LLAMA_BLAS_VENDOR` CMake variable that should be a string and not a boolean.
2023-05-27 17:24:06 +03:00
Georgi Gerganov
93618031c7
ggml : add ggml_tensor_overhead()
2023-05-27 16:19:56 +03:00
Henri Vasserman
83c54e6da5
[CI] CLBlast: Fix directory name ( #1606 )
2023-05-27 14:18:25 +02:00
Concedo
fe63bfdb0f
Revert "allow 2048 blasbatchsize"
This reverts commit 94dc5c2324.
2023-05-27 18:13:27 +08:00
0cc4m
97c5cca4e5
OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
2023-05-27 12:00:56 +02:00
Concedo
94dc5c2324
allow 2048 blasbatchsize
2023-05-27 17:47:18 +08:00
Concedo
92a0d77712
Merge branch 'master' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# Makefile
2023-05-27 17:44:14 +08:00
Concedo
abfdfb702e
added top_a sampler
2023-05-27 17:32:37 +08:00
Georgi Gerganov
bdbda1b17a
ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name())
2023-05-27 12:23:16 +03:00
0cc4m
ebc5d0651a
Use events instead of clFinish, where possible
2023-05-27 10:03:35 +02:00
Concedo
01a0f206df
added support for starcoder, which is basically gpt2
2023-05-27 13:35:40 +08:00
Concedo
6d7749c98f
no difference
2023-05-27 12:42:19 +08:00
Concedo
bd4fe936f5
cleanup sampling code
2023-05-27 11:58:39 +08:00
Concedo
3c8f404243
integrated token probability viewer in debugmode
2023-05-26 16:40:26 +08:00