Commit graph

1768 commits

Author SHA1 Message Date
Concedo
469d70be45 add support for precompiled binaries, used as a fallback 2023-08-15 13:49:05 +08:00
Concedo
9483288e03 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
2023-08-12 16:04:11 +08:00
byte-6174
b19edd54d5 Adding support for llama2.c models (#2559) 2023-08-12 01:17:25 +02:00
Equim
53dc399472 server: fixed wrong variable name in timing json (#2579)
* server: fixed wrong variable name in timing json

* remove redunct entry
2023-08-12 00:35:14 +02:00
Concedo
dae9dffa6a rename koboldcpp.dll to koboldcpp_default.dll 2023-08-11 14:54:27 +08:00
DannyDaemonic
9ca4abed89 Handle ENABLE_VIRTUAL_TERMINAL_PROCESSING more gracefully on earlier versions of Windows. 2023-08-10 13:11:36 -07:00
Christian Demsar
e59fcb2bc1 Add --n-predict -2 for stopping generation on full context (#2565) 2023-08-10 16:28:27 +02:00
Concedo
886f4eed79 updated lite, up ver, remove bell 2023-08-10 22:01:33 +08:00
Martin Krasser
1638757767 Fix grammar-based sampling issue in server (#2566) 2023-08-10 13:16:38 +03:00
Concedo
c5f5209d37 globalize args 2023-08-10 16:30:02 +08:00
Sam Spilsbury
916a9acdd0 ggml-alloc: Don't try to re-use buffers of external tensors (#2562)
* ggml-alloc: Don't try to re-use buffers of external tensors

They might be weights that came from another context, so we
have no control over them (and they might be re-used elsewhere
so writing to them would be a bad idea).

* ggml-alloc: >= when checking for out-of-bounds

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2023-08-09 22:47:42 +02:00
grahameth
ea04a4ca19 add log_callback to llama_context_params for custom logging. (#2234)
* add log_callback to llama_context_params for custom logging.

* Fix macro expansion on gcc

* Add struct llama_state for global variables and move log_callback there

* Turn log level into enum and some minor changes.

* Remove model_for_logging parameter (not needed anymore)

* Convert remaining fprintf(stderr, ...) calls to use new macros.

* Fix enum and initialize g_state

* Fix log calls after merge

* Fix missing static

* Add back all the new lines in the logging strings

* Add comment for llama_log_callback and replace remaining printf calls

---------

Co-authored-by: grahameth <->
Co-authored-by: Helmut <helmut.buhler@inf.h-brs.de>
2023-08-09 22:46:40 +02:00
Concedo
a07e6dd3ad revert cuda changes as they are bugggy 2023-08-09 22:36:41 +08:00
Concedo
f8376c7e61 up ver, fixed compile (+1 squashed commits)
Squashed commits:

[ca51aa9e] up ver
2023-08-09 21:31:24 +08:00
Concedo
ba09f1c807 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	ggml-cuda.cu
2023-08-09 21:18:34 +08:00
Concedo
3a7853d259 handle stablecode-completion-alpha-3b 2023-08-09 21:07:57 +08:00
Johannes Gäßler
25d43e0eb5 CUDA: tuned mul_mat_q kernels (#2546) 2023-08-09 09:42:34 +02:00
Concedo
90058d96b0 sleep longer before exit 2023-08-09 15:28:07 +08:00
Concedo
19cf2a8663 add idle field and up ver 2023-08-09 12:42:59 +08:00
Concedo
4b8a354895 cudatoolkit version 2023-08-09 12:25:21 +08:00
Concedo
159ad9269d up ver, set the cuda pool malloc lookahead back to 5% instead of 2% (+1 squashed commits)
Squashed commits:

[e0f65278] up ver, set the cuda pool malloc lookahead back to 5% instead of 2%
2023-08-09 12:06:42 +08:00
Concedo
926d90fbab Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
2023-08-09 01:09:04 +08:00
Concedo
793cfd136c fixed 70B detection again, try fix horde issues, fixed lite unicode issue, fixed cmake for cuda 2023-08-09 01:05:00 +08:00
Martin Krasser
f5bfea0580 Allow passing grammar to completion endpoint (#2532)
* Allow passing grammar to completion endpoint
2023-08-08 16:29:19 +03:00
Johannes Gäßler
acfc5478ff CUDA: tighter VRAM scratch size for 65b/70b (#2551) 2023-08-08 14:38:16 +02:00
chaihahaha
7ed8d1fe7f llm.vim : multiline autocompletion, get rid of "^@" (#2543) 2023-08-08 15:07:02 +03:00
Georgi Gerganov
e7f94d6fdc vim : bring back simple llm.vim example 2023-08-08 15:06:18 +03:00
AustinMroz
2d7baaf50f vim : streaming and more (#2495)
* Update Vim plugin

* Remove getbufoneline usage, Add input bind example.

getbufoneline() appears to be a recently added function and has been
replaced with getbufline for compatibility.

An additional example that explains how to add a keybind that works in
insert mode was added.
2023-08-08 14:44:48 +03:00
klosax
f3c3b4b167 Add --rope-scale parameter (#2544)
* common.cpp : Add --rope-scale parameter
* README.md : Add info about using linear rope scaling
2023-08-07 19:07:19 +02:00
Concedo
3554080502 fixed blasbatchmul multiplier 2023-08-08 00:41:02 +08:00
Concedo
28ad80b6e4 Merge branch 'master' into concedo_experimental 2023-08-08 00:34:10 +08:00
Concedo
3c7d938d95 update lite, resize scratch buffers for blasbatch 2048 2023-08-08 00:32:51 +08:00
Georgi Gerganov
93356bdb7a ggml : mul mat tweaks (#2372)
* ggml : mul mat wip

ggml-ci

* ggml : alternative thread distribution for mul_mat

ggml-ci

* ggml : mul_mat block tiling attempt

* ggml : mul_mat threads yield

ggml-ci
2023-08-07 14:25:58 +03:00
Georgi Gerganov
60baff7c85 ggml : pad result of ggml_nbytes() 2023-08-07 14:24:42 +03:00
Georgi Gerganov
9082b5dfbf ggml : change params pointer (style change) (#2539)
ggml-ci
2023-08-07 13:55:18 +03:00
Georgi Gerganov
99d29c0094 ggml : sync (custom ops) (#2537)
ggml-ci
2023-08-07 13:20:09 +03:00
Concedo
9133e456d2 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	build.zig
2023-08-07 17:33:42 +08:00
Concedo
cae6a847ad cuda free only for non mmq (+2 squashed commit)
Squashed commit:

[3aca763a] only cuda free for non mmq

[e69a8c9f] revert to pool alloc to try again
2023-08-07 17:12:05 +08:00
Johannes Gäßler
3d9a551816 Fixed mmap prefetch for GPU offloading (#2529) 2023-08-07 10:09:40 +02:00
Georgi Gerganov
f6f9896ac3 metal : fix out-of-bounds access + inc concurrency nodes (#2416)
* metal : fix out-of-bounds access + style changes

* metal : increase concurrency nodes to 2*GGML_MAX_NODES
2023-08-07 10:52:57 +03:00
Concedo
9f16a4c4ef switch to upstream implementation of pool malloc 2023-08-07 15:16:37 +08:00
GiviMAD
34a14b28ff [Makefile] Move ARM CFLAGS before compilation (#2536) 2023-08-07 09:21:46 +03:00
Henri Vasserman
7297128db8 [Zig] Rewrite build for Zig 0.11 (#2514)
* zig build fixes

* Disable LTO on Windows.
2023-08-07 08:35:53 +03:00
Concedo
6659652c9f lower actual temp used when temp=0 2023-08-07 11:05:06 +08:00
Concedo
0e41b94f40 improve detection for 70B. 2023-08-07 10:43:06 +08:00
Concedo
fb44d72a78 Merge remote-tracking branch 'johannes/cuda-fix-mmap-prefetch' into concedo_experimental 2023-08-07 10:17:43 +08:00
Concedo
559c0e2d1f updated lite again, fix for wi 2023-08-07 10:15:20 +08:00
JohannesGaessler
d9024df759 Fixed mmap prefetch for GPU offloading 2023-08-06 20:28:16 +02:00
Concedo
d442888626 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
2023-08-06 22:47:33 +08:00
Concedo
198cc826fc updated lite 2023-08-06 22:19:18 +08:00