1848 commits

Author SHA1 Message Date
Concedo
159ad9269d up ver, set the cuda pool malloc lookahead back to 5% instead of 2% (+1 squashed commits)
Squashed commits:

[e0f65278] up ver, set the cuda pool malloc lookahead back to 5% instead of 2%
2023-08-09 12:06:42 +08:00
Concedo
926d90fbab Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
2023-08-09 01:09:04 +08:00
Concedo
793cfd136c fixed 70B detection again, try fix horde issues, fixed lite unicode issue, fixed cmake for cuda 2023-08-09 01:05:00 +08:00
Martin Krasser
f5bfea0580
Allow passing grammar to completion endpoint (#2532)
* Allow passing grammar to completion endpoint
2023-08-08 16:29:19 +03:00
Johannes Gäßler
acfc5478ff
CUDA: tighter VRAM scratch size for 65b/70b (#2551) 2023-08-08 14:38:16 +02:00
chaihahaha
7ed8d1fe7f
llm.vim : multiline autocompletion, get rid of "^@" (#2543) 2023-08-08 15:07:02 +03:00
Georgi Gerganov
e7f94d6fdc
vim : bring back simple llm.vim example 2023-08-08 15:06:18 +03:00
AustinMroz
2d7baaf50f
vim : streaming and more (#2495)
* Update Vim plugin

* Remove getbufoneline usage, Add input bind example.

getbufoneline() appears to be a recently added function and has been
replaced with getbufline for compatibility.

An additional example that explains how to add a keybind that works in
insert mode was added.
2023-08-08 14:44:48 +03:00
klosax
f3c3b4b167
Add --rope-scale parameter (#2544)
* common.cpp : Add --rope-scale parameter
* README.md : Add info about using linear rope scaling
2023-08-07 19:07:19 +02:00
Concedo
3554080502 fixed blasbatchmul multiplier 2023-08-08 00:41:02 +08:00
Concedo
28ad80b6e4 Merge branch 'master' into concedo_experimental 2023-08-08 00:34:10 +08:00
Concedo
3c7d938d95 update lite, resize scratch buffers for blasbatch 2048 2023-08-08 00:32:51 +08:00
Georgi Gerganov
93356bdb7a
ggml : mul mat tweaks (#2372)
* ggml : mul mat wip

ggml-ci

* ggml : alternative thread distribution for mul_mat

ggml-ci

* ggml : mul_mat block tiling attempt

* ggml : mul_mat threads yield

ggml-ci
2023-08-07 14:25:58 +03:00
Georgi Gerganov
60baff7c85
ggml : pad result of ggml_nbytes() 2023-08-07 14:24:42 +03:00
Georgi Gerganov
9082b5dfbf
ggml : change params pointer (style change) (#2539)
ggml-ci
2023-08-07 13:55:18 +03:00
Georgi Gerganov
99d29c0094
ggml : sync (custom ops) (#2537)
ggml-ci
2023-08-07 13:20:09 +03:00
Concedo
9133e456d2 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	build.zig
2023-08-07 17:33:42 +08:00
Concedo
cae6a847ad cuda free only for non mmq (+2 squashed commit)
Squashed commit:

[3aca763a] only cuda free for non mmq

[e69a8c9f] revert to pool alloc to try again
2023-08-07 17:12:05 +08:00
Johannes Gäßler
3d9a551816
Fixed mmap prefetch for GPU offloading (#2529) 2023-08-07 10:09:40 +02:00
Georgi Gerganov
f6f9896ac3
metal : fix out-of-bounds access + inc concurrency nodes (#2416)
* metal : fix out-of-bounds access + style changes

* metal : increase concurrency nodes to 2*GGML_MAX_NODES
2023-08-07 10:52:57 +03:00
Concedo
9f16a4c4ef switch to upstream implementation of pool malloc 2023-08-07 15:16:37 +08:00
GiviMAD
34a14b28ff
[Makefile] Move ARM CFLAGS before compilation (#2536) 2023-08-07 09:21:46 +03:00
Henri Vasserman
7297128db8
[Zig] Rewrite build for Zig 0.11 (#2514)
* zig build fixes

* Disable LTO on Windows.
2023-08-07 08:35:53 +03:00
Concedo
6659652c9f lower actual temp used when temp=0 2023-08-07 11:05:06 +08:00
Concedo
0e41b94f40 improve detection for 70B. 2023-08-07 10:43:06 +08:00
Concedo
fb44d72a78 Merge remote-tracking branch 'johannes/cuda-fix-mmap-prefetch' into concedo_experimental 2023-08-07 10:17:43 +08:00
Concedo
559c0e2d1f updated lite again, fix for wi 2023-08-07 10:15:20 +08:00
JohannesGaessler
d9024df759 Fixed mmap prefetch for GPU offloading 2023-08-06 20:28:16 +02:00
Concedo
d442888626 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
2023-08-06 22:47:33 +08:00
Concedo
198cc826fc updated lite 2023-08-06 22:19:18 +08:00
Concedo
e99416cdfe blasbatchsize 2023-08-06 17:47:59 +08:00
Concedo
bcfdd0e662 fixed bbs -1 and allow bbs = 2048 2023-08-06 17:47:05 +08:00
DannyDaemonic
86c3219895
console : fix issue related to Windows 11 PowerShell console mode persistence (#2521) 2023-08-06 09:49:34 +03:00
Keiichi Tabata
2e8265ae17
convert.py : add missing abstract methods for quantized data (#2491) 2023-08-06 09:34:05 +03:00
Johannes Gäßler
f514d1b306
CUDA: faster k-quant mul_mat_q kernels (#2525) 2023-08-05 18:20:44 +02:00
Jonas Wunderlich
332311234a
fix firefox autoscroll (#2519) 2023-08-04 22:16:11 +02:00
Cebtenzzre
182af739c4
server: regenerate completion.js.hpp (#2515) 2023-08-04 21:00:57 +02:00
Cebtenzzre
4329d1acb0
CUDA: use min compute capability of GPUs actually used (#2506) 2023-08-04 17:35:22 +02:00
Cebtenzzre
02f9d96a86
CUDA: check if event is NULL before cudaStreamWaitEvent (#2505)
Fixes #2503
2023-08-04 17:34:32 +02:00
DannyDaemonic
3498588e0f
Add --simple-io option for subprocesses and break out console.h and cpp (#1558) 2023-08-04 08:20:12 -07:00
Concedo
18bb0ab127 up ver, support 16k ctx 2023-08-04 21:47:17 +08:00
Stephen Nichols
5f631c2679
Fixing race condition in server and partial stream handling in frontend. (#2391)
* Fixing race condition in server.cpp and partial stream handling in completion.js

* Reverting assert edits.

* Adding newline to eof
2023-08-04 13:37:24 +02:00
l3utterfly
415e99fec2
Stream save llama context data to file instead of allocating entire buffer upfront (#2488)
* added stream saving context data to file to avoid allocating unnecessary amounts of memory

* generalised copying state data to file or buffer

* added comments explaining how copy_state_data works

* fixed trailing whitespaces

* fixed save load state example

* updated save load state to use public function in llama.cpp

* - restored breakage of the llama_copy_state_data API
- moved new logic for copying llama state data to internal function

* fixed function declaration order

* restored save load state example

* fixed whitespace

* removed unused llama-util.h include

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* Apply code review suggestions

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2023-08-04 13:29:52 +02:00
Borislav Stanimirov
ff966e7ca6
build : fix several cast and printf warnings (#2499) 2023-08-04 13:07:21 +03:00
Concedo
f0764c6cfb fix indentation, increase server thread count 2023-08-04 10:29:56 +08:00
Concedo
d09e54aad1 Merge remote-tracking branch 'duncan/api-stream-double-write-fix' into concedo_experimental 2023-08-04 10:22:53 +08:00
duncannah
63ec711a70
fix: still send full result after streaming 2023-08-03 14:35:43 +02:00
Concedo
4709545c06 Merge remote-tracking branch 'duncan/api-stream-double-write-fix' into concedo_experimental 2023-08-03 12:52:43 +08:00
Concedo
ba2040d1df compile fix for ARM NEON 2023-08-03 12:52:06 +08:00
Concedo
3fa6befdaf increase max free blocks 2023-08-03 10:50:16 +08:00