Concedo
d442888626
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
2023-08-06 22:47:33 +08:00
Concedo
198cc826fc
updated lite
2023-08-06 22:19:18 +08:00
Concedo
e99416cdfe
blasbatchsize
2023-08-06 17:47:59 +08:00
Concedo
bcfdd0e662
fixed bbs -1 and allow bbs = 2048
2023-08-06 17:47:05 +08:00
DannyDaemonic
86c3219895
console : fix issue related to Windows 11 PowerShell console mode persistence ( #2521 )
2023-08-06 09:49:34 +03:00
Keiichi Tabata
2e8265ae17
convert.py : add missing abstract methods for quantized data ( #2491 )
2023-08-06 09:34:05 +03:00
Johannes Gäßler
f514d1b306
CUDA: faster k-quant mul_mat_q kernels ( #2525 )
2023-08-05 18:20:44 +02:00
Jonas Wunderlich
332311234a
fix firefox autoscroll ( #2519 )
2023-08-04 22:16:11 +02:00
Cebtenzzre
182af739c4
server: regenerate completion.js.hpp ( #2515 )
2023-08-04 21:00:57 +02:00
Cebtenzzre
4329d1acb0
CUDA: use min compute capability of GPUs actually used ( #2506 )
2023-08-04 17:35:22 +02:00
Cebtenzzre
02f9d96a86
CUDA: check if event is NULL before cudaStreamWaitEvent ( #2505 )
...
Fixes #2503
2023-08-04 17:34:32 +02:00
DannyDaemonic
3498588e0f
Add --simple-io option for subprocesses and break out console.h and cpp ( #1558 )
2023-08-04 08:20:12 -07:00
Concedo
18bb0ab127
up ver, support 16k ctx
2023-08-04 21:47:17 +08:00
Stephen Nichols
5f631c2679
Fixing race condition in server and partial stream handling in frontend. ( #2391 )
...
* Fixing race condition in server.cpp and partial stream handling in completion.js
* Reverting assert edits.
* Adding newline to eof
2023-08-04 13:37:24 +02:00
l3utterfly
415e99fec2
Stream save llama context data to file instead of allocating entire buffer upfront ( #2488 )
...
* added stream saving context data to file to avoid allocating unnecessary amounts of memory
* generalised copying state data to file or buffer
* added comments explaining how copy_state_data works
* fixed trailing whitespaces
* fixed save load state example
* updated save load state to use public function in llama.cpp
* - restored breakage of the llama_copy_state_data API
- moved new logic for copying llama state data to internal function
* fixed function declaration order
* restored save load state example
* fixed whitespace
* removed unused llama-util.h include
* Apply suggestions from code review
Co-authored-by: slaren <slarengh@gmail.com>
* Apply code review suggestions
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-08-04 13:29:52 +02:00
Borislav Stanimirov
ff966e7ca6
build : fix several cast and printf warnings ( #2499 )
2023-08-04 13:07:21 +03:00
Concedo
f0764c6cfb
fix indentation, increase server thread count
2023-08-04 10:29:56 +08:00
Concedo
d09e54aad1
Merge remote-tracking branch 'duncan/api-stream-double-write-fix' into concedo_experimental
2023-08-04 10:22:53 +08:00
duncannah
63ec711a70
fix: still send full result after streaming
2023-08-03 14:35:43 +02:00
Concedo
4709545c06
Merge remote-tracking branch 'duncan/api-stream-double-write-fix' into concedo_experimental
2023-08-03 12:52:43 +08:00
Concedo
ba2040d1df
compile fix for ARM NEON
2023-08-03 12:52:06 +08:00
Concedo
3fa6befdaf
increase max free blocks
2023-08-03 10:50:16 +08:00
Concedo
34e60be41a
compile fix
2023-08-03 10:36:14 +08:00
Evan Jones
8183159cf3
examples : generate JSON according to schema ( #1887 )
...
* examples : add JSON schema grammars
* complete JSON grammar
* ensure primitive types can be used as root of schema
* support integer type and adjust usage text
2023-08-02 22:05:44 -04:00
duncannah
9281c2801f
fix: don't send headers twice when streaming
2023-08-02 23:42:43 +02:00
Johannes Gäßler
468ea24fb4
CUDA: faster non k-quant mul_mat_q kernels ( #2483 )
2023-08-02 18:04:04 +02:00
Concedo
b2eaec4261
updated lite
2023-08-02 22:54:17 +08:00
Johannes Gäßler
4f6b60c776
CUDA: Fix models with output size != 32000 ( #2480 )
2023-08-02 16:48:10 +02:00
Concedo
4c90fdc5cd
Merge remote-tracking branch 'johannes/cuda-fix-output-size' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
2023-08-02 22:37:41 +08:00
Concedo
6fe92318f8
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
# tests/test-double-float.cpp
# tests/test-grad0.cpp
# tests/test-opt.cpp
2023-08-02 22:36:00 +08:00
JohannesGaessler
1e64d511d5
CUDA: Fix models with output size != 32000
2023-08-02 10:26:53 +02:00
ldwang
220d931864
readme : add Aquila-7B model series to supported models ( #2487 )
...
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
* Add Aquila-7B models in README.md
Signed-off-by: ldwang <ftgreat@gmail.com>
* Up Aquila-7B models in README.md
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-08-02 11:21:11 +03:00
Eve
81844fbcfd
tests : Fix compilation warnings (Linux/GCC) ( #2451 )
...
* fix hellaswag print format, cast away warning in test-double-float
* c++11 cannot use designated initializers
* add static to test-grad0.c internal functions
* use memcpy in test-double-float.c
* port c tests to c++
* use initializer list for ggml_init_params
2023-08-02 11:06:19 +03:00
Yiming Cui
a312193e18
readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models ( #2475 )
...
* add support for chinese llama-2 / alpaca-2
* remove white spaces
2023-08-02 09:18:31 +03:00
Bono Lv
c574bddb36
fix a typo in examples/server/README.md ( #2478 )
2023-08-01 14:54:28 +02:00
Concedo
c58ffc92e5
fixed compile error
2023-08-01 18:28:49 +08:00
Concedo
84b28c4282
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
2023-08-01 18:13:27 +08:00
Concedo
46682e5cb3
added mmq launch flag
2023-08-01 17:57:13 +08:00
ebraminio
86aeb27734
server : Support dark mode ( #2414 )
...
* server : Support dark mode
So it respects user system light / dark settings.
* Update index.html.hpp by running ./deps.sh
2023-08-01 10:56:23 +02:00
Matteo Boschini
1873ff586b
metal : add gqa8 kernel to allow llama-2-70B on metal ( #2459 )
...
* Added gqa8 kernel to allow llama-2-70B on metal
* Update ggml-metal.m
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast
* Added ne03==ne13 assertion
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-08-01 10:43:12 +03:00
Johannes Gäßler
49e7cb5bb1
CUDA: fixed LLAMA_FAST compilation option ( #2473 )
2023-07-31 21:02:19 +02:00
Johannes Gäßler
b772bba42e
CUDA: fixed cmake F16 option ( #2471 )
2023-07-31 19:52:22 +02:00
Concedo
e221843147
trying out mmq
...
Merge branch 'master' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# README.md
2023-07-31 22:51:15 +08:00
Concedo
3e370f83ef
Warning: Very experimental merge, do not use until confirmed stable.
2023-07-31 22:33:43 +08:00
Johannes Gäßler
0728c5a8b9
CUDA: mmq CLI option, fixed mmq build issues ( #2453 )
2023-07-31 15:44:35 +02:00
Johannes Gäßler
1215ed7d5c
CUDA: Implemented row flattening for non-glm RoPE ( #2468 )
2023-07-31 14:32:30 +02:00
Johannes Gäßler
2dbf518911
CUDA: fewer memory bank conflicts for mul_mat_q ( #2458 )
2023-07-31 13:18:51 +02:00
Concedo
84ce184c4f
layout
2023-07-31 17:33:31 +08:00
slaren
9d2382b3e4
Fix Metal backend broken from the allocator changes ( #2455 )
...
* fix Metal backend broken from the allocator changes
2023-07-31 11:02:53 +02:00
YellowRoseCx
f27972777f
correct semantic error in import_vars ( #355 )
...
* Hide unavailable backends & Add tooltip over backend count
Hides unavailable backends from the user. If the program is launched without any backends built, it shows an error message stating that no backends were found and that they should be built using the 'make' command
Add tooltip when hovering over backend count label
Hovering over the new label that shows the backend count explains what the numbers mean and shows which backends are unavailable or not built
* add some code comments
* hide "missing" if all are built
move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
" if len(runopts)==6 else + "
* small typo fix
* remove wrongly added leftover device choosing code
* fix labels
* move tooltip to function
* import vars logic fix
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-07-31 15:51:35 +08:00