Georgi Gerganov
f1d4138c13
server : fix initialization thread issues
2024-02-21 13:08:57 +02:00
Meng, Hengyu
88c46cbdac
[SYCL] context add name ( #5624 )
* [SYCL] context add name
* name should start with SYCL*
2024-02-21 17:52:06 +08:00
Kawrakow
a14679cc30
IQ4_NL: 4-bit non-linear quants with blocks of 32 ( #5590 )
* iq4_nl: squash commits for easier rebase
* Basics (quantize, dequantize)
* CUDA dequantize and dot product
* Slightly faster CUDA dot product (120 t/s)
* Switch to 6-bit scales
* Scalar dot product
* AVX2 dot product
* ARM_NEON dot product
* Works on Metal, but still slow
* Slightly better Metal dot product
* Another small Metal improvement
* Metal dot product is getting there
* Faster CUDA dot product
* Add 1/8 ffn_down layers as Q5_K when no imatrix has been provided
* Report the actual bpw
* Add _xs mix that is 4.05 bpw for non-MoE models
* Remove IQ4_XS for now, slightly adjust kvalues_iq4nl
* AVX2 dot product uses Q8_0 instead of Q8_K
* Add to test-backend-ops
* Minor fix
* Also use Q5_K for attn_output in MoE models
* Fixes after merging latest master
* Switching to blocks of 32
* AVX2 for blocks of 32
* Scalar dot product for blocks of 32
* ARM_NEON dot product for blocks of 32
* Metal kernels for blocks of 32
* Slightly faster Metal kernels
* iq4_nl: Fix after merging with master
* iq4_nl: another fix after merging with master
* Use IQ4_NL instead of Q4_K when using k-quants is not possible
* Fix typo that makes several tests fail
* It was the ggml_vdotq thing missed inside the brackets
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-02-21 11:39:52 +02:00
Pierrick HYMBERT
2a37bd6b86
server: tests: fix the multi users infinite loop test
2024-02-21 02:29:50 +01:00
Pierrick HYMBERT
469af4b4ec
server: tests: change CI workflow trigger
2024-02-21 02:20:44 +01:00
Pierrick HYMBERT
3322bfa980
server: tests: add a small check to be sure all started threads have generated response
2024-02-21 02:04:59 +01:00
Pierrick HYMBERT
672d98f6f0
server: tests: CORS and api key checks scenario
2024-02-21 01:51:33 +01:00
Pierrick HYMBERT
6dcbcfe6ba
server: tests: simplify completion scenario
2024-02-21 00:43:50 +01:00
Pierrick HYMBERT
19664b9f01
server: tests: detokenize endpoint issue reference added
2024-02-21 00:17:38 +01:00
Pierrick HYMBERT
1065f6d41b
server: tests: add tokenize/detokenize scenario
2024-02-21 00:13:53 +01:00
Pierrick HYMBERT
e6d482088d
server: tests: add embeddings scenario
2024-02-21 00:02:30 +01:00
Pierrick HYMBERT
1ecda0d13e
server: tests: disable issue 3969 scenario
2024-02-20 23:35:44 +01:00
Pierrick HYMBERT
b0b6d83c76
server: tests: add infinite loop scenario
2024-02-20 23:17:00 +01:00
Pierrick HYMBERT
68574c6f98
server: tests: add infinite loop scenario
2024-02-20 23:11:59 +01:00
Pierrick HYMBERT
6b9dc4f291
server: tests: add infinite loop
2024-02-20 23:05:27 +01:00
Pierrick HYMBERT
0772884b06
server: tests: add a constant seed in completion request
2024-02-20 22:55:29 +01:00
Pierrick HYMBERT
b9f8390d28
server: tests: check for infinite loops
2024-02-20 22:49:36 +01:00
Pierrick HYMBERT
367b59a15c
server: tests: check for infinite loops
2024-02-20 22:45:30 +01:00
Pierrick HYMBERT
c355f76427
server: tests: slots endpoint checks
2024-02-20 22:32:11 +01:00
Pierrick HYMBERT
11adf1d864
server: tests: add OAI multi user scenario
2024-02-20 22:00:09 +01:00
Pierrick HYMBERT
9b7ea97979
server: tests: add OAI stream test, fix file end of line, fast fail behave
2024-02-20 21:34:35 +01:00
Pierrick HYMBERT
56583bee41
server: tests: refactor steps and vocabulary
2024-02-20 20:52:24 +01:00
Pierrick HYMBERT
6c95ec6587
server: tests: change model to: @karpathy's tinyllamas
2024-02-20 20:50:14 +01:00
CJ Pais
6560bed3f0
server : support llava 1.6 ( #5553 )
* server: init working 1.6
* move clip_image to header
* remove commented code
* remove C++ style from header
* remove todo
* expose llava_image_embed_make_with_clip_img
* fix zig build
2024-02-20 21:07:22 +02:00
slaren
06bf2cf8c4
make : fix debug build with CUDA ( #5616 )
2024-02-20 20:06:17 +01:00
Pierrick HYMBERT
8bb586bf06
server: tests: add health check and concurrent request example
2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
1680599b01
server: tests: build only the server
2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
fe9866a52d
server: tests: use ngxson llama_xs_q4.bin
2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
30aa323fb9
server: tests: fix ci workflow
2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
4e5245e6b8
server: tests: fix ci workflow
2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
6497755de5
server: tests: fix ci workflow
2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
9b63d7057a
server: tests: reduce number of files, all in one tests shell script
2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
157bcf2286
server: init functional test
2024-02-20 19:05:21 +01:00
Daniel Bevenius
4ed8e4fbef
llava : add explicit instructions for llava-1.6 ( #5611 )
This commit contains a suggestion for the README.md in the llava
example. The suggestion adds explicit instructions for how to convert
a llava-1.6 model and run it using llava-cli.
The motivation for this is that having explicit instructions similar to
the 1.5 instructions will make it easier for users to try this out.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-20 19:30:27 +02:00
Xuan Son Nguyen
9c405c9f9a
Server: use llama_chat_apply_template ( #5593 )
* server: use llama_chat_apply_template
* server: remove trailing space
* server: fix format_chat
* server: fix help message
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server: fix formatted_chat
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-20 15:58:27 +01:00
Dane Madsen
5207b3fbc5
readme : update UI list ( #5605 )
* Add maid to ui list
* Specify licence
2024-02-20 12:00:23 +02:00
Haoxiang Fei
8dbbd75754
metal : add build system support for embedded metal library ( #5604 )
* add build support for embedded metal library
* Update Makefile
---------
Co-authored-by: Haoxiang Fei <feihaoxiang@idea.edu.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-20 11:58:36 +02:00
Pierrick Hymbert
c0a8c6db37
server : health endpoint configurable failure on no slot ( #5594 )
2024-02-20 09:48:19 +02:00
AidanBeltonS
b9111bd209
Update ggml_sycl_op_mul_mat_vec_q ( #5502 )
* Update ggml_sycl_op_mul_mat_vec_q
* Apply suggestions from code review
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* revert suggestion on macro
* fix bug
* Add quant type GGML_TYPE_IQ1_S to unsupported
* fix format
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-02-20 12:31:25 +05:30
Mathijs de Bruin
633782b8d9
nix: now that we can do so, allow MacOS to build Vulkan binaries
Author: Philip Taron <philip.taron@gmail.com>
Date: Tue Feb 13 20:28:02 2024 +0000
2024-02-19 14:49:49 -08:00
0cc4m
22f83f0c38
Enable Vulkan MacOS CI
2024-02-19 14:49:49 -08:00
0cc4m
bb9dcd560a
Refactor validation and enumeration platform checks into functions to clean up ggml_vk_instance_init()
2024-02-19 14:49:49 -08:00
0cc4m
f50db6ae0b
Add check for VK_KHR_portability_enumeration for MoltenVK support
2024-02-19 14:49:49 -08:00
Mathijs de Bruin
d8c054517d
Add preprocessor checks for Apple devices.
Based on work by @rbourgeat in https://github.com/ggerganov/llama.cpp/pull/5322/files
2024-02-19 14:49:49 -08:00
Mathijs de Bruin
42f664a382
Resolve ErrorIncompatibleDriver with Vulkan on MacOS.
Refs:
- https://chat.openai.com/share/7020ce72-65fc-45ec-b7be-9d9d798a5f3f
- https://github.com/SaschaWillems/Vulkan/issues/954
- https://github.com/haasn/libplacebo/issues/128
- https://github.com/KhronosGroup/Vulkan-Samples/issues/476
2024-02-19 14:49:49 -08:00
Mathijs de Bruin
5dde540897
Allow for Vulkan build with Accelerate.
Closes #5304
2024-02-19 14:49:49 -08:00
slaren
40c3a6c1e1
cuda : ignore peer access already enabled errors ( #5597 )
* cuda : ignore peer access already enabled errors
* fix hip
2024-02-19 23:40:26 +01:00
Jared Van Bortel
f24ed14ee0
make : pass CPPFLAGS directly to nvcc, not via -Xcompiler ( #5598 )
2024-02-19 15:54:12 -05:00
nopperl
9d679f0fcc
examples : support minItems/maxItems in JSON grammar converter ( #5039 )
* support minLength and maxLength in JSON schema grammar converter
* Update examples/json-schema-to-grammar.py
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-19 16:14:07 +02:00
Georgi Gerganov
1387cf60f7
llava : remove extra cont ( #5587 )
2024-02-19 15:23:17 +02:00