Commit graph

2309 commits

Author SHA1 Message Date
Jared Van Bortel
89febfed93
examples : do not assume BOS when shifting context (#5622) 2024-02-21 10:33:54 -05:00
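Context shifting here means keeping a prefix of the prompt and discarding a window of older tokens to make room; the fix is that the kept prefix no longer implicitly assumes a BOS token. A minimal sketch of the pattern (function and parameter names hypothetical, not the actual example code):

```python
def shift_context(tokens, n_keep, n_discard):
    """Keep the first n_keep tokens, drop the next n_discard, keep the rest.

    n_keep counts exactly the tokens kept: no BOS token is assumed, so a
    caller whose model uses BOS must account for it in n_keep itself.
    """
    return tokens[:n_keep] + tokens[n_keep + n_discard:]
```

Usage: `shift_context([1, 2, 3, 4, 5, 6], 2, 2)` keeps `[1, 2]`, drops `[3, 4]`, and yields `[1, 2, 5, 6]`.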
Georgi Gerganov
5022cf242d
sync : ggml 2024-02-21 16:52:52 +02:00
Pierrick Hymbert
1ecea255eb
server: health: fix race condition on slots data using tasks queue (#5634)
* server: health: fix race condition on slots data using tasks queue

* server: health:
    * include_slots only if slots_endpoint
    * fix compile warning task.target_id not initialized.
2024-02-21 15:47:48 +01:00
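The race fix routes slot-state reads through the task queue, so only the worker thread ever touches the slots. A minimal Python sketch of that ownership pattern (all names hypothetical; the server itself is C++):

```python
import queue
import threading

class SlotOwner:
    """Single-writer pattern: only the worker loop reads or mutates slots.

    A health request posts a task and waits for the worker's snapshot
    instead of reading shared slot data concurrently.
    """
    def __init__(self, n_slots):
        self.slots = [{"id": i, "busy": False} for i in range(n_slots)]
        self.tasks = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            reply = self.tasks.get()
            # Only this thread touches self.slots, so the copy is consistent.
            reply.put([dict(s) for s in self.slots])

    def health(self):
        reply = queue.Queue(maxsize=1)
        self.tasks.put(reply)
        return reply.get(timeout=1.0)
```

The design choice is the same in both languages: serializing reads through the queue removes the need for a lock around the slot data.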
Ettore Di Giacinto
a00a35cef9
readme : add LocalAI to the available UIs (#5629) 2024-02-21 16:39:10 +02:00
Georgi Gerganov
eccd7a26dd
sync : ggml (#5633)
* ggml : fix conv_2d batch mode (ggml/737)

Co-authored-by: bssrdf <bssrdf@gmail.com>

* ggml : compute forward no longer pass src tensors (ggml/729)

* sync : ggml

ggml-ci

---------

Co-authored-by: bssrdf <merlintiger@hotmail.com>
Co-authored-by: bssrdf <bssrdf@gmail.com>
2024-02-21 16:17:10 +02:00
Georgi Gerganov
c14f72db9c
readme : update hot topics 2024-02-21 15:39:54 +02:00
Daniel Bevenius
cc6cac08e3
llava : add --skip-unknown to 1.6 convert.py (#5632)
This commit adds the `--skip-unknown` option to the convert.py script
and removes the saving of the updated checkpoints to avoid updating
possibly checked out files.

The motivation for this change is that this was done for 1.5
in Commit fc0c8d286a ("llava :
update surgery script to not remove tensors") and makes the examples
more consistent.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-21 15:36:57 +02:00
postmasters
580111d42b
llama : add gemma model (#5631)
There are a couple of notable things in this architecture:

1. Shared input and output embedding parameters.
2. Key length and value length are not derived from `n_embd`.

More information about the models can be found at
https://ai.google.dev/gemma. GGUFs can be downloaded from
https://huggingface.co/google.
2024-02-21 15:08:22 +02:00
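The two architecture points can be illustrated with a toy sketch (all dimensions made up): one matrix serves both as the input embedding and, transposed, as the output projection, and the per-head key/value length is an independent hyperparameter rather than `n_embd / n_head`.

```python
import numpy as np

n_vocab, n_embd = 10, 8
n_head, head_dim = 2, 16  # note: head_dim != n_embd // n_head

rng = np.random.default_rng(0)
tok_embd = rng.standard_normal((n_vocab, n_embd))

def embed(token_id):
    # Input side: look up the token's embedding row.
    return tok_embd[token_id]

def output_logits(hidden):
    # Output side: the same matrix is reused as the output projection
    # (weight tying), so no separate lm_head tensor is stored.
    return hidden @ tok_embd.T
```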
Georgi Gerganov
f1d4138c13
server : fix initialization thread issues 2024-02-21 13:08:57 +02:00
Meng, Hengyu
88c46cbdac
[SYCL] context add name (#5624)
* [SYCL] context add name

* name should start with SYCL*
2024-02-21 17:52:06 +08:00
Kawrakow
a14679cc30
IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590)
* iq4_nl: squash commits for easier rebase

* Basics (quantize, dequantize)
* CUDA dequantize and dot product
* Slightly faster CUDA dot product (120 t/s)
* Switch to 6-bit scales
* Scalar dot product
* AVX2 dot product
* ARM_NEON dot product
* Works on metal, but still slow
* Slightly better Metal dot product
* Another small Metal improvement
* Metal dot product is getting there
* Faster CUDA dot product
* Add 1/8 ffn_down layers as Q5_K when no imatrix has been provided
* Report the actual bpw
* Add _xs mix that is 4.05 bpw for non-MoE models
* Remove IQ4_XS for now, slightly adjust kvalues_iq4nl
* AVX2 dot product uses Q8_0 instead of Q8_K
* Add to test-backend-ops
* Minor fix
* Also use Q5_K for attn_output in MoE models
* Fixes after merging latest master
* Switching to blocks of 32
* AVX2 for blocks of 32
* Scalar dot product for blocks of 32
* ARM_NEON dot product for blocks of 32
* Metal kernels for blocks of 32
* Slightly faster Metal kernels

* iq4_nl: Fix after merging with master

* iq4_nl: another fix after merging with master

* Use IQ4_NL instead of Q4_K when using k-quants is not possible

* Fix typo that makes several tests fail

* It was the ggml_vdotq thing missed inside the brackets

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-02-21 11:39:52 +02:00
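The core idea of a non-linear 4-bit quant with blocks of 32 can be sketched as follows: each block stores one scale plus 32 four-bit indices into a fixed non-uniform value table. This is a simplified illustration only; the table below is hypothetical (not the actual `kvalues_iq4nl`), and the real format also packs the scale into 6 bits and two indices per byte.

```python
import numpy as np

# Hypothetical non-uniform table: 16 entries, denser near zero where
# weight values cluster, which is the point of a non-linear quant.
KVALUES = np.array([-127, -104, -83, -65, -49, -35, -22, -10,
                    1, 13, 25, 38, 53, 69, 89, 113], dtype=np.float32)

def quantize_block32(x):
    """Quantize 32 floats to one scale + 32 four-bit indices into KVALUES."""
    assert x.shape == (32,)
    scale = np.max(np.abs(x)) / np.max(np.abs(KVALUES))
    # Map each scaled value to the nearest table entry.
    idx = np.argmin(np.abs(x[:, None] / max(scale, 1e-12) - KVALUES[None, :]),
                    axis=1)
    return scale, idx.astype(np.uint8)

def dequantize_block32(scale, idx):
    return scale * KVALUES[idx]
```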
Pierrick HYMBERT
2a37bd6b86 server: tests: fix the multi users infinite loop test 2024-02-21 02:29:50 +01:00
Pierrick HYMBERT
469af4b4ec server: tests: change CI workflow trigger 2024-02-21 02:20:44 +01:00
Pierrick HYMBERT
3322bfa980 server: tests: add a small check to be sure all started threads have generated response 2024-02-21 02:04:59 +01:00
Pierrick HYMBERT
672d98f6f0 server: tests: CORS and api key checks scenario 2024-02-21 01:51:33 +01:00
Pierrick HYMBERT
6dcbcfe6ba server: tests: simplify completion scenario 2024-02-21 00:43:50 +01:00
Pierrick HYMBERT
19664b9f01 server: tests: detokenize endpoint issue reference added 2024-02-21 00:17:38 +01:00
Pierrick HYMBERT
1065f6d41b server: tests: add tokenize/detokenize scenario 2024-02-21 00:13:53 +01:00
Pierrick HYMBERT
e6d482088d server: tests: add embeddings scenario 2024-02-21 00:02:30 +01:00
Pierrick HYMBERT
1ecda0d13e server: tests: disable issue 3969 scenario 2024-02-20 23:35:44 +01:00
Pierrick HYMBERT
b0b6d83c76 server: tests: add infinite loop scenario 2024-02-20 23:17:00 +01:00
Pierrick HYMBERT
68574c6f98 server: tests: add infinite loop scenario 2024-02-20 23:11:59 +01:00
Pierrick HYMBERT
6b9dc4f291 server: tests: add infinite loop 2024-02-20 23:05:27 +01:00
Pierrick HYMBERT
0772884b06 server: tests: add a constant seed in completion request 2024-02-20 22:55:29 +01:00
Pierrick HYMBERT
b9f8390d28 server: tests: check for infinite loops 2024-02-20 22:49:36 +01:00
Pierrick HYMBERT
367b59a15c server: tests: check for infinite loops 2024-02-20 22:45:30 +01:00
Pierrick HYMBERT
c355f76427 server: tests: slots endpoint checks 2024-02-20 22:32:11 +01:00
Pierrick HYMBERT
11adf1d864 server: tests: add OAI multi user scenario 2024-02-20 22:00:09 +01:00
Pierrick HYMBERT
9b7ea97979 server: tests: add OAI stream test, fix file end of line, fast fail behave 2024-02-20 21:34:35 +01:00
Pierrick HYMBERT
56583bee41 server: tests: refactor steps and vocabulary 2024-02-20 20:52:24 +01:00
Pierrick HYMBERT
6c95ec6587 server: tests: change model to: @karpathy's tinyllamas 2024-02-20 20:50:14 +01:00
CJ Pais
6560bed3f0
server : support llava 1.6 (#5553)
* server: init working 1.6

* move clip_image to header

* remove commented code

* remove c++ style from header

* remove todo

* expose llava_image_embed_make_with_clip_img

* fix zig build
2024-02-20 21:07:22 +02:00
slaren
06bf2cf8c4
make : fix debug build with CUDA (#5616) 2024-02-20 20:06:17 +01:00
Pierrick HYMBERT
8bb586bf06 server: tests: add health check and concurrent request example 2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
1680599b01 server: tests: build only the server 2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
fe9866a52d server: tests: use ngxson llama_xs_q4.bin 2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
30aa323fb9 server: tests: fix ci workflow 2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
4e5245e6b8 server: tests: fix ci workflow 2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
6497755de5 server: tests: fix ci workflow 2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
9b63d7057a server: tests: reduce number of files, all in one tests shell script 2024-02-20 19:05:21 +01:00
Pierrick HYMBERT
157bcf2286 server: init functional test 2024-02-20 19:05:21 +01:00
Daniel Bevenius
4ed8e4fbef
llava : add explicit instructions for llava-1.6 (#5611)
This commit contains a suggestion for the README.md in the llava
example. The suggestion adds explicit instructions for how to convert
a llava-1.6 model and run it using llava-cli.

The motivation for this is that having explicit instructions similar to
the 1.5 instructions will make it easier for users to try this out.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-20 19:30:27 +02:00
Xuan Son Nguyen
9c405c9f9a
Server: use llama_chat_apply_template (#5593)
* server: use llama_chat_apply_template

* server: remove trailing space

* server: fix format_chat

* server: fix help message

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server: fix formatted_chat

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-20 15:58:27 +01:00
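What delegating to `llama_chat_apply_template` buys is a single place that knows how to turn a message list into the model's prompt format, instead of the server hand-rolling it. A sketch of one such format, ChatML, for illustration only (this is not the library function itself):

```python
def format_chatml(messages, add_assistant_prompt=True):
    """Render a list of {"role", "content"} dicts in ChatML style."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_assistant_prompt:
        # Open the assistant turn so generation continues from here.
        out.append("<|im_start|>assistant\n")
    return "".join(out)
```

Other model families use different delimiters, which is exactly why centralizing template handling in the library matters.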
Dane Madsen
5207b3fbc5
readme : update UI list (#5605)
* Add maid to ui list

* Specify licence
2024-02-20 12:00:23 +02:00
Haoxiang Fei
8dbbd75754
metal : add build system support for embedded metal library (#5604)
* add build support for embedded metal library

* Update Makefile

---------

Co-authored-by: Haoxiang Fei <feihaoxiang@idea.edu.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-20 11:58:36 +02:00
Pierrick Hymbert
c0a8c6db37
server : health endpoint configurable failure on no slot (#5594) 2024-02-20 09:48:19 +02:00
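The configurable behaviour can be sketched in a few lines (flag and field names hypothetical): when enabled, the health check reports failure as soon as every slot is busy, rather than always reporting OK while the server is alive.

```python
def health_status(slots, fail_on_no_slot):
    """Return (http_status, body) for a /health-style check."""
    idle = sum(1 for s in slots if not s["busy"])
    if idle == 0 and fail_on_no_slot:
        # Opt-in failure mode: saturated server counts as unhealthy.
        return 503, {"status": "no slot available"}
    return 200, {"status": "ok", "slots_idle": idle}
```

With the flag off, a fully busy server still reports 200; with it on, a load balancer can use the 503 to steer traffic elsewhere.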
AidanBeltonS
b9111bd209
Update ggml_sycl_op_mul_mat_vec_q (#5502)
* Update ggml_sycl_op_mul_mat_vec_q

* Apply suggestions from code review

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>

* revert suggestion on macro

* fix bug

* Add quant type GGML_TYPE_IQ1_S to unsupported

* fix format

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-02-20 12:31:25 +05:30
Mathijs de Bruin
633782b8d9 nix: now that we can do so, allow MacOS to build Vulkan binaries
Author:    Philip Taron <philip.taron@gmail.com>
Date:      Tue Feb 13 20:28:02 2024 +0000
2024-02-19 14:49:49 -08:00
0cc4m
22f83f0c38 Enable Vulkan MacOS CI 2024-02-19 14:49:49 -08:00
0cc4m
bb9dcd560a Refactor validation and enumeration platform checks into functions to clean up ggml_vk_instance_init() 2024-02-19 14:49:49 -08:00