Pierrick Hymbert
c0a8c6db37
server : health endpoint configurable failure on no slot ( #5594 )
2024-02-20 09:48:19 +02:00
AidanBeltonS
b9111bd209
Update ggml_sycl_op_mul_mat_vec_q ( #5502 )
...
* Update ggml_sycl_op_mul_mat_vec_q
* Apply suggestions from code review
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* revert suggestion on macro
* fix bug
* Add quant type GGML_TYPE_IQ1_S to unsupported
* fix format
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-02-20 12:31:25 +05:30
Mathijs de Bruin
633782b8d9
nix: now that we can do so, allow MacOS to build Vulkan binaries
...
Author: Philip Taron <philip.taron@gmail.com>
Date: Tue Feb 13 20:28:02 2024 +0000
2024-02-19 14:49:49 -08:00
0cc4m
22f83f0c38
Enable Vulkan MacOS CI
2024-02-19 14:49:49 -08:00
0cc4m
bb9dcd560a
Refactor validation and enumeration platform checks into functions to clean up ggml_vk_instance_init()
2024-02-19 14:49:49 -08:00
0cc4m
f50db6ae0b
Add check for VK_KHR_portability_enumeration for MoltenVK support
2024-02-19 14:49:49 -08:00
Mathijs de Bruin
d8c054517d
Add preprocessor checks for Apple devices.
...
Based on work by @rbourgeat in https://github.com/ggerganov/llama.cpp/pull/5322/files
2024-02-19 14:49:49 -08:00
Mathijs de Bruin
42f664a382
Resolve ErrorIncompatibleDriver with Vulkan on MacOS.
...
Refs:
- https://chat.openai.com/share/7020ce72-65fc-45ec-b7be-9d9d798a5f3f
- https://github.com/SaschaWillems/Vulkan/issues/954
- https://github.com/haasn/libplacebo/issues/128
- https://github.com/KhronosGroup/Vulkan-Samples/issues/476
2024-02-19 14:49:49 -08:00
Mathijs de Bruin
5dde540897
Allow for Vulkan build with Accelerate.
...
Closes #5304
2024-02-19 14:49:49 -08:00
slaren
40c3a6c1e1
cuda : ignore peer access already enabled errors ( #5597 )
...
* cuda : ignore peer access already enabled errors
* fix hip
2024-02-19 23:40:26 +01:00
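The idea behind this fix can be sketched in plain C++ (the enum below is an illustrative stand-in, NOT the real CUDA error codes): when enabling peer access between devices, an "already enabled" result is harmless and should count as success instead of aborting.

```cpp
// Illustrative stand-ins for cudaSuccess / cudaErrorPeerAccessAlreadyEnabled;
// these are NOT the real CUDA enum values.
enum class gpu_err { success, peer_access_already_enabled, invalid_device };

// The gist of the fix: treat "peer access already enabled" as success
// rather than a fatal error when wiring up multi-GPU peer access.
bool peer_access_result_ok(gpu_err err) {
    return err == gpu_err::success
        || err == gpu_err::peer_access_already_enabled;
}
```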
pudepiedj
d261e7f8f8
Merge branch 'ggerganov:master' into server_branch
2024-02-19 22:14:25 +00:00
Jared Van Bortel
f24ed14ee0
make : pass CPPFLAGS directly to nvcc, not via -Xcompiler ( #5598 )
2024-02-19 15:54:12 -05:00
pudepiedj
69cb1ef0b1
Merge branch 'ggerganov:master' into server_branch
2024-02-19 16:38:20 +00:00
pudepiedj
ea0e8ac758
Merge branch 'server_branch' of https://github.com/pudepiedj/llama.cpp into server_branch
2024-02-19 16:37:02 +00:00
pudepiedj
efe38c636f
server changes
2024-02-19 16:36:59 +00:00
nopperl
9d679f0fcc
examples : support minItems/maxItems in JSON grammar converter ( #5039 )
...
* support minLength and maxLength in JSON schema grammar converter
* Update examples/json-schema-to-grammar.py
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-19 16:14:07 +02:00
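A minimal sketch of how minItems/maxItems can be expanded into a grammar fragment (function name and behavior are illustrative, not the converter's actual code): the item is emitted min_items times, followed by optional copies up to max_items. A real converter also interleaves separators and handles an unbounded maximum; this ignores both.

```cpp
#include <string>

// Expand {min_items, max_items} into a flat repetition: mandatory copies
// first, then optional ("?"-suffixed) copies up to the maximum.
std::string build_repetition(const std::string & item, int min_items, int max_items) {
    std::string out;
    for (int i = 0; i < max_items; i++) {
        if (!out.empty()) out += " ";
        out += item;
        if (i >= min_items) out += "?";
    }
    return out;
}
```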
Georgi Gerganov
1387cf60f7
llava : remove extra cont ( #5587 )
2024-02-19 15:23:17 +02:00
slaren
6fd413791a
llava : replace ggml_cpy with ggml_cont
2024-02-19 15:09:43 +02:00
Georgi Gerganov
337c9cbd52
sync : ggml
...
ggml-ci
2024-02-19 15:09:43 +02:00
Georgi Gerganov
a3145bdc30
ggml-alloc : apply ggml/731
2024-02-19 15:09:43 +02:00
Didzis Gosko
890559ab28
metal : option to embed MSL source into compiled binary (whisper/1842)
...
* ggml : embed Metal library source (ggml-metal.metal) into binary
enable by setting WHISPER_EMBED_METAL_LIBRARY
* rename the build option
* rename the preprocessor directive
* generate Metal library embedding assembly on the fly during the build process
2024-02-19 15:09:43 +02:00
pudepiedj
491e11b283
Merge branch 'ggerganov:master' into server_branch
2024-02-19 12:48:37 +00:00
Georgi Gerganov
d0e3ce51f4
ci : enable -Werror for CUDA builds ( #5579 )
...
* cmake : pass -Werror through -Xcompiler
ggml-ci
* make, cmake : enable CUDA errors on warnings
ggml-ci
2024-02-19 14:45:41 +02:00
pudepiedj
aaffb2387f
Merge branch 'ggerganov:master' into server_branch
2024-02-19 12:44:51 +00:00
pudepiedj
8a4d202957
minor changes
2024-02-19 12:41:27 +00:00
pudepiedj
7f0d8987eb
minor updates and TCPshellscript
2024-02-19 12:14:23 +00:00
Georgi Gerganov
68a6b98b3c
make : fix CUDA build ( #5580 )
2024-02-19 13:41:51 +02:00
valiray
70d45af0ef
readme : fix typo in README-sycl.md ( #5353 )
2024-02-19 12:37:10 +02:00
Abhilash Majumder
13e2c771aa
cmake : remove obsolete sycl compile flags ( #5581 )
...
* rm unwanted sycl compile options
* fix bug
* fix bug
* format fix
2024-02-19 11:15:18 +02:00
Georgi Gerganov
f53119cec4
minor : fix trailing whitespace ( #5538 )
2024-02-19 10:34:10 +02:00
Daniel Bevenius
7084755396
llava : avoid changing the original BakLLaVA model ( #5577 )
...
This is a follow-up of commit fc0c8d286a
("llava : update surgery script to not remove tensors") but this time
the change is to the BakLLaVA specific part of the surgery script.
I've been able to test this using SkunkworksAI/BakLLaVA-1 and it works
as expected using the instructions in README.md.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-19 10:31:59 +02:00
NawafAlansari
4480542b22
baby-llama : allocate graphs in ggml_context ( #5573 )
...
* Fixed the baby-llama issue (see issue #4830 )
* minor : fix whitespaces
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-19 10:25:38 +02:00
Xuan Son Nguyen
11b12de39b
llama : add llama_chat_apply_template() ( #5538 )
...
* llama: add llama_chat_apply_template
* test-chat-template: remove redundant vector
* chat_template: do not use std::string for buffer
* add clarification for llama_chat_apply_template
* llama_chat_apply_template: add zephyr template
* llama_chat_apply_template: correct docs
* llama_chat_apply_template: use term "chat" everywhere
* llama_chat_apply_template: change variable name to "tmpl"
2024-02-19 10:23:37 +02:00
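What applying one of these templates amounts to can be sketched as follows; the struct and function below are a simplified illustration of the pattern, not the actual llama.cpp API, and the exact Zephyr token layout here is an assumption.

```cpp
#include <string>
#include <vector>

// Minimal mirror of a chat message: a role plus its content.
struct chat_msg {
    std::string role;    // "system", "user" or "assistant"
    std::string content;
};

// Zephyr-style formatting: each message becomes "<|role|>\n<content></s>\n",
// and add_assistant appends the assistant prefix so the model continues
// from there.
std::string apply_zephyr_template(const std::vector<chat_msg> & msgs, bool add_assistant) {
    std::string out;
    for (const auto & m : msgs) {
        out += "<|" + m.role + "|>\n" + m.content + "</s>\n";
    }
    if (add_assistant) {
        out += "<|assistant|>\n";
    }
    return out;
}
```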
slaren
3a9cb4ca64
cuda, metal : fix nans in soft_max ( #5574 )
...
* cuda : fix nans in soft_max
* metal : fix nans in soft_max
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-19 10:04:45 +02:00
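The standard cure for NaNs in softmax is to subtract the row maximum before exponentiating, so exp() never overflows to infinity (inf/inf is where the NaNs come from). A scalar sketch of the idea, not the CUDA/Metal kernels themselves:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax: shift by the maximum so the largest
// exponent is exp(0) = 1 and no term overflows.
std::vector<float> soft_max(const std::vector<float> & x) {
    const float max_val = *std::max_element(x.begin(), x.end());
    std::vector<float> y(x.size());
    float sum = 0.0f;
    for (size_t i = 0; i < x.size(); i++) {
        y[i] = std::exp(x[i] - max_val);
        sum += y[i];
    }
    for (float & v : y) {
        v /= sum;
    }
    return y;
}
```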
Mirko185
769a716e30
readme : update ( #5572 )
...
Added 1.5-bit to README.md
2024-02-19 09:39:31 +02:00
bmwl
f0d1fafc02
ggml : android and old glibc NUMA incompatibility bugfixes ( #5557 )
...
* #ifdef out some code NUMA blocks for Android due to lack of support
* added some __ANDROID__ #ifdef gates around NUMA code and forced glibc prior to 2.29 to use a syscall for getcpu instead of the wrapper
* Changed gates on numa platform specific stuff to __gnu_linux__ to skip any platforms without glibc
* harmonizing #if defined blocks for numa code to __gnu_linux__ since that's the only model that's being followed anyways
---------
Co-authored-by: root <root@nenya.lothlorien.ca>
2024-02-19 09:38:32 +02:00
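The glibc part of this fix can be sketched as follows: glibc only gained a getcpu() wrapper in 2.29, so on older glibc the code issues the raw syscall instead. Linux-only; the function name is illustrative.

```cpp
#include <unistd.h>
#include <sys/syscall.h>

// Raw getcpu via syscall(2), for glibc < 2.29 where no wrapper exists.
unsigned int current_cpu() {
    unsigned int cpu  = 0;
    unsigned int node = 0;
    if (syscall(SYS_getcpu, &cpu, &node, nullptr) != 0) {
        return 0; // treat failure as CPU 0
    }
    return cpu;
}
```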
Jared Van Bortel
a0c2dad9d4
build : pass all warning flags to nvcc via -Xcompiler ( #5570 )
...
* build : pass all warning flags to nvcc via -Xcompiler
* make : fix apparent mis-merge from #3952
* make : fix incorrect GF_CC_VER for CUDA host compiler
2024-02-18 16:21:52 -05:00
Georgi Gerganov
14278f55d2
ggml : restore vec dot stride arg names ( #5453 )
2024-02-18 22:58:57 +02:00
Georgi Gerganov
b1de96824b
ci : fix wikitext url + compile warnings ( #5569 )
...
ggml-ci
2024-02-18 22:39:30 +02:00
Georgi Gerganov
7ad554f90e
metal : fix unused warnings ( #0 )
2024-02-18 21:39:58 +02:00
Robey Holderith
5ee99c32f5
common, server : surface min_keep as its own parameter ( #5567 )
...
* Feature - surface min_keep as its own parameter
* Updated README with min_keep param
2024-02-18 21:11:16 +02:00
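How min_keep interacts with a truncation sampler like top-p can be sketched like this (a simplified illustration, not the actual sampler code): the candidate list, sorted by descending probability, is cut once the cumulative mass reaches top_p, but never below min_keep entries.

```cpp
#include <cstddef>
#include <vector>

// Return how many candidates survive top-p truncation, honoring the
// min_keep floor surfaced by this change.
size_t top_p_cutoff(const std::vector<float> & sorted_probs, float top_p, size_t min_keep) {
    float cum = 0.0f;
    for (size_t i = 0; i < sorted_probs.size(); i++) {
        cum += sorted_probs[i];
        if (cum >= top_p && i + 1 >= min_keep) {
            return i + 1;
        }
    }
    return sorted_probs.size();
}
```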
pudepiedj
d2f97227ba
Merge remote-tracking branch 'origin/master' into server_branch
2024-02-18 19:00:06 +00:00
Pierrick Hymbert
c145f8a132
server : slots monitoring endpoint ( #5550 )
2024-02-18 19:39:57 +02:00
Georgi Gerganov
689a091bbe
sampling : do not set min_keep to n_probs ( #5564 )
2024-02-18 19:38:06 +02:00
pudepiedj
7a98f62f6d
server edit
2024-02-18 17:32:02 +00:00
Georgi Gerganov
f3f28c5395
cmake : fix GGML_USE_SYCL typo ( #5555 )
2024-02-18 19:17:00 +02:00
pudepiedj
bad3de0511
server with flag
2024-02-18 16:35:26 +00:00
Pierrick Hymbert
e75c6279d1
server : enhanced health endpoint ( #5548 )
...
* server: enrich health endpoint with available slots, return 503 if no slots are available
* server: document new status no slot available in the README.md
2024-02-18 18:31:28 +02:00
Pierrick Hymbert
36376abe05
server : --n-predict option document and cap to max value ( #5549 )
...
* server: document --n-predict
* server: ensure client request cannot override n_predict if set
* server: fix print usage LF in new --n-predict option
2024-02-18 18:30:09 +02:00
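The capping logic amounts to a clamp; a sketch under the usual convention that a negative value means "unlimited" (the sentinel choice here is an assumption, and the function name is illustrative):

```cpp
#include <cstdint>

// Cap a client-requested n_predict to the server-side maximum, so a
// request cannot override the server's limit.
int32_t effective_n_predict(int32_t requested, int32_t server_max) {
    if (server_max < 0) {
        return requested;   // server imposes no limit
    }
    if (requested < 0 || requested > server_max) {
        return server_max;  // clamp unlimited or oversized requests
    }
    return requested;
}
```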
Daniel Hiltgen
66c1968f7a
server : graceful server shutdown ( #5244 )
...
This updates the server queue to support graceful shutdown of the server on signals.
2024-02-18 18:23:16 +02:00
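The shape of signal-driven graceful shutdown can be sketched in a few lines (names are illustrative, not the server's actual identifiers): the handler only sets a flag, and the serving loop polls it, stops accepting new work, drains in-flight requests, and exits.

```cpp
#include <csignal>

// A signal handler may only do async-signal-safe work, so it just
// records the shutdown request in a sig_atomic_t flag.
volatile std::sig_atomic_t g_shutdown = 0;

extern "C" void handle_shutdown_signal(int) {
    g_shutdown = 1;
}

void install_shutdown_handlers() {
    std::signal(SIGINT,  handle_shutdown_signal);
    std::signal(SIGTERM, handle_shutdown_signal);
}
```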