Pierrick Hymbert
c0a8c6db37
server : health endpoint configurable failure on no slot ( #5594 )
2024-02-20 09:48:19 +02:00
AidanBeltonS
b9111bd209
Update ggml_sycl_op_mul_mat_vec_q ( #5502 )
...
* Update ggml_sycl_op_mul_mat_vec_q
* Apply suggestions from code review
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* revert suggestion on macro
* fix bug
* Add quant type GGML_TYPE_IQ1_S to unsupported
* fix format
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-02-20 12:31:25 +05:30
Mathijs de Bruin
633782b8d9
nix: now that we can do so, allow MacOS to build Vulkan binaries
...
Author: Philip Taron <philip.taron@gmail.com>
Date: Tue Feb 13 20:28:02 2024 +0000
2024-02-19 14:49:49 -08:00
0cc4m
22f83f0c38
Enable Vulkan MacOS CI
2024-02-19 14:49:49 -08:00
0cc4m
bb9dcd560a
Refactor validation and enumeration platform checks into functions to clean up ggml_vk_instance_init()
2024-02-19 14:49:49 -08:00
0cc4m
f50db6ae0b
Add check for VK_KHR_portability_enumeration for MoltenVK support
2024-02-19 14:49:49 -08:00
Mathijs de Bruin
d8c054517d
Add preprocessor checks for Apple devices.
...
Based on work by @rbourgeat in https://github.com/ggerganov/llama.cpp/pull/5322/files
2024-02-19 14:49:49 -08:00
Mathijs de Bruin
42f664a382
Resolve ErrorIncompatibleDriver with Vulkan on MacOS.
...
Refs:
- https://chat.openai.com/share/7020ce72-65fc-45ec-b7be-9d9d798a5f3f
- https://github.com/SaschaWillems/Vulkan/issues/954
- https://github.com/haasn/libplacebo/issues/128
- https://github.com/KhronosGroup/Vulkan-Samples/issues/476
2024-02-19 14:49:49 -08:00
Mathijs de Bruin
5dde540897
Allow for Vulkan build with Accelerate.
...
Closes #5304
2024-02-19 14:49:49 -08:00
slaren
40c3a6c1e1
cuda : ignore peer access already enabled errors ( #5597 )
...
* cuda : ignore peer access already enabled errors
* fix hip
2024-02-19 23:40:26 +01:00
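The idea behind this fix can be sketched in plain C++ (the enum below is an illustrative stand-in, NOT the real CUDA error codes): when enabling peer access between devices, an "already enabled" result is harmless and should count as success instead of aborting.

```cpp
// Illustrative stand-ins for cudaSuccess / cudaErrorPeerAccessAlreadyEnabled;
// these are NOT the real CUDA enum values.
enum class gpu_err { success, peer_access_already_enabled, invalid_device };

// The gist of the fix: treat "peer access already enabled" as success
// rather than a fatal error when wiring up multi-GPU peer access.
bool peer_access_result_ok(gpu_err err) {
    return err == gpu_err::success
        || err == gpu_err::peer_access_already_enabled;
}
```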
pudepiedj
d261e7f8f8
Merge branch 'ggerganov:master' into server_branch
2024-02-19 22:14:25 +00:00
Jared Van Bortel
f24ed14ee0
make : pass CPPFLAGS directly to nvcc, not via -Xcompiler ( #5598 )
2024-02-19 15:54:12 -05:00
pudepiedj
69cb1ef0b1
Merge branch 'ggerganov:master' into server_branch
2024-02-19 16:38:20 +00:00
pudepiedj
ea0e8ac758
Merge branch 'server_branch' of https://github.com/pudepiedj/llama.cpp into server_branch
2024-02-19 16:37:02 +00:00
pudepiedj
efe38c636f
server changes
2024-02-19 16:36:59 +00:00
nopperl
9d679f0fcc
examples : support minItems/maxItems in JSON grammar converter ( #5039 )
...
* support minLength and maxLength in JSON schema grammar converter
* Update examples/json-schema-to-grammar.py
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-19 16:14:07 +02:00
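A minimal sketch of how minItems/maxItems can be expanded into a grammar fragment (function name and behavior are illustrative, not the converter's actual code): the item is emitted min_items times, followed by optional copies up to max_items. A real converter also interleaves separators and handles an unbounded maximum; this ignores both.

```cpp
#include <string>

// Expand {min_items, max_items} into a flat repetition: mandatory copies
// first, then optional ("?"-suffixed) copies up to the maximum.
std::string build_repetition(const std::string & item, int min_items, int max_items) {
    std::string out;
    for (int i = 0; i < max_items; i++) {
        if (!out.empty()) out += " ";
        out += item;
        if (i >= min_items) out += "?";
    }
    return out;
}
```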
Georgi Gerganov
1387cf60f7
llava : remove extra cont ( #5587 )
2024-02-19 15:23:17 +02:00
slaren
6fd413791a
llava : replace ggml_cpy with ggml_cont
2024-02-19 15:09:43 +02:00
Georgi Gerganov
337c9cbd52
sync : ggml
...
ggml-ci
2024-02-19 15:09:43 +02:00
Georgi Gerganov
a3145bdc30
ggml-alloc : apply ggml/731
2024-02-19 15:09:43 +02:00
Didzis Gosko
890559ab28
metal : option to embed MSL source into compiled binary (whisper/1842)
...
* ggml : embed Metal library source (ggml-metal.metal) into binary
enable by setting WHISPER_EMBED_METAL_LIBRARY
* rename the build option
* rename the preprocessor directive
* generate Metal library embedding assembly on the fly during the build process
2024-02-19 15:09:43 +02:00
pudepiedj
491e11b283
Merge branch 'ggerganov:master' into server_branch
2024-02-19 12:48:37 +00:00
Georgi Gerganov
d0e3ce51f4
ci : enable -Werror for CUDA builds ( #5579 )
...
* cmake : pass -Werror through -Xcompiler
ggml-ci
* make, cmake : enable CUDA errors on warnings
ggml-ci
2024-02-19 14:45:41 +02:00
pudepiedj
aaffb2387f
Merge branch 'ggerganov:master' into server_branch
2024-02-19 12:44:51 +00:00
pudepiedj
8a4d202957
minor changes
2024-02-19 12:41:27 +00:00
pudepiedj
7f0d8987eb
minor updates and TCPshellscript
2024-02-19 12:14:23 +00:00
Georgi Gerganov
68a6b98b3c
make : fix CUDA build ( #5580 )
2024-02-19 13:41:51 +02:00
valiray
70d45af0ef
readme : fix typo in README-sycl.md ( #5353 )
2024-02-19 12:37:10 +02:00
Abhilash Majumder
13e2c771aa
cmake : remove obsolete sycl compile flags ( #5581 )
...
* rm unwanted sycl compile options
* fix bug
* fix bug
* format fix
2024-02-19 11:15:18 +02:00
Georgi Gerganov
f53119cec4
minor : fix trailing whitespace ( #5538 )
2024-02-19 10:34:10 +02:00
Daniel Bevenius
7084755396
llava : avoid changing the original BakLLaVA model ( #5577 )
...
This is a follow-up of commit fc0c8d286a
("llava : update surgery script to not remove tensors") but this time
the change is to the BakLLaVA specific part of the surgery script.
I've been able to test this using SkunkworksAI/BakLLaVA-1 and it works
as expected using the instructions in README.md.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-19 10:31:59 +02:00
NawafAlansari
4480542b22
baby-llama : allocate graphs in ggml_context ( #5573 )
...
* Fixed the baby-llama issue (see issue #4830 )
* minor : fix whitespaces
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-19 10:25:38 +02:00
Xuan Son Nguyen
11b12de39b
llama : add llama_chat_apply_template() ( #5538 )
...
* llama: add llama_chat_apply_template
* test-chat-template: remove redundant vector
* chat_template: do not use std::string for buffer
* add clarification for llama_chat_apply_template
* llama_chat_apply_template: add zephyr template
* llama_chat_apply_template: correct docs
* llama_chat_apply_template: use term "chat" everywhere
* llama_chat_apply_template: change variable name to "tmpl"
2024-02-19 10:23:37 +02:00
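What applying one of these templates amounts to can be sketched as follows; the struct and function below are a simplified illustration of the pattern, not the actual llama.cpp API, and the exact Zephyr token layout here is an assumption.

```cpp
#include <string>
#include <vector>

// Minimal mirror of a chat message: a role plus its content.
struct chat_msg {
    std::string role;    // "system", "user" or "assistant"
    std::string content;
};

// Zephyr-style formatting: each message becomes "<|role|>\n<content></s>\n",
// and add_assistant appends the assistant prefix so the model continues
// from there.
std::string apply_zephyr_template(const std::vector<chat_msg> & msgs, bool add_assistant) {
    std::string out;
    for (const auto & m : msgs) {
        out += "<|" + m.role + "|>\n" + m.content + "</s>\n";
    }
    if (add_assistant) {
        out += "<|assistant|>\n";
    }
    return out;
}
```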
slaren
3a9cb4ca64
cuda, metal : fix nans in soft_max ( #5574 )
...
* cuda : fix nans in soft_max
* metal : fix nans in soft_max
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-19 10:04:45 +02:00
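The standard cure for NaNs in softmax is to subtract the row maximum before exponentiating, so exp() never overflows to infinity (inf/inf is where the NaNs come from). A scalar sketch of the idea, not the CUDA/Metal kernels themselves:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax: shift by the maximum so the largest
// exponent is exp(0) = 1 and no term overflows.
std::vector<float> soft_max(const std::vector<float> & x) {
    const float max_val = *std::max_element(x.begin(), x.end());
    std::vector<float> y(x.size());
    float sum = 0.0f;
    for (size_t i = 0; i < x.size(); i++) {
        y[i] = std::exp(x[i] - max_val);
        sum += y[i];
    }
    for (float & v : y) {
        v /= sum;
    }
    return y;
}
```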
Mirko185
769a716e30
readme : update ( #5572 )
...
Added 1.5-bit to README.md
2024-02-19 09:39:31 +02:00
bmwl
f0d1fafc02
ggml : android and old glibc NUMA incompatibility bugfixes ( #5557 )
...
* #ifdef out some code NUMA blocks for Android due to lack of support
* added some __ANDROID__ #ifdef gates around NUMA code and forced glibc prior to 2.29 to use a syscall for getcpu instead of the wrapper
* Changed gates on numa platform specific stuff to __gnu_linux__ to skip any platforms without glibc
* harmonizing #if defined blocks for numa code to __gnu_linux__ since that's the only model that's being followed anyways
---------
Co-authored-by: root <root@nenya.lothlorien.ca>
2024-02-19 09:38:32 +02:00
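The glibc part of this fix can be sketched as follows: glibc only gained a getcpu() wrapper in 2.29, so on older glibc the code issues the raw syscall instead. Linux-only; the function name is illustrative.

```cpp
#include <unistd.h>
#include <sys/syscall.h>

// Raw getcpu via syscall(2), for glibc < 2.29 where no wrapper exists.
unsigned int current_cpu() {
    unsigned int cpu  = 0;
    unsigned int node = 0;
    if (syscall(SYS_getcpu, &cpu, &node, nullptr) != 0) {
        return 0; // treat failure as CPU 0
    }
    return cpu;
}
```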
Jared Van Bortel
a0c2dad9d4
build : pass all warning flags to nvcc via -Xcompiler ( #5570 )
...
* build : pass all warning flags to nvcc via -Xcompiler
* make : fix apparent mis-merge from #3952
* make : fix incorrect GF_CC_VER for CUDA host compiler
2024-02-18 16:21:52 -05:00
Georgi Gerganov
14278f55d2
ggml : restore vec dot stride arg names ( #5453 )
2024-02-18 22:58:57 +02:00
Georgi Gerganov
b1de96824b
ci : fix wikitext url + compile warnings ( #5569 )
...
ggml-ci
2024-02-18 22:39:30 +02:00
Georgi Gerganov
7ad554f90e
metal : fix unused warnings ( #0 )
2024-02-18 21:39:58 +02:00
Robey Holderith
5ee99c32f5
common, server : surface min_keep as its own parameter ( #5567 )
...
* Feature - surface min_keep as its own parameter
* Updated README with min_keep param
2024-02-18 21:11:16 +02:00
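How min_keep interacts with a truncation sampler like top-p can be sketched like this (a simplified illustration, not the actual sampler code): the candidate list, sorted by descending probability, is cut once the cumulative mass reaches top_p, but never below min_keep entries.

```cpp
#include <cstddef>
#include <vector>

// Return how many candidates survive top-p truncation, honoring the
// min_keep floor surfaced by this change.
size_t top_p_cutoff(const std::vector<float> & sorted_probs, float top_p, size_t min_keep) {
    float cum = 0.0f;
    for (size_t i = 0; i < sorted_probs.size(); i++) {
        cum += sorted_probs[i];
        if (cum >= top_p && i + 1 >= min_keep) {
            return i + 1;
        }
    }
    return sorted_probs.size();
}
```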
pudepiedj
d2f97227ba
Merge remote-tracking branch 'origin/master' into server_branch
2024-02-18 19:00:06 +00:00
Pierrick Hymbert
c145f8a132
server : slots monitoring endpoint ( #5550 )
2024-02-18 19:39:57 +02:00
Georgi Gerganov
689a091bbe
sampling : do not set min_keep to n_probs ( #5564 )
2024-02-18 19:38:06 +02:00
pudepiedj
7a98f62f6d
server edit
2024-02-18 17:32:02 +00:00
Georgi Gerganov
f3f28c5395
cmake : fix GGML_USE_SYCL typo ( #5555 )
2024-02-18 19:17:00 +02:00
pudepiedj
bad3de0511
server with flag
2024-02-18 16:35:26 +00:00
Pierrick Hymbert
e75c6279d1
server : enhanced health endpoint ( #5548 )
...
* server: enrich health endpoint with available slots, return 503 if no slots are available
* server: document new status no slot available in the README.md
2024-02-18 18:31:28 +02:00
Pierrick Hymbert
36376abe05
server : --n-predict option document and cap to max value ( #5549 )
...
* server: document --n-predict
* server: ensure client request cannot override n_predict if set
* server: fix print usage LF in new --n-predict option
2024-02-18 18:30:09 +02:00
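The capping logic amounts to a clamp; a sketch under the usual convention that a negative value means "unlimited" (the sentinel choice here is an assumption, and the function name is illustrative):

```cpp
#include <cstdint>

// Cap a client-requested n_predict to the server-side maximum, so a
// request cannot override the server's limit.
int32_t effective_n_predict(int32_t requested, int32_t server_max) {
    if (server_max < 0) {
        return requested;   // server imposes no limit
    }
    if (requested < 0 || requested > server_max) {
        return server_max;  // clamp unlimited or oversized requests
    }
    return requested;
}
```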
Daniel Hiltgen
66c1968f7a
server : graceful server shutdown ( #5244 )
...
This updates the server queue to support graceful shutdown of the server on signals.
2024-02-18 18:23:16 +02:00
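The shape of signal-driven graceful shutdown can be sketched in a few lines (names are illustrative, not the server's actual identifiers): the handler only sets a flag, and the serving loop polls it, stops accepting new work, drains in-flight requests, and exits.

```cpp
#include <csignal>

// A signal handler may only do async-signal-safe work, so it just
// records the shutdown request in a sig_atomic_t flag.
volatile std::sig_atomic_t g_shutdown = 0;

extern "C" void handle_shutdown_signal(int) {
    g_shutdown = 1;
}

void install_shutdown_handlers() {
    std::signal(SIGINT,  handle_shutdown_signal);
    std::signal(SIGTERM, handle_shutdown_signal);
}
```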