Commit graph

2547 commits

Author SHA1 Message Date
Olivier Chafik
7628bd8c76 json: move json.hpp & json-schema-to-grammar.{cpp,h} to common 2024-03-20 14:35:10 +00:00
Olivier Chafik
7fc759b84f json: fix date pattern 2024-03-19 11:59:06 +00:00
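JSON Schema's `"date"` format corresponds to an RFC 3339 full-date (`YYYY-MM-DD`). As a hedged illustration of the kind of pattern the fix above concerns (not the project's actual GBNF output), a minimal Python check might look like:

```python
import re

# Illustrative regex for an RFC 3339 full-date (YYYY-MM-DD), the shape
# implied by JSON Schema's "date" format. This is a sketch only, not the
# pattern emitted by json-schema-to-grammar.
FULL_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_full_date(s: str) -> bool:
    """Return True if s looks like an RFC 3339 full-date."""
    return FULL_DATE.match(s) is not None
```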
ochafik
874599e749 json: create examples/json-schema-pydantic-example.py 2024-03-19 09:10:39 +00:00
ochafik
263a86e148 json: cleaner build of test 2024-03-19 02:12:15 +00:00
ochafik
02e3bde6b4 json: don't complain about unknown format type in server if unset 2024-03-19 01:45:23 +00:00
ochafik
e7de6433cb json: catch schema conversion errors in server 2024-03-19 01:21:49 +00:00
ochafik
05fd7e3020 json: fix json handling in server when there's no response_format 2024-03-18 20:46:57 +00:00
ochafik
bd96df4e85 json: ws nit 2024-03-18 04:42:25 +00:00
ochafik
24f0b941cf json: fix string patterns (was missing quotes) 2024-03-18 04:06:23 +00:00
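The fix above notes that string patterns were emitted without quotes: a JSON string is always quoted, so any grammar rule lowered from a string `pattern` must keep the literal surrounding double quotes. A hedged sketch of the idea (function name and rule syntax are illustrative, not the project's actual output):

```python
# Hedged sketch: when lowering a JSON Schema string "pattern" to a grammar
# rule, the generated rule must retain the JSON string's literal quotes.
# The bug fixed above was emitting the pattern body without them.
def pattern_rule(pattern_body: str) -> str:
    """Wrap a grammar pattern body in literal double-quote terminals."""
    return '"\\"" ' + pattern_body + ' "\\""'
```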
ochafik
dd922a4da3 json: test/fix additional props corner cases 2024-03-18 01:32:15 +00:00
ochafik
bbd70800c8 json: improve grammar parsing failures 2024-03-18 00:34:02 +00:00
ochafik
618247885c json: test/fix top-level anyOf 2024-03-18 00:13:58 +00:00
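`anyOf` at the schema root means an instance is accepted if it validates against at least one alternative; the fix above ensures the converter handles that construct at the top level, not only nested. A toy stdlib sketch of the semantics (the real converter emits a GBNF alternation instead of validating):

```python
# Hedged sketch of anyOf semantics. This toy checker only supports a
# "type" keyword and exists to illustrate the construct; it is not part
# of json-schema-to-grammar.
def check_type(instance, schema):
    types = {"string": str, "integer": int, "number": (int, float)}
    return isinstance(instance, types[schema["type"]])

def check_any_of(instance, schema):
    # Accept if the instance matches at least one alternative.
    return any(check_type(instance, alt) for alt in schema["anyOf"])
```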
ochafik
20869ede26 Merge remote-tracking branch 'origin/master' into json-fixes 2024-03-17 22:53:04 +00:00
ochafik
edbd2e9862 json: add server tests for OAI JSON response_format 2024-03-17 22:51:29 +00:00
ochafik
3e1bf44e5e json: check parsing in test + fix value & string refs 2024-03-17 22:47:20 +00:00
ochafik
84e383c1d7 json: test (& simplify output of) empty schema 2024-03-17 21:51:10 +00:00
ochafik
5c50ffaeac json: fix type=const in c++, add failure expectations for non-str const&enum 2024-03-17 21:03:48 +00:00
ochafik
64799baea1 json: add tests for some expected failures 2024-03-17 21:01:02 +00:00
Pierrick Hymbert
d01b3c4c32
common: llama_load_model_from_url using --model-url (#6098)
* common: llama_load_model_from_url with libcurl dependency

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-17 19:12:37 +01:00
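The commit title names a `--model-url` flag backed by `llama_load_model_from_url` with a libcurl dependency. A hedged CLI fragment of how such an invocation might look (the URL is a placeholder, and exact flag behavior may differ from this sketch):

```shell
# Hedged sketch: fetch a model over HTTP at startup via the new
# --model-url flag (requires a build with libcurl support).
# The URL below is a placeholder, not a real artifact.
./main --model-url https://example.com/models/model.gguf -p "Hello"
```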
Georgi Gerganov
cd776c37c9
ci : close all stale issues at once (#6115) 2024-03-17 18:51:57 +01:00
GainLee
dc0f612548
ggml: fix finding transfer queue family index error (#6094)
Co-authored-by: GainLee <ligen@meizu.com>
2024-03-17 18:12:22 +01:00
AmirAli Mirian
c47cf414ef
ggml : add AVX512F SIMD (#6088) 2024-03-16 17:52:02 +02:00
Daniel Bevenius
b5f4ae09c3
gritlm : add initial README.md (#6086)
* gritlm: add initial README.md to examples/gritlm

This commit adds a suggestion for an initial README.md for the gritlm
example.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Use the `scripts/hf.sh` script to download the model file.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Fix editorconfig-checker error in examples/gritlm/README.md.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-03-16 17:46:29 +02:00
Xuan Son Nguyen
dfbfdd60f9
readme : add wllama as a wasm binding (#6100) 2024-03-16 17:42:08 +02:00
DAN™
15961ec04d
common : refactor nested if causing error C1061 on MSVC (#6101)
* Refactor nested if causing error C1061 on MSVC.

* Revert back and remove else's.

* Add flag to track found arguments.
2024-03-16 17:39:15 +02:00
Pierrick Hymbert
a56d09a440
ci : close inactive issue with workflow (#6053)
* issues: ci - close inactive issue with workflow

* ci: close issue, change workflow schedule time
2024-03-16 14:20:53 +02:00
ochafik
391b17e7f6 json: support mix of additional props & required/optional 2024-03-16 11:13:29 +00:00
ochafik
f30d6c27b9 json: simplify test 2024-03-16 10:35:41 +00:00
ochafik
5602a8b649 Merge remote-tracking branch 'origin/master' into json-fixes 2024-03-16 00:45:07 +00:00
ochafik
842eb834c5 json: re-ran server deps.sh 2024-03-16 00:36:36 +00:00
ochafik
af31aa20b4 Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test) 2024-03-16 00:19:44 +00:00
slaren
d84c48505f
llama : fix Baichuan2 13B (#6092) 2024-03-15 23:14:16 +02:00
Theia Vogel
877b4d0c62
llama : add support for control vectors (#5970)
* control vector api and implementation

* control-vectors : minor code style updates

* disable control vector when data == nullptr

use -1 for disabled range (also on init) in case we ever support controlling layer 0 (embeddings)

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-15 22:43:02 +02:00
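The control-vector commit above adds a steering mechanism, and its notes state that `-1` marks a disabled range (reserving layer 0 for embeddings). A hedged pure-Python sketch of the core idea; names and shapes are illustrative, and the real implementation operates on ggml tensors inside llama.cpp:

```python
# Hedged sketch: add a control (steering) vector, scaled by a strength,
# to a layer's hidden state when the layer falls in the active range.
# Per the commit notes, -1 marks the control vector as disabled.
def apply_control_vector(hidden, control, strength, layer, layer_start, layer_end):
    """Return hidden + strength * control if layer is in [layer_start, layer_end]."""
    if layer_start == -1:  # disabled
        return list(hidden)
    if not (layer_start <= layer <= layer_end):
        return list(hidden)
    return [h + strength * c for h, c in zip(hidden, control)]
```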
Andrew Canis
12247f4c69
llama : add Command-R support (#6033)
Information about the Command-R 35B model (128k context) can be found at:
	https://huggingface.co/CohereForAI/c4ai-command-r-v01

Based on the llama2 model with a few changes:

1) New hyper parameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
   self-attention and FFN layers in parallel. There is no post-attention LayerNorm.
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used

Find GGUF files here:
	https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF

To convert model to GGUF format yourself:

1) Download Command-R Hugging Face safetensors:
	git lfs install
	git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01

2) Run:
	python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
2024-03-15 22:41:22 +02:00
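Point 1 of the Command-R notes above introduces a `logit_scale` hyperparameter applied to the output logits. A minimal hedged sketch of that step in pure Python (llama.cpp applies it on ggml tensors):

```python
# Hedged sketch of Command-R's logit_scale (point 1 above): the model
# scales its output logits by a hyperparameter before sampling.
def scale_logits(logits, logit_scale):
    """Multiply every logit by logit_scale."""
    return [x * logit_scale for x in logits]
```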
Ting Lou
4e9a7f7f7f
llava : change API to pure C style for Rust FFI bindgen (#6079)
Co-authored-by: Lou Ting <louting.t@alibaba-inc.com>
2024-03-15 16:31:05 +02:00
slaren
3020327f6c
cuda : disable unused cudaLaunchHostFunc code (#6078) 2024-03-15 14:24:03 +02:00
Neo Zhang Jianyu
46acb36767
fix set main gpu error (#6073) 2024-03-15 18:53:53 +08:00
ochafik
5714487830 json: basic support for reserved names {number:{number:{root:number}}} 2024-03-15 10:35:34 +00:00
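The commit above deals with schema property names such as `number` or `root` that collide with names the grammar generator reserves. One common remedy, shown here with an illustrative suffix scheme (not necessarily the project's exact naming), is to uniquify by appending a counter:

```python
# Hedged sketch of the reserved-name problem: if a desired rule name is
# already taken, append the first free numeric suffix. The suffix scheme
# is illustrative, not the project's actual one.
def unique_rule_name(name, reserved):
    """Return name, or a suffixed variant, guaranteed absent from reserved."""
    if name not in reserved:
        reserved.add(name)
        return name
    i = 0
    while f"{name}{i}" in reserved:
        i += 1
    reserved.add(f"{name}{i}")
    return f"{name}{i}"
```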
ochafik
daceced65e nit 2024-03-15 10:07:20 +00:00
ochafik
235ff6858d json: don't use c++20 designated initializers 2024-03-15 10:03:57 +00:00
Georgi Gerganov
131b058409
make : ggml-metal.o depends on ggml.h 2024-03-15 11:38:40 +02:00
AidanBeltonS
753e36f650
[SYCL] Fix non-intel device selection (#6042)
* Fix non-intel device selection

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2024-03-15 14:56:20 +05:30
Ondřej Čertík
7ce2c77f88
gguf : add support for I64 and F64 arrays (#6062)
* gguf : add support for I64 and F64 arrays

GGML currently does not support I64 or F64 arrays and they are not often
used in machine learning, however if in the future the need arises, it
would be nice to add them now, so that the types are next to the other
types I8, I16, I32 in the enums, and it also reserves their type number.

Furthermore, with this addition the GGUF format becomes very usable for
most computational applications of NumPy (being compatible with the most
common NumPy dtypes: i8, i16, i32, i64, f32, f64), providing a faster,
and more versatile alternative to the `npz` format, and a simpler
alternative to the `hdf5` format.

The change in this PR seems small, not significantly increasing the
maintenance burden. I tested this from Python using GGUFWriter/Reader
and `gguf-dump`, as well as from C, everything seems to work.

* Fix compiler warnings
2024-03-15 10:46:51 +02:00
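The I64 and F64 arrays added above are 8 bytes per element, matching NumPy's `i64`/`f64` dtypes the commit mentions. A stdlib-only round-trip illustrating that little-endian layout (the real serialization is handled by gguf's `GGUFWriter`/`GGUFReader`, not this sketch):

```python
import struct

# Hedged sketch: pack/unpack a little-endian f64 array, 8 bytes per
# element, to illustrate the storage the new GGUF types cover. This is
# not the gguf library's API.
def pack_f64_array(values):
    return struct.pack(f"<{len(values)}d", *values)

def unpack_f64_array(data):
    return list(struct.unpack(f"<{len(data) // 8}d", data))
```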
Xuan Son Nguyen
aab606a11f
llama : add Orion chat template (#6066) 2024-03-15 10:44:57 +02:00
slaren
b0bc9f4a9d
llama-bench : use random tokens to improve accuracy with mixtral (#6069) 2024-03-15 10:22:24 +02:00
ochafik
3b3ad949f5 json: fix top-level $refs 2024-03-15 00:52:36 +00:00
ochafik
5a7deb27d5 json: pass static command to std::system in tests (fixed temp files) 2024-03-15 00:03:06 +00:00
ochafik
f2165502c9 json: fix zig build 2024-03-14 23:51:44 +00:00
ochafik
3feac66d0f Merge remote-tracking branch 'origin/master' into json-fixes 2024-03-14 23:37:13 +00:00
Georgi Gerganov
4755afd1cb
llama : fix integer overflow during quantization (#6063) 2024-03-14 22:58:41 +02:00