Commit graph

2547 commits

Author SHA1 Message Date
Olivier Chafik
7628bd8c76 json: move json.hpp & json-schema-to-grammar.{cpp,h} to common 2024-03-20 14:35:10 +00:00
Olivier Chafik
7fc759b84f json: fix date pattern 2024-03-19 11:59:06 +00:00
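JSON Schema's `"date"` format corresponds to an RFC 3339 full-date (`YYYY-MM-DD`). As a hedged illustration of the kind of pattern the fix above concerns (not the project's actual GBNF output), a minimal Python check might look like:

```python
import re

# Illustrative regex for an RFC 3339 full-date (YYYY-MM-DD), the shape
# implied by JSON Schema's "date" format. This is a sketch only, not the
# pattern emitted by json-schema-to-grammar.
FULL_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_full_date(s: str) -> bool:
    """Return True if s looks like an RFC 3339 full-date."""
    return FULL_DATE.match(s) is not None
```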
ochafik
874599e749 json: create examples/json-schema-pydantic-example.py 2024-03-19 09:10:39 +00:00
ochafik
263a86e148 json: cleaner build of test 2024-03-19 02:12:15 +00:00
ochafik
02e3bde6b4 json: don't complain about unknown format type in server if unset 2024-03-19 01:45:23 +00:00
ochafik
e7de6433cb json: catch schema conversion errors in server 2024-03-19 01:21:49 +00:00
ochafik
05fd7e3020 json: fix json handling in server when there's no response_format 2024-03-18 20:46:57 +00:00
ochafik
bd96df4e85 json: ws nit 2024-03-18 04:42:25 +00:00
ochafik
24f0b941cf json: fix string patterns (was missing quotes) 2024-03-18 04:06:23 +00:00
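The fix above notes that string patterns were emitted without quotes: a JSON string is always quoted, so any grammar rule lowered from a string `pattern` must keep the literal surrounding double quotes. A hedged sketch of the idea (function name and rule syntax are illustrative, not the project's actual output):

```python
# Hedged sketch: when lowering a JSON Schema string "pattern" to a grammar
# rule, the generated rule must retain the JSON string's literal quotes.
# The bug fixed above was emitting the pattern body without them.
def pattern_rule(pattern_body: str) -> str:
    """Wrap a grammar pattern body in literal double-quote terminals."""
    return '"\\"" ' + pattern_body + ' "\\""'
```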
ochafik
dd922a4da3 json: test/fix additional props corner cases 2024-03-18 01:32:15 +00:00
ochafik
bbd70800c8 json: improve grammar parsing failures 2024-03-18 00:34:02 +00:00
ochafik
618247885c json: test/fix top-level anyOf 2024-03-18 00:13:58 +00:00
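`anyOf` at the schema root means an instance is accepted if it validates against at least one alternative; the fix above ensures the converter handles that construct at the top level, not only nested. A toy stdlib sketch of the semantics (the real converter emits a GBNF alternation instead of validating):

```python
# Hedged sketch of anyOf semantics. This toy checker only supports a
# "type" keyword and exists to illustrate the construct; it is not part
# of json-schema-to-grammar.
def check_type(instance, schema):
    types = {"string": str, "integer": int, "number": (int, float)}
    return isinstance(instance, types[schema["type"]])

def check_any_of(instance, schema):
    # Accept if the instance matches at least one alternative.
    return any(check_type(instance, alt) for alt in schema["anyOf"])
```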
ochafik
20869ede26 Merge remote-tracking branch 'origin/master' into json-fixes 2024-03-17 22:53:04 +00:00
ochafik
edbd2e9862 json: add server tests for OAI JSON response_format 2024-03-17 22:51:29 +00:00
ochafik
3e1bf44e5e json: check parsing in test + fix value & string refs 2024-03-17 22:47:20 +00:00
ochafik
84e383c1d7 json: test (& simplify output of) empty schema 2024-03-17 21:51:10 +00:00
ochafik
5c50ffaeac json: fix type=const in c++, add failure expectations for non-str const&enum 2024-03-17 21:03:48 +00:00
ochafik
64799baea1 json: add tests for some expected failures 2024-03-17 21:01:02 +00:00
Pierrick Hymbert
d01b3c4c32
common: llama_load_model_from_url using --model-url (#6098)
* common: llama_load_model_from_url with libcurl dependency

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-17 19:12:37 +01:00
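The commit title names a `--model-url` flag backed by `llama_load_model_from_url` with a libcurl dependency. A hedged CLI fragment of how such an invocation might look (the URL is a placeholder, and exact flag behavior may differ from this sketch):

```shell
# Hedged sketch: fetch a model over HTTP at startup via the new
# --model-url flag (requires a build with libcurl support).
# The URL below is a placeholder, not a real artifact.
./main --model-url https://example.com/models/model.gguf -p "Hello"
```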
Georgi Gerganov
cd776c37c9
ci : close all stale issues at once (#6115) 2024-03-17 18:51:57 +01:00
GainLee
dc0f612548
ggml: fix finding transfer queue family index error (#6094)
Co-authored-by: GainLee <ligen@meizu.com>
2024-03-17 18:12:22 +01:00
AmirAli Mirian
c47cf414ef
ggml : add AVX512F SIMD (#6088) 2024-03-16 17:52:02 +02:00
Daniel Bevenius
b5f4ae09c3
gritlm : add initial README.md (#6086)
* gritlm: add initial README.md to examples/gritlm

This commit adds a suggestion for an initial README.md for the gritlm
example.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Use the `scripts/hf.sh` script to download the model file.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Fix editorconfig-checker error in examples/gritlm/README.md.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-03-16 17:46:29 +02:00
Xuan Son Nguyen
dfbfdd60f9
readme : add wllama as a wasm binding (#6100) 2024-03-16 17:42:08 +02:00
DAN™
15961ec04d
common : refactor nested if causing error C1061 on MSVC (#6101)
* Refactor nested if causing error C1061 on MSVC.

* Revert back and remove else's.

* Add flag to track found arguments.
2024-03-16 17:39:15 +02:00
Pierrick Hymbert
a56d09a440
ci : close inactive issue with workflow (#6053)
* issues: ci - close inactive issue with workflow

* ci: close issue, change workflow schedule time
2024-03-16 14:20:53 +02:00
ochafik
391b17e7f6 json: support mix of additional props & required/optional 2024-03-16 11:13:29 +00:00
ochafik
f30d6c27b9 json: simplify test 2024-03-16 10:35:41 +00:00
ochafik
5602a8b649 Merge remote-tracking branch 'origin/master' into json-fixes 2024-03-16 00:45:07 +00:00
ochafik
842eb834c5 json: re-ran server deps.sh 2024-03-16 00:36:36 +00:00
ochafik
af31aa20b4 Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test) 2024-03-16 00:19:44 +00:00
slaren
d84c48505f
llama : fix Baichuan2 13B (#6092) 2024-03-15 23:14:16 +02:00
Theia Vogel
877b4d0c62
llama : add support for control vectors (#5970)
* control vector api and implementation

* control-vectors : minor code style updates

* disable control vector when data == nullptr

use -1 for disabled range (also on init) in case we ever support controlling layer 0 (embeddings)

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-15 22:43:02 +02:00
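The control-vector commit above adds a steering mechanism, and its notes state that `-1` marks a disabled range (reserving layer 0 for embeddings). A hedged pure-Python sketch of the core idea; names and shapes are illustrative, and the real implementation operates on ggml tensors inside llama.cpp:

```python
# Hedged sketch: add a control (steering) vector, scaled by a strength,
# to a layer's hidden state when the layer falls in the active range.
# Per the commit notes, -1 marks the control vector as disabled.
def apply_control_vector(hidden, control, strength, layer, layer_start, layer_end):
    """Return hidden + strength * control if layer is in [layer_start, layer_end]."""
    if layer_start == -1:  # disabled
        return list(hidden)
    if not (layer_start <= layer <= layer_end):
        return list(hidden)
    return [h + strength * c for h, c in zip(hidden, control)]
```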
Andrew Canis
12247f4c69
llama : add Command-R support (#6033)
Information about the Command-R 35B model (128k context) can be found at:
	https://huggingface.co/CohereForAI/c4ai-command-r-v01

Based on the llama2 model with a few changes:

1) New hyper parameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
   self-attention and FFN layers in parallel. There is no post-attention LayerNorm.
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used

Find GGUF files here:
	https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF

To convert model to GGUF format yourself:

1) Download Command-R Hugging Face safetensors:
	git lfs install
	git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01

2) Run:
	python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
2024-03-15 22:41:22 +02:00
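Point 1 of the Command-R notes above introduces a `logit_scale` hyperparameter applied to the output logits. A minimal hedged sketch of that step in pure Python (llama.cpp applies it on ggml tensors):

```python
# Hedged sketch of Command-R's logit_scale (point 1 above): the model
# scales its output logits by a hyperparameter before sampling.
def scale_logits(logits, logit_scale):
    """Multiply every logit by logit_scale."""
    return [x * logit_scale for x in logits]
```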
Ting Lou
4e9a7f7f7f
llava : change API to pure C style for Rust FFI bindgen (#6079)
Co-authored-by: Lou Ting <louting.t@alibaba-inc.com>
2024-03-15 16:31:05 +02:00
slaren
3020327f6c
cuda : disable unused cudaLaunchHostFunc code (#6078) 2024-03-15 14:24:03 +02:00
Neo Zhang Jianyu
46acb36767
fix set main gpu error (#6073) 2024-03-15 18:53:53 +08:00
ochafik
5714487830 json: basic support for reserved names {number:{number:{root:number}}} 2024-03-15 10:35:34 +00:00
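The commit above deals with schema property names such as `number` or `root` that collide with names the grammar generator reserves. One common remedy, shown here with an illustrative suffix scheme (not necessarily the project's exact naming), is to uniquify by appending a counter:

```python
# Hedged sketch of the reserved-name problem: if a desired rule name is
# already taken, append the first free numeric suffix. The suffix scheme
# is illustrative, not the project's actual one.
def unique_rule_name(name, reserved):
    """Return name, or a suffixed variant, guaranteed absent from reserved."""
    if name not in reserved:
        reserved.add(name)
        return name
    i = 0
    while f"{name}{i}" in reserved:
        i += 1
    reserved.add(f"{name}{i}")
    return f"{name}{i}"
```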
ochafik
daceced65e nit 2024-03-15 10:07:20 +00:00
ochafik
235ff6858d json: don't use c++20 designated initializers 2024-03-15 10:03:57 +00:00
Georgi Gerganov
131b058409
make : ggml-metal.o depends on ggml.h 2024-03-15 11:38:40 +02:00
AidanBeltonS
753e36f650
[SYCL] Fix non-intel device selection (#6042)
* Fix non-intel device selection

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2024-03-15 14:56:20 +05:30
Ondřej Čertík
7ce2c77f88
gguf : add support for I64 and F64 arrays (#6062)
* gguf : add support for I64 and F64 arrays

GGML currently does not support I64 or F64 arrays and they are not often
used in machine learning, however if in the future the need arises, it
would be nice to add them now, so that the types are next to the other
types I8, I16, I32 in the enums, and it also reserves their type number.

Furthermore, with this addition the GGUF format becomes very usable for
most computational applications of NumPy (being compatible with the most
common NumPy dtypes: i8, i16, i32, i64, f32, f64), providing a faster,
and more versatile alternative to the `npz` format, and a simpler
alternative to the `hdf5` format.

The change in this PR seems small, not significantly increasing the
maintenance burden. I tested this from Python using GGUFWriter/Reader
and `gguf-dump`, as well as from C, everything seems to work.

* Fix compiler warnings
2024-03-15 10:46:51 +02:00
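The I64 and F64 arrays added above are 8 bytes per element, matching NumPy's `i64`/`f64` dtypes the commit mentions. A stdlib-only round-trip illustrating that little-endian layout (the real serialization is handled by gguf's `GGUFWriter`/`GGUFReader`, not this sketch):

```python
import struct

# Hedged sketch: pack/unpack a little-endian f64 array, 8 bytes per
# element, to illustrate the storage the new GGUF types cover. This is
# not the gguf library's API.
def pack_f64_array(values):
    return struct.pack(f"<{len(values)}d", *values)

def unpack_f64_array(data):
    return list(struct.unpack(f"<{len(data) // 8}d", data))
```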
Xuan Son Nguyen
aab606a11f
llama : add Orion chat template (#6066) 2024-03-15 10:44:57 +02:00
slaren
b0bc9f4a9d
llama-bench : use random tokens to improve accuracy with mixtral (#6069) 2024-03-15 10:22:24 +02:00
ochafik
3b3ad949f5 json: fix top-level $refs 2024-03-15 00:52:36 +00:00
ochafik
5a7deb27d5 json: pass static command to std::system in tests (fixed temp files) 2024-03-15 00:03:06 +00:00
ochafik
f2165502c9 json: fix zig build 2024-03-14 23:51:44 +00:00
ochafik
3feac66d0f Merge remote-tracking branch 'origin/master' into json-fixes 2024-03-14 23:37:13 +00:00
Georgi Gerganov
4755afd1cb
llama : fix integer overflow during quantization (#6063) 2024-03-14 22:58:41 +02:00