Commit graph

2480 commits

Author SHA1 Message Date
Pierrick HYMBERT
1ad5a45210 ci: build: add libcurl in default make toolchain step for tests 2024-03-16 20:06:18 +01:00
Pierrick HYMBERT
78812c6d63 llama_load_model_from_url: PR feedback, use snprintf instead of strncpy and strncat 2024-03-16 20:02:34 +01:00
Pierrick HYMBERT
5df5605b02 ci: build: add libcurl in default make toolchain step 2024-03-16 19:52:11 +01:00
Pierrick HYMBERT
176f039a91 ci: tests: windows tests add libcurl 2024-03-16 19:51:44 +01:00
Pierrick HYMBERT
838178a196 ci: tests: windows tests add libcurl 2024-03-16 18:34:53 +01:00
Pierrick HYMBERT
064dc076bb common: CMakeLists.txt fix typo in logging when lib curl is not found 2024-03-16 18:34:36 +01:00
Pierrick HYMBERT
124c474bba llama_load_model_from_url: coherent clearer logging 2024-03-16 18:24:21 +01:00
Pierrick HYMBERT
4fadb072e9 server: tests: add --model-url tests 2024-03-16 18:15:41 +01:00
Pierrick HYMBERT
545fef6e0e llama_load_model_from_url: fix compilation warning, clearer logging 2024-03-16 18:01:55 +01:00
Pierrick Hymbert
b0b49e0bb8
Update examples/main/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:48 +01:00
Pierrick Hymbert
eb9e52a218
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:38 +01:00
Pierrick Hymbert
be561a7ffd
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:32 +01:00
Pierrick Hymbert
89ab37a261
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:27 +01:00
Pierrick Hymbert
330e28df08
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:20 +01:00
Pierrick Hymbert
9565ae3187
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:10 +01:00
Pierrick Hymbert
f22456d8c3
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:02 +01:00
Pierrick Hymbert
b088122719
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:47:04 +01:00
Pierrick Hymbert
f53bfd56af
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:46:53 +01:00
Pierrick Hymbert
8751bd0c82
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:46:46 +01:00
Pierrick Hymbert
4bc47b75ca
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:46:34 +01:00
Pierrick Hymbert
e84206d132
Update examples/server/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:46:18 +01:00
Pierrick HYMBERT
1430e895fc Merge branch 'master' into hp/download-model-from-hf
# Conflicts:
#	common/common.cpp
2024-03-16 16:57:24 +01:00
AmirAli Mirian
c47cf414ef
ggml : add AVX512F SIMD (#6088) 2024-03-16 17:52:02 +02:00
Pierrick HYMBERT
6633689fa5 llama_load_model_from_url: cleanup code 2024-03-16 16:49:44 +01:00
Daniel Bevenius
b5f4ae09c3
gritlm : add initial README.md (#6086)
* gritlm: add initial README.md to examples/gritlm

This commit adds a suggestion for an initial README.md for the gritlm
example.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Use the `scripts/hf.sh` script to download the model file.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Fix editorconfig-checker error in examples/gritlm/README.md.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-03-16 17:46:29 +02:00
Xuan Son Nguyen
dfbfdd60f9
readme : add wllama as a wasm binding (#6100) 2024-03-16 17:42:08 +02:00
DAN™
15961ec04d
common : refactor nested if causing error C1061 on MSVC (#6101)
* Refactor nested if causing error C1061 on MSVC.

* Revert back and remove else's.

* Add flag to track found arguments.
2024-03-16 17:39:15 +02:00
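The refactor in 15961ec04d works around MSVC error C1061 (block nesting depth exceeded), which long else-if argument parsers can trigger. A minimal sketch of the pattern described in the commit body, in illustrative Python rather than the actual llama.cpp C++ code: flatten the chain into independent ifs and track matches with a flag.

```python
# Illustrative sketch of the "flag to track found arguments" pattern from
# commit 15961ec04d (names are hypothetical, not the llama.cpp source).
# Deeply nested else-if chains can exceed MSVC's block-depth limit (C1061);
# flat ifs plus a 'found' flag keep the nesting depth constant.

def parse_arg(arg, params):
    """Try each known flag; return whether any branch handled arg."""
    found = False
    if arg == "--model":
        params["model"] = True
        found = True
    if arg == "--model-url":
        params["model_url"] = True
        found = True
    if arg == "--threads":
        params["threads"] = True
        found = True
    return found
```

An unrecognized argument leaves the flag false, so the caller can report an invalid-parameter error exactly as the old else-chain's final else did.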
Pierrick HYMBERT
921e4af930 ci: build, fix the default build to use LLAMA_CURL 2024-03-16 16:29:08 +01:00
Pierrick HYMBERT
5d99f3224f llama_load_model_from_url: download the file only if modified based on etag and last-modified http headers 2024-03-16 16:27:48 +01:00
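Commit 5d99f3224f describes standard HTTP cache validation: replay the cached ETag and Last-Modified values as If-None-Match / If-Modified-Since, and skip the download when the server answers 304 Not Modified. A hedged Python sketch of that logic (the real implementation is C using libcurl; the helper names here are illustrative):

```python
# Sketch of the conditional-download idea behind commit 5d99f3224f.
# Hypothetical helpers, not the llama.cpp implementation.

def conditional_headers(cached_etag=None, cached_last_modified=None):
    """Build request headers asking the server to skip an unchanged file."""
    headers = {}
    if cached_etag:
        headers["If-None-Match"] = cached_etag
    if cached_last_modified:
        headers["If-Modified-Since"] = cached_last_modified
    return headers

def should_download(status_code):
    """304 Not Modified means the locally cached model file is current."""
    return status_code != 304
```

On a 200 response the client saves the new ETag/Last-Modified alongside the file so the next run can validate again.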
Pierrick HYMBERT
4135d4a505 llama_load_model_from_url: typo 2024-03-16 16:27:48 +01:00
Pierrick Hymbert
2c3a00e270
Update Makefile
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 15:40:29 +01:00
Pierrick HYMBERT
80bec9890a llama_load_model_from_url: try to make the windows build passing 2024-03-16 14:08:21 +01:00
Pierrick HYMBERT
df0d82289c ci: compile the server with curl, add make option curl example in default cmake 2024-03-16 13:52:45 +01:00
Pierrick HYMBERT
7e782856bd common: LLAMA_USE_CURL in make toolchain 2024-03-16 13:45:09 +01:00
Pierrick HYMBERT
42b25dacab common: PR feedback, rename the definition to LLAMA_USE_CURL 2024-03-16 13:27:05 +01:00
Pierrick Hymbert
a56d09a440
ci : close inactive issue with workflow (#6053)
* issues: ci - close inactive issue with workflow

* ci: close issue, change workflow schedule time
2024-03-16 14:20:53 +02:00
Pierrick HYMBERT
a0ebdfcc5d common: llama_load_model_from_url switch to libcurl dependency 2024-03-16 12:27:08 +01:00
Pierrick HYMBERT
3221ab01ad common: introduce llama_load_model_from_url to download model from hf url using libopenssl only 2024-03-16 09:59:14 +01:00
slaren
d84c48505f
llama : fix Baichuan2 13B (#6092) 2024-03-15 23:14:16 +02:00
Theia Vogel
877b4d0c62
llama : add support for control vectors (#5970)
* control vector api and implementation

* control-vectors : minor code style updates

* disable control vector when data == nullptr

use -1 for disabled range (also on init) in case we ever support controlling layer 0 (embeddings)

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-15 22:43:02 +02:00
Andrew Canis
12247f4c69
llama : add Command-R support (#6033)
Information about the Command-R 35B model (128k context) can be found at:
	https://huggingface.co/CohereForAI/c4ai-command-r-v01

Based on the llama2 model with a few changes:

1) New hyperparameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
   self-attention and FFN layers in parallel. There is no post-attention LayerNorm.
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used

Find GGUF files here:
	https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF

To convert model to GGUF format yourself:

1) Download Command-R Hugging Face safetensors:
	git lfs install
	git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01

2) Run:
	python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
2024-03-15 22:41:22 +02:00
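Point 2 of the Command-R commit (LayerNorm instead of RMSNorm) is the main numerical difference from llama2-style layers. A minimal Python contrast of the two normalizations, with the learned gain/bias terms omitted for brevity (illustrative only, not llama.cpp source):

```python
import math

def layer_norm(x, eps=1e-5):
    """LayerNorm: subtract the mean, then scale by the standard deviation."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-5):
    """RMSNorm: scale by the root mean square only; no mean subtraction."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]
```

The difference is centering: LayerNorm output always has (near-)zero mean, while RMSNorm preserves the input's mean direction and only rescales its magnitude.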
Ting Lou
4e9a7f7f7f
llava : change API to pure C style for Rust FFI bindgen (#6079)
Co-authored-by: Lou Ting <louting.t@alibaba-inc.com>
2024-03-15 16:31:05 +02:00
slaren
3020327f6c
cuda : disable unused cudaLaunchHostFunc code (#6078) 2024-03-15 14:24:03 +02:00
Neo Zhang Jianyu
46acb36767
fix set main gpu error (#6073) 2024-03-15 18:53:53 +08:00
Georgi Gerganov
131b058409
make : ggml-metal.o depends on ggml.h 2024-03-15 11:38:40 +02:00
AidanBeltonS
753e36f650
[SYCL] Fix non-intel device selection (#6042)
* Fix non-intel device selection

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2024-03-15 14:56:20 +05:30
Ondřej Čertík
7ce2c77f88
gguf : add support for I64 and F64 arrays (#6062)
* gguf : add support for I64 and F64 arrays

GGML currently does not support I64 or F64 arrays and they are not often
used in machine learning, however if in the future the need arises, it
would be nice to add them now, so that the types are next to the other
types I8, I16, I32 in the enums, and it also reserves their type number.

Furthermore, with this addition the GGUF format becomes very usable for
most computational applications of NumPy (being compatible with the most
common NumPy dtypes: i8, i16, i32, i64, f32, f64), providing a faster,
and more versatile alternative to the `npz` format, and a simpler
alternative to the `hdf5` format.

The change in this PR seems small, not significantly increasing the
maintenance burden. I tested this from Python using GGUFWriter/Reader
and `gguf-dump`, as well as from C, everything seems to work.

* Fix compiler warnings
2024-03-15 10:46:51 +02:00
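What the I64/F64 addition in 7ce2c77f88 buys is exact round-tripping of 64-bit arrays, matching NumPy's i64/f64 dtypes. A hedged sketch of the idea using plain little-endian packing (this illustrates typed-array serialization in general, not the actual GGUF on-disk layout):

```python
import struct

# Illustrative 64-bit array packing/unpacking ('q' = int64, 'd' = float64,
# '<' = little-endian). Not the real GGUF format, just the round-trip idea.

def pack_i64_array(values):
    return struct.pack(f"<{len(values)}q", *values)

def unpack_i64_array(buf):
    return list(struct.unpack(f"<{len(buf) // 8}q", buf))

def pack_f64_array(values):
    return struct.pack(f"<{len(values)}d", *values)

def unpack_f64_array(buf):
    return list(struct.unpack(f"<{len(buf) // 8}d", buf))
```

With only 32-bit types available, values such as 2**62 or high-precision doubles would have to be truncated or stored lossily; native 64-bit slots avoid that.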
Xuan Son Nguyen
aab606a11f
llama : add Orion chat template (#6066) 2024-03-15 10:44:57 +02:00
slaren
b0bc9f4a9d
llama-bench : use random tokens to improve accuracy with mixtral (#6069) 2024-03-15 10:22:24 +02:00
Georgi Gerganov
4755afd1cb
llama : fix integer overflow during quantization (#6063) 2024-03-14 22:58:41 +02:00