Commit graph

2480 commits

Author SHA1 Message Date
Pierrick HYMBERT
1ad5a45210 ci: build: add libcurl in default make toolchain step for tests 2024-03-16 20:06:18 +01:00
Pierrick HYMBERT
78812c6d63 llama_load_model_from_url: PR feedback, use snprintf instead of strncpy and strncat 2024-03-16 20:02:34 +01:00
Pierrick HYMBERT
5df5605b02 ci: build: add libcurl in default make toolchain step 2024-03-16 19:52:11 +01:00
Pierrick HYMBERT
176f039a91 ci: tests: windows tests add libcurl 2024-03-16 19:51:44 +01:00
Pierrick HYMBERT
838178a196 ci: tests: windows tests add libcurl 2024-03-16 18:34:53 +01:00
Pierrick HYMBERT
064dc076bb common: CMakeLists.txt fix typo in logging when lib curl is not found 2024-03-16 18:34:36 +01:00
Pierrick HYMBERT
124c474bba llama_load_model_from_url: coherent clearer logging 2024-03-16 18:24:21 +01:00
Pierrick HYMBERT
4fadb072e9 server: tests: add --model-url tests 2024-03-16 18:15:41 +01:00
Pierrick HYMBERT
545fef6e0e llama_load_model_from_url: fix compilation warning, clearer logging 2024-03-16 18:01:55 +01:00
Pierrick Hymbert
b0b49e0bb8
Update examples/main/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:48 +01:00
Pierrick Hymbert
eb9e52a218
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:38 +01:00
Pierrick Hymbert
be561a7ffd
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:32 +01:00
Pierrick Hymbert
89ab37a261
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:27 +01:00
Pierrick Hymbert
330e28df08
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:20 +01:00
Pierrick Hymbert
9565ae3187
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:10 +01:00
Pierrick Hymbert
f22456d8c3
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:48:02 +01:00
Pierrick Hymbert
b088122719
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:47:04 +01:00
Pierrick Hymbert
f53bfd56af
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:46:53 +01:00
Pierrick Hymbert
8751bd0c82
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:46:46 +01:00
Pierrick Hymbert
4bc47b75ca
Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:46:34 +01:00
Pierrick Hymbert
e84206d132
Update examples/server/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 17:46:18 +01:00
Pierrick HYMBERT
1430e895fc Merge branch 'master' into hp/download-model-from-hf
# Conflicts:
#	common/common.cpp
2024-03-16 16:57:24 +01:00
AmirAli Mirian
c47cf414ef
ggml : add AVX512F SIMD (#6088) 2024-03-16 17:52:02 +02:00
Pierrick HYMBERT
6633689fa5 llama_load_model_from_url: cleanup code 2024-03-16 16:49:44 +01:00
Daniel Bevenius
b5f4ae09c3
gritlm : add initial README.md (#6086)
* gritlm: add initial README.md to examples/gritlm

This commit adds a suggestion for an initial README.md for the gritlm
example.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Use the `scripts/hf.sh` script to download the model file.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Fix editorconfig-checker error in examples/gritlm/README.md.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-03-16 17:46:29 +02:00
Xuan Son Nguyen
dfbfdd60f9
readme : add wllama as a wasm binding (#6100) 2024-03-16 17:42:08 +02:00
DAN™
15961ec04d
common : refactor nested if causing error C1061 on MSVC (#6101)
* Refactor nested if causing error C1061 on MSVC.

* Revert back and remove else's.

* Add flag to track found arguments.
2024-03-16 17:39:15 +02:00
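The refactor in 15961ec04d works around MSVC error C1061 (block nesting depth exceeded), which long else-if argument parsers can trigger. A minimal sketch of the pattern described in the commit body, in illustrative Python rather than the actual llama.cpp C++ code: flatten the chain into independent ifs and track matches with a flag.

```python
# Illustrative sketch of the "flag to track found arguments" pattern from
# commit 15961ec04d (names are hypothetical, not the llama.cpp source).
# Deeply nested else-if chains can exceed MSVC's block-depth limit (C1061);
# flat ifs plus a 'found' flag keep the nesting depth constant.

def parse_arg(arg, params):
    """Try each known flag; return whether any branch handled arg."""
    found = False
    if arg == "--model":
        params["model"] = True
        found = True
    if arg == "--model-url":
        params["model_url"] = True
        found = True
    if arg == "--threads":
        params["threads"] = True
        found = True
    return found
```

An unrecognized argument leaves the flag false, so the caller can report an invalid-parameter error exactly as the old else-chain's final else did.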
Pierrick HYMBERT
921e4af930 ci: build, fix the default build to use LLAMA_CURL 2024-03-16 16:29:08 +01:00
Pierrick HYMBERT
5d99f3224f llama_load_model_from_url: download the file only if modified based on etag and last-modified http headers 2024-03-16 16:27:48 +01:00
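Commit 5d99f3224f describes standard HTTP cache validation: replay the cached ETag and Last-Modified values as If-None-Match / If-Modified-Since, and skip the download when the server answers 304 Not Modified. A hedged Python sketch of that logic (the real implementation is C using libcurl; the helper names here are illustrative):

```python
# Sketch of the conditional-download idea behind commit 5d99f3224f.
# Hypothetical helpers, not the llama.cpp implementation.

def conditional_headers(cached_etag=None, cached_last_modified=None):
    """Build request headers asking the server to skip an unchanged file."""
    headers = {}
    if cached_etag:
        headers["If-None-Match"] = cached_etag
    if cached_last_modified:
        headers["If-Modified-Since"] = cached_last_modified
    return headers

def should_download(status_code):
    """304 Not Modified means the locally cached model file is current."""
    return status_code != 304
```

On a 200 response the client saves the new ETag/Last-Modified alongside the file so the next run can validate again.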
Pierrick HYMBERT
4135d4a505 llama_load_model_from_url: typo 2024-03-16 16:27:48 +01:00
Pierrick Hymbert
2c3a00e270
Update Makefile
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-16 15:40:29 +01:00
Pierrick HYMBERT
80bec9890a llama_load_model_from_url: try to make the windows build passing 2024-03-16 14:08:21 +01:00
Pierrick HYMBERT
df0d82289c ci: compile the server with curl, add make option curl example in default cmake 2024-03-16 13:52:45 +01:00
Pierrick HYMBERT
7e782856bd common: LLAMA_USE_CURL in make toolchain 2024-03-16 13:45:09 +01:00
Pierrick HYMBERT
42b25dacab common: PR feedback, rename the definition to LLAMA_USE_CURL 2024-03-16 13:27:05 +01:00
Pierrick Hymbert
a56d09a440
ci : close inactive issue with workflow (#6053)
* issues: ci - close inactive issue with workflow

* ci: close issue, change workflow schedule time
2024-03-16 14:20:53 +02:00
Pierrick HYMBERT
a0ebdfcc5d common: llama_load_model_from_url switch to libcurl dependency 2024-03-16 12:27:08 +01:00
Pierrick HYMBERT
3221ab01ad common: introduce llama_load_model_from_url to download model from hf url using libopenssl only 2024-03-16 09:59:14 +01:00
slaren
d84c48505f
llama : fix Baichuan2 13B (#6092) 2024-03-15 23:14:16 +02:00
Theia Vogel
877b4d0c62
llama : add support for control vectors (#5970)
* control vector api and implementation

* control-vectors : minor code style updates

* disable control vector when data == nullptr

use -1 for disabled range (also on init) in case we ever support controlling layer 0 (embeddings)

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-15 22:43:02 +02:00
Andrew Canis
12247f4c69
llama : add Command-R support (#6033)
Information about the Command-R 35B model (128k context) can be found at:
	https://huggingface.co/CohereForAI/c4ai-command-r-v01

Based on the llama2 model with a few changes:

1) New hyperparameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
   self-attention and FFN layers in parallel. There is no post-attention LayerNorm.
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used

Find GGUF files here:
	https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF

To convert model to GGUF format yourself:

1) Download Command-R Hugging Face safetensors:
	git lfs install
	git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01

2) Run:
	python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
2024-03-15 22:41:22 +02:00
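Point 2 of the Command-R commit (LayerNorm instead of RMSNorm) is the main numerical difference from llama2-style layers. A minimal Python contrast of the two normalizations, with the learned gain/bias terms omitted for brevity (illustrative only, not llama.cpp source):

```python
import math

def layer_norm(x, eps=1e-5):
    """LayerNorm: subtract the mean, then scale by the standard deviation."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-5):
    """RMSNorm: scale by the root mean square only; no mean subtraction."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]
```

The difference is centering: LayerNorm output always has (near-)zero mean, while RMSNorm preserves the input's mean direction and only rescales its magnitude.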
Ting Lou
4e9a7f7f7f
llava : change API to pure C style for Rust FFI bindgen (#6079)
Co-authored-by: Lou Ting <louting.t@alibaba-inc.com>
2024-03-15 16:31:05 +02:00
slaren
3020327f6c
cuda : disable unused cudaLaunchHostFunc code (#6078) 2024-03-15 14:24:03 +02:00
Neo Zhang Jianyu
46acb36767
fix set main gpu error (#6073) 2024-03-15 18:53:53 +08:00
Georgi Gerganov
131b058409
make : ggml-metal.o depends on ggml.h 2024-03-15 11:38:40 +02:00
AidanBeltonS
753e36f650
[SYCL] Fix non-intel device selection (#6042)
* Fix non-intel device selection

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2024-03-15 14:56:20 +05:30
Ondřej Čertík
7ce2c77f88
gguf : add support for I64 and F64 arrays (#6062)
* gguf : add support for I64 and F64 arrays

GGML currently does not support I64 or F64 arrays and they are not often
used in machine learning, however if in the future the need arises, it
would be nice to add them now, so that the types are next to the other
types I8, I16, I32 in the enums, and it also reserves their type number.

Furthermore, with this addition the GGUF format becomes very usable for
most computational applications of NumPy (being compatible with the most
common NumPy dtypes: i8, i16, i32, i64, f32, f64), providing a faster,
and more versatile alternative to the `npz` format, and a simpler
alternative to the `hdf5` format.

The change in this PR seems small, not significantly increasing the
maintenance burden. I tested this from Python using GGUFWriter/Reader
and `gguf-dump`, as well as from C, everything seems to work.

* Fix compiler warnings
2024-03-15 10:46:51 +02:00
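What the I64/F64 addition in 7ce2c77f88 buys is exact round-tripping of 64-bit arrays, matching NumPy's i64/f64 dtypes. A hedged sketch of the idea using plain little-endian packing (this illustrates typed-array serialization in general, not the actual GGUF on-disk layout):

```python
import struct

# Illustrative 64-bit array packing/unpacking ('q' = int64, 'd' = float64,
# '<' = little-endian). Not the real GGUF format, just the round-trip idea.

def pack_i64_array(values):
    return struct.pack(f"<{len(values)}q", *values)

def unpack_i64_array(buf):
    return list(struct.unpack(f"<{len(buf) // 8}q", buf))

def pack_f64_array(values):
    return struct.pack(f"<{len(values)}d", *values)

def unpack_f64_array(buf):
    return list(struct.unpack(f"<{len(buf) // 8}d", buf))
```

With only 32-bit types available, values such as 2**62 or high-precision doubles would have to be truncated or stored lossily; native 64-bit slots avoid that.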
Xuan Son Nguyen
aab606a11f
llama : add Orion chat template (#6066) 2024-03-15 10:44:57 +02:00
slaren
b0bc9f4a9d
llama-bench : use random tokens to improve accuracy with mixtral (#6069) 2024-03-15 10:22:24 +02:00
Georgi Gerganov
4755afd1cb
llama : fix integer overflow during quantization (#6063) 2024-03-14 22:58:41 +02:00