llama.cpp

Author	SHA1	Message	Date
joshcarp	716886cf17	Cleanup	2024-05-07 14:46:19 -04:00
joshcarp	04e2858942	Cleanup	2024-05-07 14:45:59 -04:00
joshcarp	92ff0de243	Cleanup	2024-05-07 14:44:59 -04:00
joshcarp	308c817af4	Update	2024-05-07 14:38:58 -04:00
joshcarp	16b8ecdaf5	WIP	2024-05-06 15:15:20 -04:00
joshcarp	98ba54e5ec	WIP	2024-05-06 15:14:54 -04:00
joshcarp	8d2dead681	Remove comment on assert that was failing	2024-04-30 08:53:03 -04:00
joshcarp	896dee5059	Update	2024-04-30 08:51:01 -04:00
joshcarp	5eea11e241	fix up	2024-04-30 00:10:58 -04:00
joshcarp	9858fd1457	Fix SwiGlu2	2024-04-29 23:22:02 -04:00
joshcarp	0084a2a8d7	Checkpoint	2024-04-29 20:00:44 -04:00
joshcarp	7c3c3eb256	Add comment	2024-04-29 17:26:54 -04:00
joshcarp	5ba2143c3c	Add nop feedforward length	2024-04-29 14:32:10 -04:00
joshcarp	6e89d82269	Attempt at OpenElm	2024-04-29 14:24:59 -04:00
Daniel Bevenius	5539e6fdd1	main : fix typo in comment in main.cpp (#6985) Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-04-29 13:56:59 -04:00
Olivier Chafik	b8a7a5a90f	build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964) * readme: cmake . -B build && cmake --build build * build: fix typo Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * build: drop implicit . from cmake config command * build: remove another superfluous . * build: update MinGW cmake commands * Update README-sycl.md Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com> * build: reinstate --config Release as not the default w/ some generators + document how to build Debug * build: revert more --config Release * build: nit / remove -H from cmake example * build: reword debug instructions around single/multi config split --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>	2024-04-29 17:02:45 +01:00
Georgi Gerganov	d2c898f746	ci : tmp disable gguf-split (#6983) ggml-ci	2024-04-29 18:36:39 +03:00
Georgi Gerganov	544f1f10ad	ggml : fix __MSC_VER -> _MSC_VER (#6977) ggml-ci	2024-04-29 17:55:02 +03:00
cpumaxx	ffe666572f	llava-cli : multiple images (#6969) Co-authored-by: root <root@nenya.lothlorien.ca>	2024-04-29 17:34:24 +03:00
Georgi Gerganov	24affa7db3	readme : update hot topics	2024-04-29 17:06:19 +03:00
Georgi Gerganov	f4ab2a4147	llama : fix BPE pre-tokenization (#6920) * merged the changes from deepseeker models to main branch * Moved regex patterns to unicode.cpp and updated unicode.h * Moved header files * Resolved issues * added and refactored unicode_regex_split and related functions * Updated/merged the deepseek coder pr * Refactored code * Adding unicode regex mappings * Adding unicode regex function * Added needed functionality, testing remains * Fixed issues * Fixed issue with gpt2 regex custom preprocessor * unicode : fix? unicode_wstring_to_utf8 * lint : fix whitespaces * tests : add tokenizer tests for numbers * unicode : remove redundant headers * tests : remove and rename tokenizer test scripts * tests : add sample usage * gguf-py : reader prints warnings on duplicate keys * llama : towards llama3 tokenization support (wip) * unicode : shot in the dark to fix tests on Windows * unicode : first try custom implementations * convert : add "tokenizer.ggml.pre" GGUF KV (wip) * llama : use new pre-tokenizer type * convert : fix pre-tokenizer type writing * lint : fix * make : add test-tokenizer-0-llama-v3 * wip * models : add llama v3 vocab file * llama : adapt punctuation regex + add llama 3 regex * minor * unicode : set bomb * unicode : set bomb * unicode : always use std::wregex * unicode : support \p{N}, \p{L} and \p{P} natively * unicode : try fix windows * unicode : category support via std::regex * unicode : clean-up * unicode : simplify * convert : add convert-hf-to-gguf-update.py ggml-ci * lint : update * convert : add falcon ggml-ci * unicode : normalize signatures * lint : fix * lint : fix * convert : remove unused functions * convert : add comments * convert : exercise contractions ggml-ci * lint : fix * cmake : refactor test targets * tests : refactor vocab tests ggml-ci * tests : add more vocabs and tests ggml-ci * unicode : cleanup * scripts : ignore new update script in check-requirements.sh * models : add phi-3, mpt, gpt-2, starcoder * tests : disable obsolete ggml-ci * tests : use faster bpe test ggml-ci * llama : more prominent warning for old BPE models * tests : disable test-tokenizer-1-bpe due to slowness ggml-ci --------- Co-authored-by: Jaggzh <jaggz.h@gmail.com> Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>	2024-04-29 16:58:41 +03:00
David Renshaw	3f167476b1	sampling : use std::random_device{}() for default random seed (#6962)	2024-04-29 16:35:45 +03:00
Christian Zhou-Zheng	3055a41805	convert : fix conversion of some BERT embedding models (#6937)	2024-04-29 16:34:41 +03:00
Przemysław Pawełczyk	577277ffd2	make : change GNU make default CXX from g++ to c++ (#6966)	2024-04-29 16:08:20 +03:00
Przemysław Pawełczyk	ca7f29f568	ci : add building in MSYS2 environments (Windows) (#6967)	2024-04-29 15:59:47 +03:00
Johannes Gäßler	c4f708a93f	llama : fix typo LAMMAFILE -> LLAMAFILE (#6974)	2024-04-29 15:36:22 +03:00
DAN™	e00b4a8f81	Fix more int overflow during quant (PPL/CUDA). (#6563) * Fix more int overflow during quant. * Fix some more int overflow in softmax. * Revert back to int64_t.	2024-04-29 00:38:44 +02:00
Xuan Son Nguyen	7bb36ccf91	gguf : enforce that tensor names are unique (#6905) * not allow adding duplicated tensor name * no duplicated tensor while reading gguf * typo * throw exception inside llama_model_loader Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-04-28 17:36:18 +02:00
Neo Zhang	ce023f6f2f	add device version in device list (#6959) Co-authored-by: arthw <>	2024-04-28 22:40:31 +08:00
github-actions[bot]	6e472f58e4	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/5c24cf2f0a12ad855f444c30b2421d044120c66f?narHash=sha256-XtTSSIB2DA6tOv%2Bl0FhvfDMiyCmhoRbNB%2B0SeInZkbk%3D' (2024-04-19) → 'github:NixOS/nixpkgs/7bb2ccd8cdc44c91edba16c48d2c8f331fb3d856?narHash=sha256-Drmja/f5MRHZCskS6mvzFqxEaZMeciScCTFxWVLqWEY%3D' (2024-04-25)	2024-04-28 11:12:50 +00:00
mgroeber9110	4dba7e8114	Replace "alternative" boolean operator in conditional compilation directive (#6949)	2024-04-27 21:02:06 +02:00
Pierrick Hymbert	b7368332e2	ci: server: tests python env on github container ubuntu latest / fix n_predict (#6935) * ci: server: fix python env * ci: server: fix server tests after #6638 * ci: server: fix windows is not building PR branch	2024-04-27 17:50:48 +02:00
agray3	928e0b7013	Reset schedule earlier to allow overlap with ggml graph computation on device (#6933) * Reset schedule earlier to allow overlap with graph computation on device	2024-04-26 20:08:30 +02:00
Pierrick Hymbert	0c4d489e29	quantize: add imatrix and dataset metadata in GGUF (#6658) * imatrix: save the dataset file used in the output file * llama: support kv overrides type string string * common: factorize KV Overrides parsing between common and server * quantize: add imatrix n entries and dataset KV metadata quantize: factorize KV Overrides parsing between common #6656 * llama: remove kv override str_value initialization as it does not compile on some toolchain * quantize: add imatrix m_last_call as `quantize.imatrix.chunks_count` * quantize: add imatrix filename in KV * llama: add llama_model_kv_override_free * common: add llama_model_kv_override_free common: free kv override if used after model loading * llama: finally move the string KV override value to the stack * llama : minor * no need to add a NUL to the std::vector, std::string can be initialized from a pair of iterators. Co-authored-by: slaren <slarengh@gmail.com> * kv override: ensure string termination --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: slaren <slarengh@gmail.com>	2024-04-26 20:06:33 +02:00
slaren	017e6999b5	add basic tensor data validation function (#6884) * add basic tensor data validation function * add --check-tensors command line argument tensor validation is disabled by default and can be enabled by adding `--check-tensors` to the command line arguments. quantize always validates tensors.	2024-04-26 18:39:58 +02:00
slaren	e2764cd7ca	gguf : fix mismatch between alloc and free functions (#6929)	2024-04-26 18:07:42 +03:00
Justine Tunney	4b1c3c98b4	llamafile : use 64-bit integers in sgemm (#6928)	2024-04-26 17:05:33 +03:00
Pierrick Hymbert	bbe3c6e761	ci: server: fix python installation (#6925)	2024-04-26 12:27:25 +02:00
Pierrick Hymbert	7f5ff558ee	server: stop generation at `n_ctx_train` if `n_predict` is not set (#6638) * server: cap n_predict if not set to n_ctx_train * server: fix infinite loop * server: infinite loop, move in process_token server: infinite loop: set stop limit to true * minor: spaces * minor: spaces * server: include prompt tokens in the EOS limit	2024-04-26 12:15:30 +02:00
Pierrick Hymbert	9e4e077ec5	ci: server: fix python installation (#6922)	2024-04-26 11:11:51 +02:00
Georgi Gerganov	83b72cb086	Merge pull request from GHSA-p5mv-gjc5-mwqv * always use calloc clamp n_kv on failure to read a kv * ggml : alternative ctx->header.n_kv update --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-04-26 10:41:53 +03:00
Pierrick Hymbert	d4a9afc100	ci: server: fix python installation (#6918)	2024-04-26 09:27:49 +02:00
Pierrick Hymbert	7d641c26ac	ci: fix concurrency for pull_request_target (#6917)	2024-04-26 09:26:59 +02:00
Pierrick Hymbert	5790c8dac1	bench: server add stop word for PHI-2 (#6916)	2024-04-26 09:26:16 +02:00
vik	46e12c4692	llava : add support for moondream vision language model (#6899) * add support for moondream vision language model This required making the following changes to the CLIP model: 1. Support for patch embedding bias. 2. Make class embedding and pre-layernorm optional. 3. Add support for post-layernorm. * Update examples/llava/clip.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-25 22:38:31 +03:00
Georgi Gerganov	dba497e0c1	cmake : restore LLAMA_LLAMAFILE_DEFAULT	2024-04-25 21:37:27 +03:00
Georgi Gerganov	fa0b4ad252	cmake : remove obsolete ANDROID check	2024-04-25 18:59:51 +03:00
slaren	d6e1d44f16	llama : synchronize before get/set session data (#6911)	2024-04-25 17:59:03 +02:00
Georgi Gerganov	853d06ffe2	ci : tmp disable slow tests	2024-04-25 17:06:27 +03:00
BarfingLemurs	3fe0596c18	readme : update model list (#6908) * Update README.md * missing space * llama3 !	2024-04-25 16:52:28 +03:00

2781 commits