llama.cpp

Author	SHA1	Message	Date
Pierrick HYMBERT	12731d21c5	eval-callback: fix make toolchain	2024-04-11 14:28:44 +02:00
Pierrick HYMBERT	ee588a5c24	eval-callback: renamed from ggml-debug	2024-04-11 14:10:47 +02:00
Pierrick HYMBERT	28fd76ffd4	ggml-debug: remove block size	2024-04-11 14:07:09 +02:00
Pierrick HYMBERT	8d7be2c986	ggml-debug: printing also the sum of each tensor	2024-04-11 13:05:45 +02:00
Georgi Gerganov	bb359cdd7d	gitignore : ggml-debug	2024-04-11 13:58:58 +03:00
Pierrick HYMBERT	cfb820b343	ggml-debug: better tensor type support	2024-04-10 23:23:22 +02:00
Pierrick HYMBERT	3f8a93fb7b	ci: add curl test	2024-04-10 22:50:36 +02:00
Pierrick HYMBERT	831c97efc7	common: allow the warmup to be disabled in llama_init_from_gpt_params	2024-04-10 22:42:04 +02:00
Pierrick HYMBERT	52a8e0640a	ggml-debug: ci add test curl label	2024-04-10 22:36:03 +02:00
Pierrick HYMBERT	f84473da64	ggml-debug: tests add the main label	2024-04-10 22:18:45 +02:00
Pierrick HYMBERT	a42ebbd596	ggml-debug: add to make toolchain	2024-04-10 22:12:10 +02:00
Pierrick HYMBERT	0b3392887b	doc: add a model: add a link to ggml-debug	2024-04-10 21:57:48 +02:00
Pierrick HYMBERT	deadf29759	Merge remote-tracking branch 'origin/master' into hp/ggml/debug	2024-04-10 21:55:23 +02:00
Pierrick HYMBERT	368272c54b	ggml_debug: add main test label	2024-04-10 21:52:04 +02:00
Pierrick HYMBERT	1a031d39ae	ci: build revert label	2024-04-10 21:48:48 +02:00
Pierrick HYMBERT	f3f0d1818f	common: fix cb_eval and user data not initialized	2024-04-10 21:40:52 +02:00
Pierrick HYMBERT	ca6f3ff4e0	ggml_debug: fix trailing spaces	2024-04-10 21:20:32 +02:00
Pierrick HYMBERT	08fa088d74	ggml_debug: fix trailing spaces	2024-04-10 21:19:09 +02:00
Pierrick HYMBERT	fe4b1915f1	ggml_debug: Remove unused param n_batch, no batching here	2024-04-10 21:15:21 +02:00
Pierrick HYMBERT	2d34bbe2a0	ggml_debug: EOL in CMakeLists.txt	2024-04-10 21:07:45 +02:00
Pierrick HYMBERT	cda1d42a64	ggml_debug: ci: add tests	2024-04-10 21:06:12 +02:00
Pierrick HYMBERT	01dd5e9776	ggml_debug: use common gpt_params to pass cb eval. Fix get tensor SIGV random.	2024-04-10 20:52:38 +02:00
Pierrick HYMBERT	8fe3be8d68	llama: cv eval: move cb eval field in common gpt_params	2024-04-10 20:51:54 +02:00
Ralph Soika	b3a96f27f0	minor layout improvements (#6572) * minor layout improvements * added missing file, run deps.sh locally	2024-04-10 19:18:25 +02:00
slaren	4f407a0a35	llama : add model types for mixtral (#6589)	2024-04-10 17:24:14 +02:00
slaren	65c64dc36f	convert.py : add consolidated.safetensors for mixtral 8x22b (#6587)	2024-04-10 15:23:12 +02:00
Pierrick HYMBERT	f63b722486	gguf-debug: no mutex, verify type, fix stride.	2024-04-10 09:50:45 +02:00
Pierrick Hymbert	67fac4b95f	docs : how to add a model (#6565 ) * docs: how to add a model * docs: model: typo and docs * docs: model: add prevision on RoPE * docs: model: rephrasing README.md * docs: model: rephrasing README.md * docs: model: README.md fix trailing spaces * docs : some fixes * Update README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-10 09:58:48 +03:00
Artem Zinnatullin	29122d32ac	readme : fix ROCm link (#6579)	2024-04-10 09:49:12 +03:00
sjxx	b231b37b09	readme : update UI list (#6560)	2024-04-10 09:34:00 +03:00
Pierrick HYMBERT	067e294783	gguf-debug: Example how to use ggml callback for debugging	2024-04-10 03:53:29 +02:00
Jiří Sejkora	ba5e134e07	readme: fix typo in amdgpu target name (#6573)	2024-04-10 00:23:02 +02:00
Jared Van Bortel	1b67731e18	BERT tokenizer fixes (#6498 ) Key changes: * BERT conversion: fix abuse of LlamaHfVocab, do not set BOS or EOS * Nomic Embed conversion: pad vocab instead of slicing embedding tensor * llama_tokenize: handle added special tokens like HF does	2024-04-09 13:44:08 -04:00
Georgi Gerganov	c4a3a4ff47	sync : ggml	2024-04-09 20:29:06 +03:00
Ed Lee	400d5d722d	server : detect search query to start webchat (#6554)	2024-04-09 10:31:47 +02:00
Carolinabanana	5dc9dd7152	llama : add Command R Plus support (#6491 ) * Add Command R Plus GGUF * Add Command R Plus GGUF * Loading works up to LayerNorm2D * Export new tensors in 1D so they are not quantized. * Fix embedding layer based on Noeda's example * Whitespace * Add line * Fix unexpected tokens on MPS. Re-add F16 fix. ((Noeda) * dranger003: Fix block index overflow in CUDA dequantizing. * Reverted blocked multiplication code as it still has issues and could affect other Llama arches * export norms as f32 * fix overflow issues during quant and other cleanup * Type convention Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * dranger003: Fix more int overflow during quant. --------- Co-authored-by: S <seast@Ss-Mac-Studio.local> Co-authored-by: S <s@example.com> Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 11:16:13 +03:00
Georgi Gerganov	e11a8999b5	license : update copyright notice + add AUTHORS (#6405 ) * license : add AUTHORS * authors : update * scipts : add LICENSE and gen-authors.sh to sync	2024-04-09 09:23:19 +03:00
Georgi Gerganov	cc4a95426d	llama : fix attention layer count sanity check (#6550 ) * llama : fix attention layer count sanity check * llama : fix parentheses in attention layer count sanity check There was otherwise a warning when compiling. --------- Co-authored-by: Francis Couture-Harpin <git@compilade.net>	2024-04-08 22:25:49 +03:00
kunnis	cecd8d3c98	Comment explaining a decision (#6531)	2024-04-08 17:44:19 +02:00
Georgi Gerganov	b73e564b16	quantize : fix precedence of cli args (#6541)	2024-04-08 16:23:01 +03:00
Rick G	e3c337d87c	llama : support negative ith in llama_get_ API (#6519 ) * llama_sampling_sample with default args is more naively usable * Batches populated by either llama_batch_get_one or llama_batch_add work with default args * Previously get_one could use the default argument * Previously add should usually have used the last index where logits[idx] == true * This hopefully encourages the use of llama_batch_add * By giving expected results when using default arguments. * Adds "negative indexing" feature to llama_get_logits_ith and llama_get_embeddings_ith * Believed to work with any currently well behaved program * Default arg now works for both cases (previously would give strange results for add case) * Any non-negative number is unaffected and behaves as previously * Negative arguments were previously invalid. * Implemented as a special case of indexing as suggested by @compilade in https://github.com/ggerganov/llama.cpp/pull/6519 * Fixed mismatch type errors * cited in macOS CI tests * Missed in original updates based on PR feedback in https://github.com/ggerganov/llama.cpp/pull/6519	2024-04-08 16:02:30 +03:00
Jan Boon	beea6e1b16	llama : save and restore kv cache for single seq id (#6341 ) * llama : save and restore kv cache for single seq id * remove trailing whitespace * respond error in case there's no space in the kv cache * add kv seq save restore to test case * add --slot-save-path arg to enable save restore and restrict save location * Returning 0 for some cases, instead of asserting. * cleanup error cases * rename sequence state functions * rename state get set functions * add previous function names back in with DEPRECATED notice * update doc * adjust endpoints to preferred style * fix restoring zero cell count * handle seq rm return value * unused param * keep in the size check * fix return types * add server test case for slot save restore * cleanup * add cake * cleanup style * add special * removing a whole sequence never fails * move sequence state file functionality from server to llama to match session api and add version tags * catch exceptions on save as well * error log messages * check types for stricter restore * update server doc * readme : update API changes date * strict filename validation * move include, reject bom as well * also reject empty filename * reject whitespace and trailing dot --------- Co-authored-by: Martin Evans <martindevans@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-08 15:43:30 +03:00
Abhilash Majumder	87fb5b4234	remove row=1 cond (#6532)	2024-04-08 16:26:01 +08:00
Firat	d752327c33	Adding KodiBot to UI list (#6535) KodiBot is free and open source ai chat app released under the GNU General Public License.	2024-04-08 09:48:29 +02:00
Mark Fairbairn	855f54402e	Change Windows AMD example to release build to make inference much faster. (#6525)	2024-04-07 20:52:19 +02:00
Georgi Gerganov	b909236c0b	flake.lock: Update (#6517 ) Flake lock file updates: • Updated input 'flake-parts': 'github:hercules-ci/flake-parts/f7b3c975cf067e56e7cda6cb098ebe3fb4d74ca2' (2024-03-01) → 'github:hercules-ci/flake-parts/9126214d0a59633752a136528f5f3b9aa8565b7d' (2024-04-01) • Updated input 'flake-parts/nixpkgs-lib': 'github:NixOS/nixpkgs/1536926ef5621b09bba54035ae2bb6d806d72ac8?dir=lib' (2024-02-29) → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089?dir=lib' (2024-03-29) • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29) → 'github:NixOS/nixpkgs/fd281bd6b7d3e32ddfa399853946f782553163b5' (2024-04-03) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-04-07 11:25:30 -07:00
DAN™	e0717e751e	Add GritLM as supported models. (#6513)	2024-04-07 19:33:59 +02:00
Georgi Gerganov	c37247796b	sync : ggml	2024-04-07 17:05:51 +03:00
Slava Primenko	f77261a7c5	ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020) `cudaHostRegisterReadOnly` parameter was only introduced in CUDA 11.1 See this issue for more details: https://github.com/ggerganov/examples/whisper/whisper.cpp/issues/2007	2024-04-07 17:05:40 +03:00
Georgi Gerganov	43e8995e75	scripts : sync ggml-cuda folder	2024-04-07 16:08:12 +03:00

2671 commits