Commit graph

2715 commits

Author SHA1 Message Date
Pierrick HYMBERT
6f813dcc6a Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx 2024-04-10 19:24:38 +02:00
Ralph Soika
b3a96f27f0
minor layout improvements (#6572)
* minor layout improvements

* added missing file, run deps.sh locally
2024-04-10 19:18:25 +02:00
slaren
4f407a0a35
llama : add model types for mixtral (#6589) 2024-04-10 17:24:14 +02:00
slaren
65c64dc36f
convert.py : add consolidated.safetensors for mixtral 8x22b (#6587) 2024-04-10 15:23:12 +02:00
Pierrick Hymbert
67fac4b95f
docs : how to add a model (#6565)
* docs: how to add a model

* docs: model: typo and docs

* docs: model: add precision on RoPE

* docs: model: rephrasing README.md

* docs: model: rephrasing README.md

* docs: model: README.md fix trailing spaces

* docs : some fixes

* Update README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-10 09:58:48 +03:00
Artem Zinnatullin
29122d32ac
readme : fix ROCm link (#6579) 2024-04-10 09:49:12 +03:00
sjxx
b231b37b09
readme : update UI list (#6560) 2024-04-10 09:34:00 +03:00
Jiří Sejkora
ba5e134e07
readme: fix typo in amdgpu target name (#6573) 2024-04-10 00:23:02 +02:00
Jared Van Bortel
1b67731e18
BERT tokenizer fixes (#6498)
Key changes:
* BERT conversion: fix abuse of LlamaHfVocab, do not set BOS or EOS
* Nomic Embed conversion: pad vocab instead of slicing embedding tensor
* llama_tokenize: handle added special tokens like HF does
2024-04-09 13:44:08 -04:00
Georgi Gerganov
c4a3a4ff47
sync : ggml 2024-04-09 20:29:06 +03:00
Pierrick HYMBERT
e5631cf25a Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx 2024-04-09 15:10:51 +02:00
Ed Lee
400d5d722d
server : detect search query to start webchat (#6554) 2024-04-09 10:31:47 +02:00
Carolinabanana
5dc9dd7152
llama : add Command R Plus support (#6491)
* Add Command R Plus GGUF

* Add Command R Plus GGUF

* Loading works up to LayerNorm2D

* Export new tensors in 1D so they are not quantized.

* Fix embedding layer based on Noeda's example

* Whitespace

* Add line

* Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda)

* dranger003: Fix block index overflow in CUDA dequantizing.

* Reverted blocked multiplication code as it still has issues and could affect other Llama arches

* export norms as f32

* fix overflow issues during quant and other cleanup

* Type convention

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* dranger003: Fix more int overflow during quant.

---------

Co-authored-by: S <seast@Ss-Mac-Studio.local>
Co-authored-by: S <s@example.com>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-09 11:16:13 +03:00
Georgi Gerganov
e11a8999b5
license : update copyright notice + add AUTHORS (#6405)
* license : add AUTHORS

* authors : update

* scripts : add LICENSE and gen-authors.sh to sync
2024-04-09 09:23:19 +03:00
Pierrick HYMBERT
ac75fbd8c5 gguf-py: dbrx: reverse again the MOE tensors mapping:
    layer.ffn_up_exps   -> Up-projection weights (w1)
    layer.ffn_gate_exps -> Gating weights (v1)
    layer.ffn_down_exps -> Down-projection weights (w2)
2024-04-09 02:41:39 +02:00
Pierrick HYMBERT
ac82aa0e63 gguf-py: revert spaces 2024-04-09 01:26:57 +02:00
Pierrick HYMBERT
c7b9a2e85e llama: dbrx: fix ggml context of the attention outputs weight 2024-04-09 00:58:50 +02:00
Pierrick HYMBERT
55943a281f model: dbrx: convert fix mixed ffn_gate_exps and ffn_down_exps 2024-04-08 21:47:59 +02:00
Georgi Gerganov
cc4a95426d
llama : fix attention layer count sanity check (#6550)
* llama : fix attention layer count sanity check

* llama : fix parentheses in attention layer count sanity check

There was otherwise a warning when compiling.

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-04-08 22:25:49 +03:00
Pierrick HYMBERT
ea8b58c6cd llama: dbrx: first add the residuals and then do the norm 2024-04-08 21:10:49 +02:00
Pierrick HYMBERT
f30a73bb01 llama: dbrx: rename layer_out_norm to attn_out_norm 2024-04-08 20:38:31 +02:00
Pierrick HYMBERT
e66f1e3448 llama: dbrx: document changes, permute only FFN_DOWN_EXPS. Add a check for ftype 2024-04-08 20:08:54 +02:00
Pierrick HYMBERT
9968952921 llama: dbrx: fix experts 3D tensor layout (again) 2024-04-08 19:37:23 +02:00
Pierrick HYMBERT
18a84fedda llama: dbrx: fix experts 3D tensor layout (again) 2024-04-08 19:12:53 +02:00
Pierrick HYMBERT
48909ed2a7 model: dbrx convert permute experts directly torch, log shape 2024-04-08 19:01:44 +02:00
Pierrick HYMBERT
f20c04f01f llama: factorize moe graph implementation between grok, mixtral and dbrx 2024-04-08 17:45:35 +02:00
kunnis
cecd8d3c98
Comment explaining a decision (#6531) 2024-04-08 17:44:19 +02:00
Pierrick HYMBERT
21fb24aa45 model: dbrx: convert-hf-to-gguf.py fix experts tensors shapes 2024-04-08 16:55:56 +02:00
Georgi Gerganov
b73e564b16
quantize : fix precedence of cli args (#6541) 2024-04-08 16:23:01 +03:00
Pierrick HYMBERT
81f308ad64 llama: dbrx: fix experts tensor layout 2024-04-08 15:04:18 +02:00
Rick G
e3c337d87c
llama : support negative ith in llama_get_ API (#6519)
* llama_sampling_sample with default args is more naively usable

* Batches populated by either llama_batch_get_one or llama_batch_add work with default args
  * Previously get_one could use the default argument
  * Previously add should usually have used the last index where logits[idx] == true
* This hopefully encourages the use of llama_batch_add
  * By giving expected results when using default arguments.
* Adds "negative indexing" feature to llama_get_logits_ith and llama_get_embeddings_ith
* Believed to work with any currently well behaved program
  * Default arg now works for both cases (previously would give strange results for add case)
  * Any non-negative number is unaffected and behaves as previously
  * Negative arguments were previously invalid.
* Implemented as a special case of indexing as suggested by @compilade in https://github.com/ggerganov/llama.cpp/pull/6519

* Fixed mismatch type errors

* cited in macOS CI tests
* Missed in original updates based on PR feedback in https://github.com/ggerganov/llama.cpp/pull/6519
2024-04-08 16:02:30 +03:00
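The negative-indexing feature described in the commit above (llama_get_logits_ith and llama_get_embeddings_ith accepting a negative ith) can be sketched roughly as follows. This is an illustrative Python rendering of the index-resolution idea, not the actual C implementation in llama.cpp; the helper name and error handling here are hypothetical.

```python
# Hedged sketch of negative index resolution as described in PR #6519:
# a negative index counts back from the end of the batch, Python-style,
# so -1 refers to the last token's logits. Names here are hypothetical.
def resolve_logits_index(i: int, n_tokens: int) -> int:
    """Map a possibly negative index to an absolute token position."""
    if i < 0:
        i = n_tokens + i  # e.g. -1 -> last token in the batch
    if not (0 <= i < n_tokens):
        raise IndexError(f"invalid logits index {i} for batch of {n_tokens}")
    return i

print(resolve_logits_index(-1, 8))  # → 7 (the last token of an 8-token batch)
```

Non-negative indices pass through unchanged, matching the commit's note that "any non-negative number is unaffected and behaves as previously".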
Jan Boon
beea6e1b16
llama : save and restore kv cache for single seq id (#6341)
* llama : save and restore kv cache for single seq id

* remove trailing whitespace

* respond error in case there's no space in the kv cache

* add kv seq save restore to test case

* add --slot-save-path arg to enable save restore and restrict save location

* Returning 0 for some cases, instead of asserting.

* cleanup error cases

* rename sequence state functions

* rename state get set functions

* add previous function names back in with DEPRECATED notice

* update doc

* adjust endpoints to preferred style

* fix restoring zero cell count

* handle seq rm return value

* unused param

* keep in the size check

* fix return types

* add server test case for slot save restore

* cleanup

* add cake

* cleanup style

* add special

* removing a whole sequence never fails

* move sequence state file functionality from server to llama to match session api and add version tags

* catch exceptions on save as well

* error log messages

* check types for stricter restore

* update server doc

* readme : update API changes date

* strict filename validation

* move include, reject bom as well

* also reject empty filename

* reject whitespace and trailing dot

---------

Co-authored-by: Martin Evans <martindevans@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-08 15:43:30 +03:00
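The "strict filename validation" steps listed in the commit body above (reject an empty filename, a BOM, whitespace, and a trailing dot) could look roughly like this. This is a hypothetical Python sketch of those checks, assembled only from the bullet points in the commit message; the server implements them in C++ and may apply additional rules.

```python
def is_valid_slot_filename(name: str) -> bool:
    """Hedged sketch of the slot-save filename checks described above:
    reject empty names, a leading UTF-8 BOM, any whitespace, and a
    trailing dot. Illustrative only; not the llama.cpp server code."""
    if not name:
        return False          # "also reject empty filename"
    if name.startswith("\ufeff"):
        return False          # "reject bom as well"
    if any(c.isspace() for c in name):
        return False          # "reject whitespace ..."
    if name.endswith("."):
        return False          # "... and trailing dot"
    return True
```

Restricting save locations (the --slot-save-path argument) then only needs to join a validated name against that fixed directory.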
Pierrick HYMBERT
eb0847e6b1 llama: dbrx: load norm eps in hparams 2024-04-08 14:38:21 +02:00
Pierrick HYMBERT
506cc2ea53 llama: dbrx: convert remove previous reverse 2024-04-08 14:09:06 +02:00
Pierrick HYMBERT
35dce3e145 llama: dbrx: rename tensor to actual meaning. Fix normalization in graph. Permute expert tensors to the llama.cpp layout 2024-04-08 14:02:08 +02:00
Pierrick HYMBERT
8e22688401 llama: dbrx: move norm epsilon to convert. Fix missing normalization. 2024-04-08 11:22:24 +02:00
Pierrick HYMBERT
52c6276e12 llama: dbrx: fix k scale 2024-04-08 10:43:36 +02:00
Abhilash Majumder
87fb5b4234
remove row=1 cond (#6532) 2024-04-08 16:26:01 +08:00
Firat
d752327c33
Adding KodiBot to UI list (#6535)
KodiBot is a free and open source AI chat app released under the GNU General Public License.
2024-04-08 09:48:29 +02:00
Pierrick HYMBERT
71f9e479aa llama: dbrx: Try another rope type 2024-04-08 01:29:00 +02:00
Pierrick HYMBERT
f8f97e74f9 llama: dbrx: hardcode nn.LayerNorm epsilon 2024-04-08 01:17:33 +02:00
Pierrick HYMBERT
74e6d876f6 llama: dbrx: fix build kv att out tensor name 2024-04-08 00:37:28 +02:00
Pierrick HYMBERT
b01b062ab5 llama: dbrx: fix build kv att out 2024-04-08 00:25:54 +02:00
Pierrick HYMBERT
993f836029 llama: dbrx: move norm2 after attention, fix build kv 2024-04-08 00:11:19 +02:00
Pierrick HYMBERT
2897aa628c llama: dbrx: revert 2024-04-07 23:47:26 +02:00
Pierrick HYMBERT
830e46d7ae llama: dbrx: fix last normalization 2024-04-07 23:40:12 +02:00
Pierrick HYMBERT
0ab1bae854 llama: dbrx: output norm dim 2024-04-07 20:56:53 +02:00
Mark Fairbairn
855f54402e
Change Windows AMD example to release build to make inference much faster. (#6525) 2024-04-07 20:52:19 +02:00
Georgi Gerganov
b909236c0b
flake.lock: Update (#6517)
Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/f7b3c975cf067e56e7cda6cb098ebe3fb4d74ca2' (2024-03-01)
  → 'github:hercules-ci/flake-parts/9126214d0a59633752a136528f5f3b9aa8565b7d' (2024-04-01)
• Updated input 'flake-parts/nixpkgs-lib':
    'github:NixOS/nixpkgs/1536926ef5621b09bba54035ae2bb6d806d72ac8?dir=lib' (2024-02-29)
  → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089?dir=lib' (2024-03-29)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)
  → 'github:NixOS/nixpkgs/fd281bd6b7d3e32ddfa399853946f782553163b5' (2024-04-03)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-04-07 11:25:30 -07:00
Pierrick HYMBERT
50b4373673 model: dbrx: weird fix expert reshape 2024-04-07 20:14:43 +02:00