Commit graph

2715 commits

Author SHA1 Message Date
Pierrick HYMBERT
6f813dcc6a Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx 2024-04-10 19:24:38 +02:00
Ralph Soika
b3a96f27f0
minor layout improvements (#6572)
* minor layout improvements

* added missing file, run deps.sh locally
2024-04-10 19:18:25 +02:00
slaren
4f407a0a35
llama : add model types for mixtral (#6589) 2024-04-10 17:24:14 +02:00
slaren
65c64dc36f
convert.py : add consolidated.safetensors for mixtral 8x22b (#6587) 2024-04-10 15:23:12 +02:00
Pierrick Hymbert
67fac4b95f
docs : how to add a model (#6565)
* docs: how to add a model

* docs: model: typo and docs

* docs: model: add precision on RoPE

* docs: model: rephrasing README.md

* docs: model: rephrasing README.md

* docs: model: README.md fix trailing spaces

* docs : some fixes

* Update README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-10 09:58:48 +03:00
Artem Zinnatullin
29122d32ac
readme : fix ROCm link (#6579) 2024-04-10 09:49:12 +03:00
sjxx
b231b37b09
readme : update UI list (#6560) 2024-04-10 09:34:00 +03:00
Jiří Sejkora
ba5e134e07
readme: fix typo in amdgpu target name (#6573) 2024-04-10 00:23:02 +02:00
Jared Van Bortel
1b67731e18
BERT tokenizer fixes (#6498)
Key changes:
* BERT conversion: fix abuse of LlamaHfVocab, do not set BOS or EOS
* Nomic Embed conversion: pad vocab instead of slicing embedding tensor
* llama_tokenize: handle added special tokens like HF does
2024-04-09 13:44:08 -04:00
Georgi Gerganov
c4a3a4ff47
sync : ggml 2024-04-09 20:29:06 +03:00
Pierrick HYMBERT
e5631cf25a Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx 2024-04-09 15:10:51 +02:00
Ed Lee
400d5d722d
server : detect search query to start webchat (#6554) 2024-04-09 10:31:47 +02:00
Carolinabanana
5dc9dd7152
llama : add Command R Plus support (#6491)
* Add Command R Plus GGUF

* Add Command R Plus GGUF

* Loading works up to LayerNorm2D

* Export new tensors in 1D so they are not quantized.

* Fix embedding layer based on Noeda's example

* Whitespace

* Add line

* Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda)

* dranger003: Fix block index overflow in CUDA dequantizing.

* Reverted blocked multiplication code as it still has issues and could affect other Llama arches

* export norms as f32

* fix overflow issues during quant and other cleanup

* Type convention

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* dranger003: Fix more int overflow during quant.

---------

Co-authored-by: S <seast@Ss-Mac-Studio.local>
Co-authored-by: S <s@example.com>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-09 11:16:13 +03:00
Georgi Gerganov
e11a8999b5
license : update copyright notice + add AUTHORS (#6405)
* license : add AUTHORS

* authors : update

* scripts : add LICENSE and gen-authors.sh to sync
2024-04-09 09:23:19 +03:00
Pierrick HYMBERT
ac75fbd8c5 gguf-py: dbrx: reverse again the MOE tensors mapping:
    layer.ffn_up_exps   -> Up-projection weights (w1)
    layer.ffn_gate_exps -> Gating weights (v1)
    layer.ffn_down_exps -> Down-projection weights (w2)
2024-04-09 02:41:39 +02:00
Pierrick HYMBERT
ac82aa0e63 gguf-py: revert spaces 2024-04-09 01:26:57 +02:00
Pierrick HYMBERT
c7b9a2e85e llama: dbrx: fix ggml context of the attention outputs weight 2024-04-09 00:58:50 +02:00
Pierrick HYMBERT
55943a281f model: dbrx: convert fix mixed ffn_gate_exps and ffn_down_exps 2024-04-08 21:47:59 +02:00
Georgi Gerganov
cc4a95426d
llama : fix attention layer count sanity check (#6550)
* llama : fix attention layer count sanity check

* llama : fix parentheses in attention layer count sanity check

There was otherwise a warning when compiling.

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-04-08 22:25:49 +03:00
Pierrick HYMBERT
ea8b58c6cd llama: dbrx: first add the residuals and then do the norm 2024-04-08 21:10:49 +02:00
Pierrick HYMBERT
f30a73bb01 llama: dbrx: rename layer_out_norm to attn_out_norm 2024-04-08 20:38:31 +02:00
Pierrick HYMBERT
e66f1e3448 llama: dbrx: document changes, permute only FFN_DOWN_EXPS. Add a check for ftype 2024-04-08 20:08:54 +02:00
Pierrick HYMBERT
9968952921 llama: dbrx: fix experts 3D tensor layout (again) 2024-04-08 19:37:23 +02:00
Pierrick HYMBERT
18a84fedda llama: dbrx: fix experts 3D tensor layout (again) 2024-04-08 19:12:53 +02:00
Pierrick HYMBERT
48909ed2a7 model: dbrx convert permute experts directly torch, log shape 2024-04-08 19:01:44 +02:00
Pierrick HYMBERT
f20c04f01f llama: factorize moe graph implementation between grok, mixtral and dbrx 2024-04-08 17:45:35 +02:00
kunnis
cecd8d3c98
Comment explaining a decision (#6531) 2024-04-08 17:44:19 +02:00
Pierrick HYMBERT
21fb24aa45 model: dbrx: convert-hf-to-gguf.py fix experts tensors shapes 2024-04-08 16:55:56 +02:00
Georgi Gerganov
b73e564b16
quantize : fix precedence of cli args (#6541) 2024-04-08 16:23:01 +03:00
Pierrick HYMBERT
81f308ad64 llama: dbrx: fix experts tensor layout 2024-04-08 15:04:18 +02:00
Rick G
e3c337d87c
llama : support negative ith in llama_get_ API (#6519)
* llama_sampling_sample with default args is more naively usable

* Batches populated by either llama_batch_get_one or llama_batch_add work with default args
  * Previously get_one could use the default argument
  * Previously add should usually have used the last index where logits[idx] == true
* This hopefully encourages the use of llama_batch_add
  * By giving expected results when using default arguments.
* Adds "negative indexing" feature to llama_get_logits_ith and llama_get_embeddings_ith
* Believed to work with any currently well behaved program
  * Default arg now works for both cases (previously would give strange results for add case)
  * Any non-negative number is unaffected and behaves as previously
  * Negative arguments were previously invalid.
* Implemented as a special case of indexing as suggested by @compilade in https://github.com/ggerganov/llama.cpp/pull/6519

* Fixed mismatch type errors

* cited in macOS CI tests
* Missed in original updates based on PR feedback in https://github.com/ggerganov/llama.cpp/pull/6519
2024-04-08 16:02:30 +03:00
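The negative-indexing feature described in the commit above (llama_get_logits_ith and llama_get_embeddings_ith accepting a negative ith) can be sketched roughly as follows. This is an illustrative Python rendering of the index-resolution idea, not the actual C implementation in llama.cpp; the helper name and error handling here are hypothetical.

```python
# Hedged sketch of negative index resolution as described in PR #6519:
# a negative index counts back from the end of the batch, Python-style,
# so -1 refers to the last token's logits. Names here are hypothetical.
def resolve_logits_index(i: int, n_tokens: int) -> int:
    """Map a possibly negative index to an absolute token position."""
    if i < 0:
        i = n_tokens + i  # e.g. -1 -> last token in the batch
    if not (0 <= i < n_tokens):
        raise IndexError(f"invalid logits index {i} for batch of {n_tokens}")
    return i

print(resolve_logits_index(-1, 8))  # → 7 (the last token of an 8-token batch)
```

Non-negative indices pass through unchanged, matching the commit's note that "any non-negative number is unaffected and behaves as previously".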
Jan Boon
beea6e1b16
llama : save and restore kv cache for single seq id (#6341)
* llama : save and restore kv cache for single seq id

* remove trailing whitespace

* respond error in case there's no space in the kv cache

* add kv seq save restore to test case

* add --slot-save-path arg to enable save restore and restrict save location

* Returning 0 for some cases, instead of asserting.

* cleanup error cases

* rename sequence state functions

* rename state get set functions

* add previous function names back in with DEPRECATED notice

* update doc

* adjust endpoints to preferred style

* fix restoring zero cell count

* handle seq rm return value

* unused param

* keep in the size check

* fix return types

* add server test case for slot save restore

* cleanup

* add cake

* cleanup style

* add special

* removing a whole sequence never fails

* move sequence state file functionality from server to llama to match session api and add version tags

* catch exceptions on save as well

* error log messages

* check types for stricter restore

* update server doc

* readme : update API changes date

* strict filename validation

* move include, reject bom as well

* also reject empty filename

* reject whitespace and trailing dot

---------

Co-authored-by: Martin Evans <martindevans@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-08 15:43:30 +03:00
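The "strict filename validation" steps listed in the commit body above (reject an empty filename, a BOM, whitespace, and a trailing dot) could look roughly like this. This is a hypothetical Python sketch of those checks, assembled only from the bullet points in the commit message; the server implements them in C++ and may apply additional rules.

```python
def is_valid_slot_filename(name: str) -> bool:
    """Hedged sketch of the slot-save filename checks described above:
    reject empty names, a leading UTF-8 BOM, any whitespace, and a
    trailing dot. Illustrative only; not the llama.cpp server code."""
    if not name:
        return False          # "also reject empty filename"
    if name.startswith("\ufeff"):
        return False          # "reject bom as well"
    if any(c.isspace() for c in name):
        return False          # "reject whitespace ..."
    if name.endswith("."):
        return False          # "... and trailing dot"
    return True
```

Restricting save locations (the --slot-save-path argument) then only needs to join a validated name against that fixed directory.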
Pierrick HYMBERT
eb0847e6b1 llama: dbrx: load norm eps in hparams 2024-04-08 14:38:21 +02:00
Pierrick HYMBERT
506cc2ea53 llama: dbrx: convert remove previous reverse 2024-04-08 14:09:06 +02:00
Pierrick HYMBERT
35dce3e145 llama: dbrx: rename tensor to actual meaning. Fix normalization in graph. Permute expert tensors to the llama.cpp layout 2024-04-08 14:02:08 +02:00
Pierrick HYMBERT
8e22688401 llama: dbrx: move norm epsilon to convert. Fix missing normalization. 2024-04-08 11:22:24 +02:00
Pierrick HYMBERT
52c6276e12 llama: dbrx: fix k scale 2024-04-08 10:43:36 +02:00
Abhilash Majumder
87fb5b4234
remove row=1 cond (#6532) 2024-04-08 16:26:01 +08:00
Firat
d752327c33
Adding KodiBot to UI list (#6535)
KodiBot is a free and open source AI chat app released under the GNU General Public License.
2024-04-08 09:48:29 +02:00
Pierrick HYMBERT
71f9e479aa llama: dbrx: Try another rope type 2024-04-08 01:29:00 +02:00
Pierrick HYMBERT
f8f97e74f9 llama: dbrx: hardcode nn.LayerNorm epsilon 2024-04-08 01:17:33 +02:00
Pierrick HYMBERT
74e6d876f6 llama: dbrx: fix build kv att out tensor name 2024-04-08 00:37:28 +02:00
Pierrick HYMBERT
b01b062ab5 llama: dbrx: fix build kv att out 2024-04-08 00:25:54 +02:00
Pierrick HYMBERT
993f836029 llama: dbrx: move norm2 after attention, fix build kv 2024-04-08 00:11:19 +02:00
Pierrick HYMBERT
2897aa628c llama: dbrx: revert 2024-04-07 23:47:26 +02:00
Pierrick HYMBERT
830e46d7ae llama: dbrx: fix last normalization 2024-04-07 23:40:12 +02:00
Pierrick HYMBERT
0ab1bae854 llama: dbrx: output norm dim 2024-04-07 20:56:53 +02:00
Mark Fairbairn
855f54402e
Change Windows AMD example to release build to make inference much faster. (#6525) 2024-04-07 20:52:19 +02:00
Georgi Gerganov
b909236c0b
flake.lock: Update (#6517)
Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/f7b3c975cf067e56e7cda6cb098ebe3fb4d74ca2' (2024-03-01)
  → 'github:hercules-ci/flake-parts/9126214d0a59633752a136528f5f3b9aa8565b7d' (2024-04-01)
• Updated input 'flake-parts/nixpkgs-lib':
    'github:NixOS/nixpkgs/1536926ef5621b09bba54035ae2bb6d806d72ac8?dir=lib' (2024-02-29)
  → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089?dir=lib' (2024-03-29)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)
  → 'github:NixOS/nixpkgs/fd281bd6b7d3e32ddfa399853946f782553163b5' (2024-04-03)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-04-07 11:25:30 -07:00
Pierrick HYMBERT
50b4373673 model: dbrx: weird fix expert reshape 2024-04-07 20:14:43 +02:00