Commit graph

3084 commits

Author SHA1 Message Date
Yazan Agha-Schrader
80888e93cc renaming to ensure consistency 2024-05-31 06:17:40 +02:00
Yazan Agha-Schrader
d9742fbf4e fix wrong link to old ui 2024-05-31 05:37:55 +02:00
Yazan Agha-Schrader
bb9542b54f include new ui in cpp 2024-05-31 05:37:55 +02:00
Yazan Agha-Schrader
0d75e07bd9 Merge branch 'ggerganov:master' into server-ui-pr 2024-05-30 08:28:26 +02:00
Meng, Hengyu
3854c9d07f
[SYCL] fix intel docker (#7630)
* Update main-intel.Dockerfile

* workaround for https://github.com/intel/oneapi-containers/issues/70

* reset intel docker in CI

* add missed fix in server
2024-05-30 16:19:08 +10:00
Yazan Agha-Schrader
505d0a3346 move new ui to "/public" due to otherwise problematic CORS behaviour 2024-05-30 04:00:56 +02:00
Yazan Agha-Schrader
8b937a1a71 add a button to the new ui 2024-05-30 03:59:28 +02:00
Galunid
eb57fee51f gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py (#7627) 2024-05-30 02:10:40 +02:00
Yazan Agha-Schrader
734be4dcc9 Merge branch 'master' into server-ui-pr 2024-05-30 01:47:22 +02:00
Yazan Agha-Schrader
d55081767c fix css path 2024-05-30 01:17:47 +02:00
Yazan Agha-Schrader
89b1b38144 move files, clean code 2024-05-30 01:13:10 +02:00
Yazan Agha-Schrader
63de7201fa set default prompt to empty 2024-05-29 22:34:15 +02:00
Yazan Agha-Schrader
dcdc11a5c4 add cmd-r prompt and reduce redundancy 2024-05-29 22:24:24 +02:00
Yazan Agha-Schrader
87bcbbb6c2 fix toggle state in localStorage 2024-05-29 22:23:40 +02:00
Georgi Gerganov
55d62262a9 metal : remove invalid asserts (#7617) 2024-05-29 22:21:20 +03:00
Yazan Agha-Schrader
c2badb4697 add hacky llama2 prompt solution, reduce redundancy in promptFormats.js 2024-05-29 20:03:20 +02:00
Georgi Gerganov
975ec63ff2 metal : add missing asserts (#7617) 2024-05-29 20:45:25 +03:00
Georgi Gerganov
fb76ec31a9
ggml : fix YARN + add tests + add asserts (#7617)
* tests : add rope tests

ggml-ci

* ggml : fixes (hopefully)

ggml-ci

* tests : add non-cont tests

ggml-ci

* cuda : add asserts for rope/norm + fix DS2

ggml-ci

* ggml : assert contiguousness

* tests : reduce RoPE tests

ggml-ci
2024-05-29 20:17:31 +03:00
Georgi Gerganov
cce3dcffc5
cuda : non-cont concat support (#7610)
* tests : add non-cont concat tests

* cuda : non-cont concat support

ggml-ci
2024-05-29 15:38:26 +03:00
Yazan Agha-Schrader
1c24ab6e20 move prompt style 2024-05-29 14:09:19 +02:00
Radoslav Gerganov
210d99173d llama-bench : add support for the RPC backend (#7435) 2024-05-29 14:45:44 +03:00
slaren
87bdf2a199
ggml : use atomic_flag for critical section (#7598)
* ggml : use atomic_flag for critical section

* add windows shims
2024-05-29 13:36:39 +02:00
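The commit above swaps ggml's critical section onto an atomic_flag spinlock. A minimal sketch of the idiom it refers to (illustrative C++, not the actual ggml code, which is C with Windows shims):

```cpp
#include <atomic>

// std::atomic_flag is the one atomic type guaranteed lock-free.
// test_and_set atomically raises the flag and returns its previous
// value, so the loop spins while another thread holds the lock.
static std::atomic_flag g_lock = ATOMIC_FLAG_INIT;

static void critical_section_start() {
    while (g_lock.test_and_set(std::memory_order_acquire)) {
        // busy-wait until the holder clears the flag
    }
}

static void critical_section_end() {
    g_lock.clear(std::memory_order_release);
}
```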
Yazan Agha-Schrader
f2ef89415c do not separate with new line or comma 2024-05-29 13:36:07 +02:00
Yazan Agha-Schrader
39a163f76e add missing char 2024-05-29 13:32:33 +02:00
Georgi Gerganov
00281b7be3 scripts : remove mpi remnants 2024-05-29 14:31:18 +03:00
Georgi Gerganov
2ab977282b sync : ggml 2024-05-29 14:29:52 +03:00
Georgi Gerganov
72de268bec
ggml : restore ggml_rope_xpos_inplace (ggml/0)
ggml-ci
2024-05-29 14:29:33 +03:00
Yazan Agha-Schrader
513406ab60 add more common stop tokens 2024-05-29 13:29:00 +02:00
Yazan Agha-Schrader
80b6143f78 more prompt format fixes 2024-05-29 13:19:22 +02:00
Yazan Agha-Schrader
ca565f4ed6 fix llama3 prompt template 2024-05-29 12:08:39 +02:00
Yazan Agha-Schrader
9fa0aa53f5 fix chatml & add llama3 format 2024-05-29 11:26:34 +02:00
Yazan Agha-Schrader
5fa255edfb add user message suffix 2024-05-29 10:28:07 +02:00
Yazan Agha-Schrader
eac8d739a5 update forgotten css theme 2024-05-29 08:54:04 +02:00
Akarshan Biswas
0e8d8bfd6c Add Arc A750 and Arch Linux to readme-sycl.md as verified GPU model and Linux distro (#7605) 2024-05-29 16:53:47 +10:00
Yazan Agha-Schrader
aa493e022d add css class 2024-05-29 08:45:20 +02:00
Yazan Agha-Schrader
9bb074e1f6 add phi3 to dropdown 2024-05-29 06:28:27 +02:00
Yazan Agha-Schrader
be675948d4 add phi-3 prompt template 2024-05-29 05:28:52 +02:00
zhouwg
504f0c340f ggml : fix typo in ggml.c (#7603) 2024-05-29 04:09:31 +02:00
Meng, Hengyu
b864b50ce5
[SYCL] Align GEMM dispatch (#7566)
* align GEMM dispatch
2024-05-29 07:00:24 +08:00
jaime-m-p
02c1ecad07
Tokenizer WPM fixes (#7500)
* Update random test: add_bos_token.
* Update random test: add WPM models for testing.
* Build vocab.special_tokens_cache using vocab token types.
* Fix and improve WPM preprocessing.
  - Fix unicode edge case combinations.
  - Split by whitespace in the same pass.
* Discard all tokens when no matching found.
2024-05-28 21:46:34 +02:00
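The "discard all tokens when no matching found" rule is standard WordPiece (WPM) behaviour: pieces are matched greedily, longest first, and if any remainder of a word fails to match, the whole word collapses to the unknown token. A hedged sketch of that core loop (illustrative only; the vocabulary lookup and token names are assumptions, not the llama.cpp implementation):

```cpp
#include <string>
#include <vector>
#include <unordered_set>

// Greedy longest-match-first WordPiece over one whitespace-split word.
// If any suffix fails to match, every piece of the word is discarded
// in favor of the unknown token.
std::vector<std::string> wpm_tokenize_word(
        const std::string & word,
        const std::unordered_set<std::string> & vocab) {
    std::vector<std::string> pieces;
    size_t start = 0;
    while (start < word.size()) {
        size_t end = word.size();
        std::string match;
        for (; end > start; --end) {
            // continuation pieces carry the "##" prefix
            std::string piece = (start == 0 ? "" : "##") + word.substr(start, end - start);
            if (vocab.count(piece)) { match = piece; break; }
        }
        if (match.empty()) {
            return {"[UNK]"};  // no piece matched: discard all pieces for this word
        }
        pieces.push_back(match);
        start = end;
    }
    return pieces;
}
```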
Georgi Gerganov
6bd12ce409 sycl : fix assert (#7563) 2024-05-28 22:22:50 +03:00
Giuseppe Scrivano
5442939fcc
llama : support small Granite models (#7481)
* Add optional MLP bias for Granite models

Add optional MLP bias for ARCH_LLAMA to support Granite models.
Partially addresses ggerganov/llama.cpp/issues/7116.
Still needs further changes to properly support Granite.

* llama: honor add_space_prefix from the model configuration

Propagate the add_space_prefix setting from the HF model
configuration to the GGUF file and honor it with the gpt2 tokenizer.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

* llama: add support for small Granite models

It works only for the small models, 3b and 8b.

The convert-hf-to-gguf.py script uses the vocabulary size of the
Granite models to detect Granite and set the correct configuration.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

---------

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-authored-by: Steffen Roecker <sroecker@redhat.com>
2024-05-28 21:49:49 +03:00
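The optional MLP bias described above comes down to applying a bias tensor only when the checkpoint provides one. A rough sketch of the pattern against ggml's public API (the helper name and argument layout are illustrative assumptions):

```cpp
#include "ggml.h"

// Apply a projection with an optional bias: Granite-style checkpoints
// ship ffn biases, most LLaMA checkpoints do not.
static ggml_tensor * ffn_proj(ggml_context * ctx,
                              ggml_tensor * w,    // weight matrix
                              ggml_tensor * b,    // bias, may be null
                              ggml_tensor * cur) {
    cur = ggml_mul_mat(ctx, w, cur);
    if (b != nullptr) {
        cur = ggml_add(ctx, cur, b);  // only when the model defines the bias
    }
    return cur;
}
```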
k.h.lai
56411a950f vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) 2024-05-28 19:25:08 +02:00
Radoslav Gerganov
2b737caae1
rpc : resource management rework (#7562)
* rpc : resource management rework

* address review comments
2024-05-28 18:13:36 +03:00
fairydreaming
ee3dff6b8e
Add support for DeepseekV2ForCausalLM (#7519)
* common : increase max number of experts to 160

* common : add tensors ATTN_Q_A, ATTN_Q_A_NORM, ATTN_Q_B, ATTN_KV_A_MQA, ATTN_KV_A_NORM, ATTN_KV_B needed by DeepSeek-V2 MLA (multi-head latent attention) architecture

* common : add model header parameters: leading_dense_block_count, expert_feed_forward_length, expert_shared_count, expert_weights_scale, attention.q_lora_rank, attention.kv_lora_rank, rope.scaling.yarn_log_multiplier

* convert-hf : add model conversion support for DeepseekV2ForCausalLM

* llama : add model types for DeepSeek-V2 and DeepSeek-V2-Lite models

* llama : add two new llm_build_moe_ffn() arguments: scale_w (whether to scale weights of selected MoE experts) and w_scale (numerical value of the scaling factor)

* llama : add inference support for LLM_ARCH_DEEPSEEK2

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-05-28 17:07:05 +02:00
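The two new llm_build_moe_ffn() arguments reduce to an optional rescaling of the selected experts' routing weights. A hedged sketch of that step (the helper is hypothetical and not the real function signature; only ggml_scale is a real API call):

```cpp
#include "ggml.h"

// scale_w toggles the scaling, w_scale is the factor applied to the
// routing weights of the selected experts (as DeepSeek-V2 requires).
static ggml_tensor * maybe_scale_expert_weights(
        ggml_context * ctx, ggml_tensor * weights,
        bool scale_w, float w_scale) {
    if (scale_w) {
        weights = ggml_scale(ctx, weights, w_scale);
    }
    return weights;
}
```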
Georgi Gerganov
edc29433fa tests : fix test-tokenizer-0.sh 2024-05-28 15:04:09 +03:00
Georgi Gerganov
8b99e2aa66 llama : handle unknown utf8 bytes (#7588) 2024-05-28 13:55:35 +03:00
Brian
271ff3fc44
github: add refactor to issue template (#7561)
* github: add refactor issue template [no ci]

* Update 07-refactor.yml
2024-05-28 20:27:27 +10:00
Neo Zhang
e2b065071c
[SYCL] fix ggml_sycl_mul_mat_id() to match the API change (#7436)
* fix mul_mat_id to match the API change

* rm comment

* rm unused or duplicated code, rename as per review comments
2024-05-28 10:53:37 +01:00
Georgi Gerganov
0548a4187f
ggml : generalize GGML_OP_CONCAT (#7563)
* ggml : generalize GGML_OP_CONCAT (WIP)

ggml-ci

* tests : add dim != 2 tests

* metal : generalize concat kernel

* tests : naming

* cuda : generalize concat kernel

ggml-ci

* sycl : add warning and assert

* ggml : fix op params handling

* metal : bugfix kernel

ggml-ci

* ggml : reimplement CPU and Metal

* cuda : add asserts

ggml-ci

* ggml : fix ptrs

ggml-ci
2024-05-28 11:04:19 +03:00
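The generalization above means callers choose the concatenation dimension instead of the previously hard-coded dim 2. A hedged usage sketch, assuming the generalized ggml_concat takes the dimension as its final argument:

```cpp
#include "ggml.h"

// a and b must match on every dimension except `dim`.
// Concatenate along dim 0 (rows) instead of the previously fixed dim 2.
static ggml_tensor * build_concat(ggml_context * ctx,
                                  ggml_tensor * a, ggml_tensor * b) {
    const int dim = 0;  // any of 0..3 after the generalization
    return ggml_concat(ctx, a, b, dim);
}
```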