Commit graph

3068 commits

Each entry lists the author, commit SHA1, message, and date.
k.h.lai
fcda1128bc
vulkan: add workaround for iterator boundary check to fix clang-cl debug build (#7426) 2024-05-22 14:53:21 +02:00
Justine Tunney
03d8900ebe
llama : add missing model type names (#7445) 2024-05-22 14:08:18 +03:00
Georgi Gerganov
9b3d833189
cuda : fix compile warning (#7454) 2024-05-22 12:36:37 +03:00
Johannes Gäßler
95fb0aefab
CUDA: remove incorrect precision check (#7454) 2024-05-22 10:24:29 +02:00
Georgi Gerganov
3e5faa8503
cuda : fix rope + add tests (#7452)
* cuda : fix rope pos data

ggml-ci

* ggml : drop mode & 1 == 1 support for ggml_rope

ggml-ci

* ggml : support freq_factors for f16 rope (CPU)

ggml-ci

* tests : add rope tests using frequency factors

ggml-ci
2024-05-22 11:01:35 +03:00
teleprint-me
12285b5325
chore: Map model file and vocab types 2024-05-22 02:58:12 -04:00
teleprint-me
0b43e14030
refactor: Add experimental mapping for BPE pre-tokenizers 2024-05-21 22:45:45 -04:00
teleprint-me
34e14ae96d
refactor: Add experimental model mappings 2024-05-21 19:11:51 -04:00
liuwei-git
201cc11afa
llama : add phi3 128K model support (#7225)
* add phi3 128k support in convert-hf-to-gguf

* add phi3 128k support in cuda

* address build warnings on llama.cpp

* adjust index value in cuda long rope freq factors

* add long rope support in ggml cpu backend

* make freq factors only depend on ctx size

* remove unused rope scaling type 'su' from gguf converter

* fix lint warnings in convert-hf-to-gguf.py

* set to the short freq factor when context size is smaller than the trained context size

* add one line of comments

* metal : support rope freq_factors

* ggml : update ggml_rope_ext API to support freq. factors

* backends : add dev messages to support rope freq. factors

* minor : style

* tests : update to use new rope API

* backends : fix pragma semicolons

* minor : cleanup

* llama : move rope factors from KV header to tensors

* llama : remove tmp assert

* cuda : fix compile warning

* convert : read/write n_head_kv

* llama : fix uninitialized tensors

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-21 23:28:32 +03:00
teleprint-me
b2aac685d5
docs: Fix comment 2024-05-21 16:07:12 -04:00
teleprint-me
83b9fcd3e4
refactor: Rename constants to reduce confusion between references 2024-05-21 16:06:39 -04:00
Georgi Gerganov
6369bf0433
metal : handle F16 inf values, fix FA partial offload (#7434)
ggml-ci
2024-05-21 23:03:42 +03:00
Olivier Chafik
e402de364b
grammars: fix resampling logic regression (#7424) 2024-05-21 20:40:00 +01:00
Johannes Gäßler
fcf6538ba6
CUDA: fix unused warning in mmq.cu (#7442) 2024-05-21 20:27:12 +03:00
Georgi Gerganov
c3f8d58356
tests : test-tokenizer-0.sh print more info (#7402) 2024-05-21 19:53:48 +03:00
Amir
11474e756d
examples: cache hf model when --model not provided (#7353)
2024-05-21 17:13:12 +03:00
Johannes Gäßler
d8ee902227
CUDA: deduplicate mmq code (#7397) 2024-05-21 16:02:12 +02:00
jaime-m-p
d7e852c1bc
Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425)
* Update brute force test: add_special
* Update brute force test: default values for add_bos_token and add_eos_token
* Enable rtrim when pre-inserting BOS

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Revert "server : fix test regexes"
2024-05-21 14:39:48 +02:00
teleprint-me
2fe28ad4d3
chore: Rename from repo to model repo and reorder for improved readability 2024-05-21 01:41:35 -04:00
teleprint-me
4768650aff
chore: Add formatting, set common vocab files, apply pattern to model map 2024-05-21 01:38:29 -04:00
teleprint-me
fb32f50834
feat: Add hf model mapping descriptors for each repo 2024-05-21 01:07:13 -04:00
teleprint-me
a3bdac091c
chore: Remove unused enum import reference 2024-05-21 00:46:31 -04:00
teleprint-me
6296206392
chore: Apply deduped token type references 2024-05-21 00:45:06 -04:00
teleprint-me
a35b76755f
Merge branch 'master' into auto-model-support 2024-05-21 00:16:34 -04:00
teleprint-me
aed0573f68
proto: Add experimental vocab pre-tokenizer regular expressions 2024-05-21 00:14:26 -04:00
teleprint-me
12537fdabc
chore: Add tokenizer constants for model metadata 2024-05-21 00:13:49 -04:00
teleprint-me
5978bb007d
chore: Fix and update comments 2024-05-20 14:59:40 -04:00
teleprint-me
2fa2c7a86c
chore: Move enums and model map to constants 2024-05-20 14:51:03 -04:00
teleprint-me
d9ba963cd4
refactor: Restructure tokenizer model metadata 2024-05-20 14:42:59 -04:00
jaime-m-p
917dc8cfa6
Tokenizer SPM fixes for phi-3 and llama-spm (#7375)
* Update brute force test: special tokens
* Fix added tokens
  - Try to read 'added_tokens.json'.
  - Try to read 'tokenizer_config.json'.
  - Try to read 'tokenizer.json'.
* Fix special tokens rtrim

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : fix test regexes
2024-05-20 20:15:57 +02:00
teleprint-me
18bb36e496
chore: Allow the user to config the logger 2024-05-20 14:06:21 -04:00
Georgi Gerganov
fabf30b4c4
llama : remove Persimmon (#7408)
* llama : remove Persimmon

* requirements : remove
2024-05-21 02:35:28 +10:00
Johannes Gäßler
20385cebcc
perplexity: update README FP16 results [no ci] (#7413) 2024-05-20 18:15:38 +02:00
Radoslav Gerganov
db10f01310
rpc : track allocated buffers (#7411)
* rpc : track allocated buffers

ref: #7407

* rpc : pack rpc_tensor tightly
2024-05-20 16:36:55 +03:00
Georgi Gerganov
3bc10cb485
server : fix temperature + disable some tests (#7409)
* server : fix temperature

* server : disable tests relying on parallel determinism

* ci : change server Debug -> RelWithDebInfo
2024-05-20 22:10:03 +10:00
AidanBeltonS
6bf9b66fa3
[SYCL] Update SYCL upscale operation (#7321)
* Update SYCL upscale operation

* Formatting

* Remove messages
2024-05-20 16:38:23 +05:30
Bingan
26cd4237bc
Update README.md (#7410) 2024-05-20 11:55:34 +02:00
Herman Semenov
213e90ed73
ggml-opencl, llama: use reserve() when the count is already known (#7272) 2024-05-20 10:33:21 +03:00
junchao-loongson
65c58207ec
ggml : add loongarch lsx and lasx support (#6454)
* add loongarch lsx and lasx optimization code

* Add loongarch compilation support to makefile

* revert stb_image.h

* opt bytes_from_nibbles_32 and sum_i16_pairs_float

* fix undeclared

* format code

* update

* update 2

---------

Co-authored-by: Jinyang He <hejinyang@loongson.cn>
2024-05-20 10:19:21 +03:00
Georgi Gerganov
1cc0155d04
server : tuning tests (#7388)
* server : don't pass temperature as string

* server : increase timeout

* tests : fix the fix 0.8f -> 0.8

ggml-ci

* tests : set explicit temperature
2024-05-20 10:16:41 +03:00
Georgi Gerganov
e932094d58
server : return error on too large embedding input (#7389) 2024-05-20 08:56:05 +03:00
Georgi Gerganov
2789baf480
tests : fix --keep_split -> --keep-split (#7374) 2024-05-20 08:55:09 +03:00
teleprint-me
bdd0286bd0
refactor: Use proper names for referenced member variables 2024-05-20 01:39:09 -04:00
teleprint-me
a1951e27dc
refactor: Add proper names for remote model references 2024-05-20 01:36:44 -04:00
teleprint-me
6fc4492b3f
chore: Add english pangram to vocab tests 2024-05-20 00:51:35 -04:00
teleprint-me
381dad5eb3
fix: Add missing model architectures 2024-05-20 00:50:42 -04:00
teleprint-me
9a2834e24e
fix: Use __name__ as logger name 2024-05-19 22:39:30 -04:00
teleprint-me
a0362ea475
patch: Fix nested quotes for dict refs 2024-05-19 22:39:05 -04:00
teleprint-me
89a46fe818
feat: Attempt to mirror the llama.cpp API for compatibility 2024-05-19 22:31:05 -04:00
teleprint-me
c6f2a48af7
feat: Add prototype for identifying the vocab type 2024-05-19 22:30:37 -04:00