jaime-m-p
|
4d441e4acf
|
wip: fixing unicode codepoint ranges
|
2024-05-04 01:36:13 +02:00 |
|
jaime-m-p
|
3e3e2838a1
|
Add bruteforce random tests for token encoding
|
2024-05-04 01:34:36 +02:00 |
|
jaime-m-p
|
0c6d820b89
|
Style
|
2024-04-30 13:18:25 +02:00 |
|
jaime-m-p
|
2cd1eb0daa
|
Add alternative regex for custom split llama3
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
2024-04-30 13:02:46 +02:00 |
|
jaime-m-p
|
1d8fcc06ba
|
GPT2 custom regex split
|
2024-04-29 19:13:18 +02:00 |
|
jaime-m-p
|
5c38f6ed7a
|
Move unused variable value
|
2024-04-29 19:11:37 +02:00 |
|
jaime-m-p
|
b66cdd1c24
|
Merge remote-tracking branch 'upstream/gg/bpe-preprocess' into gg/bpe-preprocess
|
2024-04-29 16:01:07 +02:00 |
|
Georgi Gerganov
|
80cb3127df
|
tests : disable test-tokenizer-1-bpe due to slowness
ggml-ci
|
2024-04-29 15:24:39 +03:00 |
|
Georgi Gerganov
|
3202676f5d
|
llama : more prominent warning for old BPE models
|
2024-04-29 15:24:27 +03:00 |
|
Georgi Gerganov
|
6d6ce93959
|
tests : use faster bpe test
ggml-ci
|
2024-04-29 14:47:25 +03:00 |
|
Georgi Gerganov
|
9a7d430ff2
|
tests : disable obsolete
ggml-ci
|
2024-04-29 14:12:34 +03:00 |
|
Georgi Gerganov
|
120cf37d54
|
models : add phi-3, mpt, gpt-2, starcoder
|
2024-04-29 13:40:30 +03:00 |
|
jaime-m-p
|
a0c870db85
|
Fix merge
|
2024-04-29 11:09:52 +02:00 |
|
jaime-m-p
|
866e3941f7
|
Merge branch 'ggerganov:gg/bpe-preprocess' into gg/bpe-preprocess
|
2024-04-29 10:55:15 +02:00 |
|
Georgi Gerganov
|
c21ab1833e
|
scripts : ignore new update script in check-requirements.sh
|
2024-04-29 11:24:05 +03:00 |
|
Georgi Gerganov
|
af05268cdd
|
unicode : cleanup
|
2024-04-29 11:20:42 +03:00 |
|
Georgi Gerganov
|
c68d2596ea
|
tests : add more vocabs and tests
ggml-ci
|
2024-04-29 11:09:17 +03:00 |
|
Georgi Gerganov
|
43708d22c3
|
tests : refactor vocab tests
ggml-ci
|
2024-04-29 10:46:43 +03:00 |
|
Georgi Gerganov
|
ef4cca9e87
|
cmake : refactor test targets
|
2024-04-29 09:53:49 +03:00 |
|
jaime-m-p
|
0cf9ed3457
|
Restore BOM
|
2024-04-29 01:35:08 +02:00 |
|
jaime-m-p
|
2a48873914
|
Typing
|
2024-04-29 00:12:56 +02:00 |
|
jaime-m-p
|
6e4d2af6c3
|
already exists unicode_tolower()
|
2024-04-28 21:57:22 +02:00 |
|
Georgi Gerganov
|
7b1210f6a8
|
lint : fix
|
2024-04-28 22:51:13 +03:00 |
|
jaime-m-p
|
5cc4b2cf01
|
Using char32_t for codepoints
|
2024-04-28 21:51:12 +02:00 |
|
Georgi Gerganov
|
78081502e9
|
convert : exercise contractions
ggml-ci
|
2024-04-28 22:18:20 +03:00 |
|
Georgi Gerganov
|
0f9058ceec
|
convert : add comments
|
2024-04-28 22:10:04 +03:00 |
|
Georgi Gerganov
|
02fd977fe1
|
convert : remove unused functions
|
2024-04-28 22:03:21 +03:00 |
|
Georgi Gerganov
|
e8dd4a1494
|
lint : fix
|
2024-04-28 22:02:10 +03:00 |
|
Georgi Gerganov
|
491f2339bb
|
lint : fix
|
2024-04-28 21:42:58 +03:00 |
|
Georgi Gerganov
|
1545550ec2
|
unicode : normalize signatures
|
2024-04-28 21:40:36 +03:00 |
|
Georgi Gerganov
|
1c888eb4da
|
convert : add falcon
ggml-ci
|
2024-04-28 21:26:40 +03:00 |
|
Georgi Gerganov
|
4e3e6d8ecc
|
lint : update
|
2024-04-28 21:16:50 +03:00 |
|
Georgi Gerganov
|
7642973616
|
convert : add convert-hf-to-gguf-update.py
ggml-ci
|
2024-04-28 20:52:31 +03:00 |
|
jaime-m-p
|
e11fe2fb6a
|
llama3 custom regex split
|
2024-04-28 19:27:06 +02:00 |
|
Georgi Gerganov
|
ee6d1b3fb4
|
unicode : simplify
|
2024-04-28 18:36:57 +03:00 |
|
Georgi Gerganov
|
e972e6cbf8
|
unicode : clean-up
|
2024-04-28 18:30:37 +03:00 |
|
Georgi Gerganov
|
d63cc9068b
|
Merge branch 'master' into gg/bpe-preprocess
ggml-ci
|
2024-04-28 15:34:45 +03:00 |
|
Georgi Gerganov
|
b97add52a4
|
unicode : category support via std::regex
|
2024-04-28 15:15:57 +03:00 |
|
github-actions[bot]
|
6e472f58e4
|
flake.lock: Update
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/5c24cf2f0a12ad855f444c30b2421d044120c66f?narHash=sha256-XtTSSIB2DA6tOv%2Bl0FhvfDMiyCmhoRbNB%2B0SeInZkbk%3D' (2024-04-19)
→ 'github:NixOS/nixpkgs/7bb2ccd8cdc44c91edba16c48d2c8f331fb3d856?narHash=sha256-Drmja/f5MRHZCskS6mvzFqxEaZMeciScCTFxWVLqWEY%3D' (2024-04-25)
|
2024-04-28 11:12:50 +00:00 |
|
mgroeber9110
|
4dba7e8114
|
Replace "alternative" boolean operator in conditional compilation directive (#6949)
|
2024-04-27 21:02:06 +02:00 |
|
Pierrick Hymbert
|
b7368332e2
|
ci: server: tests python env on github container ubuntu latest / fix n_predict (#6935)
* ci: server: fix python env
* ci: server: fix server tests after #6638
* ci: server: fix windows is not building PR branch
|
2024-04-27 17:50:48 +02:00 |
|
Georgi Gerganov
|
581c4a0239
|
unicode : try fix windows
|
2024-04-27 18:36:00 +03:00 |
|
Georgi Gerganov
|
91eaa414bf
|
unicode : support \p{N}, \p{L} and \p{P} natively
|
2024-04-27 17:48:38 +03:00 |
|
Georgi Gerganov
|
ce5485aee0
|
unicode : always use std::wregex
|
2024-04-27 17:11:34 +03:00 |
|
Georgi Gerganov
|
2affd0b221
|
unicode : set bomb
|
2024-04-27 11:56:02 +03:00 |
|
Georgi Gerganov
|
a22645c2a7
|
unicode : set bomb
|
2024-04-27 11:48:24 +03:00 |
|
Georgi Gerganov
|
4434c9d6c2
|
minor
|
2024-04-27 11:33:16 +03:00 |
|
Georgi Gerganov
|
ad929833cb
|
llama : adapt punctuation regex + add llama 3 regex
|
2024-04-27 11:06:08 +03:00 |
|
Georgi Gerganov
|
96965f67e6
|
models : add llama v3 vocab file
|
2024-04-27 11:05:12 +03:00 |
|
Georgi Gerganov
|
c160818ec0
|
wip
|
2024-04-27 00:28:36 +03:00 |
|