Commit graph

12 commits

Author            SHA1        Message                                              Date
Georgi Gerganov   e972e6cbf8  unicode : clean-up                                   2024-04-28 18:30:37 +03:00
Georgi Gerganov   b97add52a4  unicode : category support via std::regex            2024-04-28 15:15:57 +03:00
Georgi Gerganov   91eaa414bf  unicode : support \p{N}, \p{L} and \p{P} natively    2024-04-27 17:48:38 +03:00
Georgi Gerganov   ce5485aee0  unicode : always use std::wregex                     2024-04-27 17:11:34 +03:00
Georgi Gerganov   2affd0b221  unicode : set bomb                                   2024-04-27 11:56:02 +03:00
Georgi Gerganov   ad929833cb  llama : adapt punctuation regex + add llama 3 regex  2024-04-27 11:06:08 +03:00
Georgi Gerganov   06d3e693db  unicode : fix? unicode_wstring_to_utf8               2024-04-26 12:55:11 +03:00
Kazim Abrar Mahi  753580360b  Fixed issues                                         2024-04-26 11:43:29 +03:00
Kazim Abrar Mahi  feeaf4f39c  Added needed functionality, testing remains          2024-04-26 11:43:29 +03:00
Kazim Abrar Mahi  7e308ed212  Adding unicode regex function                        2024-04-26 11:43:29 +03:00
Kazim Abrar Mahi  a5710a4101  Adding unicode regex mappings                        2024-04-26 11:43:29 +03:00
Jared Van Bortel  32c8486e1f  wpm : portable unicode tolower (#6305)               2024-03-26 17:46:21 -04:00
                              Also use C locale for ispunct/isspace, and split unicode-data.cpp from unicode.cpp.