Make numerous improvements

- Python static hello world now 1.8mb
- Python static fully loaded now 10mb
- Python HTTPS client now uses MbedTLS
- Python REPL now completes import stmts
- Increase stack size for Python for now
- Begin synthesizing posixpath and ntpath
- Restore Python \N{UNICODE NAME} support
- Restore Python NFKD symbol normalization
- Add optimized code path for Intel SHA-NI
- Get more Python unit tests passing faster
- Get Python help() pagination working on NT
- Python hashlib now supports MbedTLS PBKDF2
- Make memcpy/memmove/memcmp/bcmp/etc. faster
- Add Mersenne Twister and Vigna to LIBC_RAND
- Provide privileged __printf() for error code
- Fix zipos opendir() so that it reports ENOTDIR
- Add basic chmod() implementation for Windows NT
- Add Cosmo's best functions to Python cosmo module
- Pin function trace indent depth to that of caller
- Show memory diagram on invalid access in MODE=dbg
- Differentiate stack overflow on crash in MODE=dbg
- Add stb_truetype and tools for analyzing font files
- Upgrade to UNICODE 13 and reduce its binary footprint
- COMPILE.COM now logs resource usage of build commands
- Start implementing basic poll() support on bare metal
- Set getauxval(AT_EXECFN) to GetModuleFileName() on NT
- Add descriptions to strerror() in non-TINY build modes
- Add COUNTBRANCH() macro to help with micro-optimizations
- Make error / backtrace / asan / memory code more unbreakable
- Add fast perfect C implementation of μ-Law and a-Law audio codecs
- Make strtol() functions consistent with other libc implementations
- Improve Linenoise implementation (see also github.com/jart/bestline)
- COMPILE.COM now suppresses stdout/stderr of successful build commands
This commit is contained in:
Justine Tunney 2021-09-27 22:58:51 -07:00
parent fa7b4f5bd1
commit 39bf41f4eb
806 changed files with 77494 additions and 63859 deletions

View file

@ -0,0 +1,281 @@
# SpecialCasing-14.0.0.txt
# Date: 2021-03-08, 19:35:55 GMT
# © 2021 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
# For documentation, see http://www.unicode.org/reports/tr44/
#
# Special Casing
#
# This file is a supplement to the UnicodeData.txt file. It does not define any
# properties, but rather provides additional information about the casing of
# Unicode characters, for situations when casing incurs a change in string length
# or is dependent on context or locale. For compatibility, the UnicodeData.txt
# file only contains simple case mappings for characters where they are one-to-one
# and independent of context and language. The data in this file, combined with
# the simple case mappings in UnicodeData.txt, defines the full case mappings
# Lowercase_Mapping (lc), Titlecase_Mapping (tc), and Uppercase_Mapping (uc).
#
# Note that the preferred mechanism for defining tailored casing operations is
# the Unicode Common Locale Data Repository (CLDR). For more information, see the
# discussion of case mappings and case algorithms in the Unicode Standard.
#
# All code points not listed in this file that do not have a simple case mappings
# in UnicodeData.txt map to themselves.
# ================================================================================
# Format
# ================================================================================
# The entries in this file are in the following machine-readable format:
#
# <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
#
# <code>, <lower>, <title>, and <upper> provide the respective full case mappings
# of <code>, expressed as character values in hex. If there is more than one character,
# they are separated by spaces. Other than as used to separate elements, spaces are
# to be ignored.
#
# The <condition_list> is optional. Where present, it consists of one or more language IDs
# or casing contexts, separated by spaces. In these conditions:
# - A condition list overrides the normal behavior if all of the listed conditions are true.
# - The casing context is always the context of the characters in the original string,
# NOT in the resulting string.
# - Case distinctions in the condition list are not significant.
# - Conditions preceded by "Not_" represent the negation of the condition.
# The condition list is not represented in the UCD as a formal property.
#
# A language ID is defined by BCP 47, with '-' and '_' treated equivalently.
#
# A casing context for a character is defined by Section 3.13 Default Case Algorithms
# of The Unicode Standard.
#
# Parsers of this file must be prepared to deal with future additions to this format:
# * Additional contexts
# * Additional fields
# ================================================================================
# ================================================================================
# Unconditional mappings
# ================================================================================
# The German es-zed is special--the normal mapping is to SS.
# Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>))
00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
# Preserve canonical equivalence for I with dot. Turkic is handled below.
0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
# Ligatures
FB00; FB00; 0046 0066; 0046 0046; # LATIN SMALL LIGATURE FF
FB01; FB01; 0046 0069; 0046 0049; # LATIN SMALL LIGATURE FI
FB02; FB02; 0046 006C; 0046 004C; # LATIN SMALL LIGATURE FL
FB03; FB03; 0046 0066 0069; 0046 0046 0049; # LATIN SMALL LIGATURE FFI
FB04; FB04; 0046 0066 006C; 0046 0046 004C; # LATIN SMALL LIGATURE FFL
FB05; FB05; 0053 0074; 0053 0054; # LATIN SMALL LIGATURE LONG S T
FB06; FB06; 0053 0074; 0053 0054; # LATIN SMALL LIGATURE ST
0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN
FB13; FB13; 0544 0576; 0544 0546; # ARMENIAN SMALL LIGATURE MEN NOW
FB14; FB14; 0544 0565; 0544 0535; # ARMENIAN SMALL LIGATURE MEN ECH
FB15; FB15; 0544 056B; 0544 053B; # ARMENIAN SMALL LIGATURE MEN INI
FB16; FB16; 054E 0576; 054E 0546; # ARMENIAN SMALL LIGATURE VEW NOW
FB17; FB17; 0544 056D; 0544 053D; # ARMENIAN SMALL LIGATURE MEN XEH
# No corresponding uppercase precomposed character
0149; 0149; 02BC 004E; 02BC 004E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
0390; 0390; 0399 0308 0301; 0399 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
03B0; 03B0; 03A5 0308 0301; 03A5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
01F0; 01F0; 004A 030C; 004A 030C; # LATIN SMALL LETTER J WITH CARON
1E96; 1E96; 0048 0331; 0048 0331; # LATIN SMALL LETTER H WITH LINE BELOW
1E97; 1E97; 0054 0308; 0054 0308; # LATIN SMALL LETTER T WITH DIAERESIS
1E98; 1E98; 0057 030A; 0057 030A; # LATIN SMALL LETTER W WITH RING ABOVE
1E99; 1E99; 0059 030A; 0059 030A; # LATIN SMALL LETTER Y WITH RING ABOVE
1E9A; 1E9A; 0041 02BE; 0041 02BE; # LATIN SMALL LETTER A WITH RIGHT HALF RING
1F50; 1F50; 03A5 0313; 03A5 0313; # GREEK SMALL LETTER UPSILON WITH PSILI
1F52; 1F52; 03A5 0313 0300; 03A5 0313 0300; # GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA
1F54; 1F54; 03A5 0313 0301; 03A5 0313 0301; # GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA
1F56; 1F56; 03A5 0313 0342; 03A5 0313 0342; # GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI
1FB6; 1FB6; 0391 0342; 0391 0342; # GREEK SMALL LETTER ALPHA WITH PERISPOMENI
1FC6; 1FC6; 0397 0342; 0397 0342; # GREEK SMALL LETTER ETA WITH PERISPOMENI
1FD2; 1FD2; 0399 0308 0300; 0399 0308 0300; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND VARIA
1FD3; 1FD3; 0399 0308 0301; 0399 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
1FD6; 1FD6; 0399 0342; 0399 0342; # GREEK SMALL LETTER IOTA WITH PERISPOMENI
1FD7; 1FD7; 0399 0308 0342; 0399 0308 0342; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI
1FE2; 1FE2; 03A5 0308 0300; 03A5 0308 0300; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND VARIA
1FE3; 1FE3; 03A5 0308 0301; 03A5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
1FE4; 1FE4; 03A1 0313; 03A1 0313; # GREEK SMALL LETTER RHO WITH PSILI
1FE6; 1FE6; 03A5 0342; 03A5 0342; # GREEK SMALL LETTER UPSILON WITH PERISPOMENI
1FE7; 1FE7; 03A5 0308 0342; 03A5 0308 0342; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI
1FF6; 1FF6; 03A9 0342; 03A9 0342; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI
# IMPORTANT-when iota-subscript (0345) is uppercased or titlecased,
# the result will be incorrect unless the iota-subscript is moved to the end
# of any sequence of combining marks. Otherwise, the accents will go on the capital iota.
# This process can be achieved by first transforming the text to NFC before casing.
# E.g. <alpha><iota_subscript><acute> is uppercased to <ALPHA><acute><IOTA>
# The following cases are already in the UnicodeData.txt file, so are only commented here.
# 0345; 0345; 0399; 0399; # COMBINING GREEK YPOGEGRAMMENI
# All letters with YPOGEGRAMMENI (iota-subscript) or PROSGEGRAMMENI (iota adscript)
# have special uppercases.
# Note: characters with PROSGEGRAMMENI are actually titlecase, not uppercase!
1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI
1F82; 1F82; 1F8A; 1F0A 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI
1F83; 1F83; 1F8B; 1F0B 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA AND YPOGEGRAMMENI
1F84; 1F84; 1F8C; 1F0C 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA AND YPOGEGRAMMENI
1F85; 1F85; 1F8D; 1F0D 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA AND YPOGEGRAMMENI
1F86; 1F86; 1F8E; 1F0E 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
1F87; 1F87; 1F8F; 1F0F 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
1F88; 1F80; 1F88; 1F08 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
1F89; 1F81; 1F89; 1F09 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI
1F8A; 1F82; 1F8A; 1F0A 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1F8B; 1F83; 1F8B; 1F0B 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1F8C; 1F84; 1F8C; 1F0C 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1F8D; 1F85; 1F8D; 1F0D 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1F8E; 1F86; 1F8E; 1F0E 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1F8F; 1F87; 1F8F; 1F0F 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
1F90; 1F90; 1F98; 1F28 0399; # GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI
1F91; 1F91; 1F99; 1F29 0399; # GREEK SMALL LETTER ETA WITH DASIA AND YPOGEGRAMMENI
1F92; 1F92; 1F9A; 1F2A 0399; # GREEK SMALL LETTER ETA WITH PSILI AND VARIA AND YPOGEGRAMMENI
1F93; 1F93; 1F9B; 1F2B 0399; # GREEK SMALL LETTER ETA WITH DASIA AND VARIA AND YPOGEGRAMMENI
1F94; 1F94; 1F9C; 1F2C 0399; # GREEK SMALL LETTER ETA WITH PSILI AND OXIA AND YPOGEGRAMMENI
1F95; 1F95; 1F9D; 1F2D 0399; # GREEK SMALL LETTER ETA WITH DASIA AND OXIA AND YPOGEGRAMMENI
1F96; 1F96; 1F9E; 1F2E 0399; # GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
1F97; 1F97; 1F9F; 1F2F 0399; # GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
1F98; 1F90; 1F98; 1F28 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI
1F99; 1F91; 1F99; 1F29 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND PROSGEGRAMMENI
1F9A; 1F92; 1F9A; 1F2A 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1F9B; 1F93; 1F9B; 1F2B 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1F9C; 1F94; 1F9C; 1F2C 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1F9D; 1F95; 1F9D; 1F2D 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1F9E; 1F96; 1F9E; 1F2E 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1F9F; 1F97; 1F9F; 1F2F 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
1FA0; 1FA0; 1FA8; 1F68 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI
1FA1; 1FA1; 1FA9; 1F69 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND YPOGEGRAMMENI
1FA2; 1FA2; 1FAA; 1F6A 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA AND YPOGEGRAMMENI
1FA3; 1FA3; 1FAB; 1F6B 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND VARIA AND YPOGEGRAMMENI
1FA4; 1FA4; 1FAC; 1F6C 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI
1FA5; 1FA5; 1FAD; 1F6D 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND OXIA AND YPOGEGRAMMENI
1FA6; 1FA6; 1FAE; 1F6E 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
1FA7; 1FA7; 1FAF; 1F6F 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
1FA8; 1FA0; 1FA8; 1F68 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI
1FA9; 1FA1; 1FA9; 1F69 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PROSGEGRAMMENI
1FAA; 1FA2; 1FAA; 1F6A 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1FAB; 1FA3; 1FAB; 1F6B 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1FAC; 1FA4; 1FAC; 1F6C 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1FAD; 1FA5; 1FAD; 1F6D 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1FAE; 1FA6; 1FAE; 1F6E 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1FAF; 1FA7; 1FAF; 1F6F 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
1FB3; 1FB3; 1FBC; 0391 0399; # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
1FBC; 1FB3; 1FBC; 0391 0399; # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI
1FC3; 1FC3; 1FCC; 0397 0399; # GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI
1FCC; 1FC3; 1FCC; 0397 0399; # GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI
1FF3; 1FF3; 1FFC; 03A9 0399; # GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI
1FFC; 1FF3; 1FFC; 03A9 0399; # GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
# Some characters with YPOGEGRAMMENI also have no corresponding titlecases
1FB2; 1FB2; 1FBA 0345; 1FBA 0399; # GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI
1FB4; 1FB4; 0386 0345; 0386 0399; # GREEK SMALL LETTER ALPHA WITH OXIA AND YPOGEGRAMMENI
1FC2; 1FC2; 1FCA 0345; 1FCA 0399; # GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI
1FC4; 1FC4; 0389 0345; 0389 0399; # GREEK SMALL LETTER ETA WITH OXIA AND YPOGEGRAMMENI
1FF2; 1FF2; 1FFA 0345; 1FFA 0399; # GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI
1FF4; 1FF4; 038F 0345; 038F 0399; # GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI
1FB7; 1FB7; 0391 0342 0345; 0391 0342 0399; # GREEK SMALL LETTER ALPHA WITH PERISPOMENI AND YPOGEGRAMMENI
1FC7; 1FC7; 0397 0342 0345; 0397 0342 0399; # GREEK SMALL LETTER ETA WITH PERISPOMENI AND YPOGEGRAMMENI
1FF7; 1FF7; 03A9 0342 0345; 03A9 0342 0399; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI AND YPOGEGRAMMENI
# ================================================================================
# Conditional Mappings
# The remainder of this file provides conditional casing data used to produce
# full case mappings.
# ================================================================================
# Language-Insensitive Mappings
# These are characters whose full case mappings do not depend on language, but do
# depend on context (which characters come before or after). For more information
# see the header of this file and the Unicode Standard.
# ================================================================================
# Special case for final form of sigma
03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA
# Note: the following cases for non-final are already in the UnicodeData.txt file.
# 03A3; 03C3; 03A3; 03A3; # GREEK CAPITAL LETTER SIGMA
# 03C3; 03C3; 03A3; 03A3; # GREEK SMALL LETTER SIGMA
# 03C2; 03C2; 03A3; 03A3; # GREEK SMALL LETTER FINAL SIGMA
# Note: the following cases are not included, since they would case-fold in lowercasing
# 03C3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK SMALL LETTER SIGMA
# 03C2; 03C3; 03A3; 03A3; Not_Final_Sigma; # GREEK SMALL LETTER FINAL SIGMA
# ================================================================================
# Language-Sensitive Mappings
# These are characters whose full case mappings depend on language and perhaps also
# context (which characters come before or after). For more information
# see the header of this file and the Unicode Standard.
# ================================================================================
# Lithuanian
# Lithuanian retains the dot in a lowercase i when followed by accents.
# Remove DOT ABOVE after "i" with upper or titlecase
0307; 0307; ; ; lt After_Soft_Dotted; # COMBINING DOT ABOVE
# Introduce an explicit dot above when lowercasing capital I's and J's
# whenever there are more accents above.
# (of the accents used in Lithuanian: grave, acute, tilde above, and ogonek)
0049; 0069 0307; 0049; 0049; lt More_Above; # LATIN CAPITAL LETTER I
004A; 006A 0307; 004A; 004A; lt More_Above; # LATIN CAPITAL LETTER J
012E; 012F 0307; 012E; 012E; lt More_Above; # LATIN CAPITAL LETTER I WITH OGONEK
00CC; 0069 0307 0300; 00CC; 00CC; lt; # LATIN CAPITAL LETTER I WITH GRAVE
00CD; 0069 0307 0301; 00CD; 00CD; lt; # LATIN CAPITAL LETTER I WITH ACUTE
0128; 0069 0307 0303; 0128; 0128; lt; # LATIN CAPITAL LETTER I WITH TILDE
# ================================================================================
# Turkish and Azeri
# I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
# The following rules handle those cases.
0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE
# When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i.
# This matches the behavior of the canonically equivalent I-dot_above
0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE
# When lowercasing, unless an I is before a dot_above, it turns into a dotless i.
0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I
0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I
# When uppercasing, i turns into a dotted capital I
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I
# Note: the following case is already in the UnicodeData.txt file.
# 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I
# EOF

View file

@ -1,6 +1,6 @@
# Blocks-13.0.0.txt
# Date: 2019-07-10, 19:06:00 GMT [KW]
# © 2019 Unicode®, Inc.
# Blocks-14.0.0.txt
# Date: 2021-01-22, 23:29:00 GMT [KW]
# © 2021 Unicode®, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
@ -38,9 +38,7 @@
0180..024F; Latin Extended-B
0250..02AF; IPA Extensions
02B0..02FF; Spacing Modifier Letters
0300..036F; Combining Diacritical Marks
0370..03FF; Greek and Coptic
0400..04FF; Cyrillic
0500..052F; Cyrillic Supplement
@ -54,6 +52,7 @@
0800..083F; Samaritan
0840..085F; Mandaic
0860..086F; Syriac Supplement
0870..089F; Arabic Extended-B
08A0..08FF; Arabic Extended-A
0900..097F; Devanagari
0980..09FF; Bengali
@ -75,7 +74,6 @@
1380..139F; Ethiopic Supplement
13A0..13FF; Cherokee
1400..167F; Unified Canadian Aboriginal Syllabics
1680..169F; Ogham
16A0..16FF; Runic
1700..171F; Tagalog
@ -218,7 +216,9 @@ FFF0..FFFF; Specials
104B0..104FF; Osage
10500..1052F; Elbasan
10530..1056F; Caucasian Albanian
10570..105BF; Vithkuqi
10600..1077F; Linear A
10780..107BF; Latin Extended-F
10800..1083F; Cypriot Syllabary
10840..1085F; Imperial Aramaic
10860..1087F; Palmyrene
@ -243,6 +243,7 @@ FFF0..FFFF; Specials
10E80..10EBF; Yezidi
10F00..10F2F; Old Sogdian
10F30..10F6F; Sogdian
10F70..10FAF; Old Uyghur
10FB0..10FDF; Chorasmian
10FE0..10FFF; Elymaic
11000..1107F; Brahmi
@ -262,13 +263,14 @@ FFF0..FFFF; Specials
11600..1165F; Modi
11660..1167F; Mongolian Supplement
11680..116CF; Takri
11700..1173F; Ahom
11700..1174F; Ahom
11800..1184F; Dogra
118A0..118FF; Warang Citi
11900..1195F; Dives Akuru
119A0..119FF; Nandinagari
11A00..11A4F; Zanabazar Square
11A50..11AAF; Soyombo
11AB0..11ABF; Unified Canadian Aboriginal Syllabics Extended-A
11AC0..11AFF; Pau Cin Hau
11C00..11C6F; Bhaiksuki
11C70..11CBF; Marchen
@ -280,11 +282,13 @@ FFF0..FFFF; Specials
12000..123FF; Cuneiform
12400..1247F; Cuneiform Numbers and Punctuation
12480..1254F; Early Dynastic Cuneiform
12F90..12FFF; Cypro-Minoan
13000..1342F; Egyptian Hieroglyphs
13430..1343F; Egyptian Hieroglyph Format Controls
14400..1467F; Anatolian Hieroglyphs
16800..16A3F; Bamum Supplement
16A40..16A6F; Mro
16A70..16ACF; Tangsa
16AD0..16AFF; Bassa Vah
16B00..16B8F; Pahawh Hmong
16E40..16E9F; Medefaidrin
@ -293,13 +297,15 @@ FFF0..FFFF; Specials
17000..187FF; Tangut
18800..18AFF; Tangut Components
18B00..18CFF; Khitan Small Script
18D00..18D8F; Tangut Supplement
18D00..18D7F; Tangut Supplement
1AFF0..1AFFF; Kana Extended-B
1B000..1B0FF; Kana Supplement
1B100..1B12F; Kana Extended-A
1B130..1B16F; Small Kana Extension
1B170..1B2FF; Nushu
1BC00..1BC9F; Duployan
1BCA0..1BCAF; Shorthand Format Controls
1CF00..1CFCF; Znamenny Musical Notation
1D000..1D0FF; Byzantine Musical Symbols
1D100..1D1FF; Musical Symbols
1D200..1D24F; Ancient Greek Musical Notation
@ -308,9 +314,12 @@ FFF0..FFFF; Specials
1D360..1D37F; Counting Rod Numerals
1D400..1D7FF; Mathematical Alphanumeric Symbols
1D800..1DAAF; Sutton SignWriting
1DF00..1DFFF; Latin Extended-G
1E000..1E02F; Glagolitic Supplement
1E100..1E14F; Nyiakeng Puachue Hmong
1E290..1E2BF; Toto
1E2C0..1E2FF; Wancho
1E7E0..1E7FF; Ethiopic Extended-B
1E800..1E8DF; Mende Kikakui
1E900..1E95F; Adlam
1EC70..1ECBF; Indic Siyaq Numbers

View file

@ -1,15 +1,15 @@
# EastAsianWidth-12.1.0.txt
# Date: 2019-03-31, 22:01:58 GMT [KW, LI]
# © 2019 Unicode®, Inc.
# EastAsianWidth-14.0.0.txt
# Date: 2021-07-06, 09:58:53 GMT [KW, LI]
# © 2021 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For terms of use, see https://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
# For documentation, see http://www.unicode.org/reports/tr44/
# For documentation, see https://www.unicode.org/reports/tr44/
#
# East_Asian_Width Property
#
# This file is an informative contributory data file in the
# This file is a normative contributory data file in the
# Unicode Character Database.
#
# The format is two fields separated by a semicolon.
@ -37,7 +37,7 @@
# with ranges of code points, the code point count in square brackets.
#
# For more information, see UAX #11: East Asian Width,
# at http://www.unicode.org/reports/tr11/
# at https://www.unicode.org/reports/tr11/
#
# @missing: 0000..10FFFF; N
0000..001F;N # Cc [32] <control-0000>..<control-001F>
@ -273,7 +273,7 @@
0610..061A;N # Mn [11] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL KASRA
061B;N # Po ARABIC SEMICOLON
061C;N # Cf ARABIC LETTER MARK
061E..061F;N # Po [2] ARABIC TRIPLE DOT PUNCTUATION MARK..ARABIC QUESTION MARK
061D..061F;N # Po [3] ARABIC END OF TEXT MARK..ARABIC QUESTION MARK
0620..063F;N # Lo [32] ARABIC LETTER KASHMIRI YEH..ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE
0640;N # Lm ARABIC TATWEEL
0641..064A;N # Lo [10] ARABIC LETTER FEH..ARABIC LETTER YEH
@ -331,9 +331,14 @@
0859..085B;N # Mn [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
085E;N # Po MANDAIC PUNCTUATION
0860..086A;N # Lo [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
08A0..08B4;N # Lo [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
08B6..08BD;N # Lo [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
08D3..08E1;N # Mn [15] ARABIC SMALL LOW WAW..ARABIC SMALL HIGH SIGN SAFHA
0870..0887;N # Lo [24] ARABIC LETTER ALEF WITH ATTACHED FATHA..ARABIC BASELINE ROUND DOT
0888;N # Sk ARABIC RAISED ROUND DOT
0889..088E;N # Lo [6] ARABIC LETTER NOON WITH INVERTED SMALL V..ARABIC VERTICAL TAIL
0890..0891;N # Cf [2] ARABIC POUND MARK ABOVE..ARABIC PIASTRE MARK ABOVE
0898..089F;N # Mn [8] ARABIC SMALL HIGH WORD AL-JUZ..ARABIC HALF MADDA OVER MADDA
08A0..08C8;N # Lo [41] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER GRAF
08C9;N # Lm ARABIC SMALL FARSI YEH
08CA..08E1;N # Mn [24] ARABIC SMALL HIGH FARSI YEH..ARABIC SMALL HIGH SIGN SAFHA
08E2;N # Cf ARABIC DISPUTED END OF AYAH
08E3..08FF;N # Mn [29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
0900..0902;N # Mn [3] DEVANAGARI SIGN INVERTED CANDRABINDU..DEVANAGARI SIGN ANUSVARA
@ -450,7 +455,7 @@
0B47..0B48;N # Mc [2] ORIYA VOWEL SIGN E..ORIYA VOWEL SIGN AI
0B4B..0B4C;N # Mc [2] ORIYA VOWEL SIGN O..ORIYA VOWEL SIGN AU
0B4D;N # Mn ORIYA SIGN VIRAMA
0B56;N # Mn ORIYA AI LENGTH MARK
0B55..0B56;N # Mn [2] ORIYA SIGN OVERLINE..ORIYA AI LENGTH MARK
0B57;N # Mc ORIYA AU LENGTH MARK
0B5C..0B5D;N # Lo [2] ORIYA LETTER RRA..ORIYA LETTER RHA
0B5F..0B61;N # Lo [3] ORIYA LETTER YYA..ORIYA LETTER VOCALIC LL
@ -490,6 +495,7 @@
0C0E..0C10;N # Lo [3] TELUGU LETTER E..TELUGU LETTER AI
0C12..0C28;N # Lo [23] TELUGU LETTER O..TELUGU LETTER NA
0C2A..0C39;N # Lo [16] TELUGU LETTER PA..TELUGU LETTER HA
0C3C;N # Mn TELUGU SIGN NUKTA
0C3D;N # Lo TELUGU SIGN AVAGRAHA
0C3E..0C40;N # Mn [3] TELUGU VOWEL SIGN AA..TELUGU VOWEL SIGN II
0C41..0C44;N # Mc [4] TELUGU VOWEL SIGN U..TELUGU VOWEL SIGN VOCALIC RR
@ -497,6 +503,7 @@
0C4A..0C4D;N # Mn [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
0C55..0C56;N # Mn [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
0C58..0C5A;N # Lo [3] TELUGU LETTER TSA..TELUGU LETTER RRRA
0C5D;N # Lo TELUGU LETTER NAKAARA POLLU
0C60..0C61;N # Lo [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL
0C62..0C63;N # Mn [2] TELUGU VOWEL SIGN VOCALIC L..TELUGU VOWEL SIGN VOCALIC LL
0C66..0C6F;N # Nd [10] TELUGU DIGIT ZERO..TELUGU DIGIT NINE
@ -522,14 +529,14 @@
0CCA..0CCB;N # Mc [2] KANNADA VOWEL SIGN O..KANNADA VOWEL SIGN OO
0CCC..0CCD;N # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
0CD5..0CD6;N # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
0CDE;N # Lo KANNADA LETTER FA
0CDD..0CDE;N # Lo [2] KANNADA LETTER NAKAARA POLLU..KANNADA LETTER FA
0CE0..0CE1;N # Lo [2] KANNADA LETTER VOCALIC RR..KANNADA LETTER VOCALIC LL
0CE2..0CE3;N # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
0CE6..0CEF;N # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
0CF1..0CF2;N # Lo [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA
0D00..0D01;N # Mn [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
0D02..0D03;N # Mc [2] MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISARGA
0D05..0D0C;N # Lo [8] MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC L
0D04..0D0C;N # Lo [9] MALAYALAM LETTER VEDIC ANUSVARA..MALAYALAM LETTER VOCALIC L
0D0E..0D10;N # Lo [3] MALAYALAM LETTER E..MALAYALAM LETTER AI
0D12..0D3A;N # Lo [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA
0D3B..0D3C;N # Mn [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
@ -550,6 +557,7 @@
0D70..0D78;N # No [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS
0D79;N # So MALAYALAM DATE MARK
0D7A..0D7F;N # Lo [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K
0D81;N # Mn SINHALA SIGN CANDRABINDU
0D82..0D83;N # Mc [2] SINHALA SIGN ANUSVARAYA..SINHALA SIGN VISARGAYA
0D85..0D96;N # Lo [18] SINHALA LETTER AYANNA..SINHALA LETTER AUYANNA
0D9A..0DB1;N # Lo [24] SINHALA LETTER ALPAPRAANA KAYANNA..SINHALA LETTER DANTAJA NAYANNA
@ -708,11 +716,13 @@
16EB..16ED;N # Po [3] RUNIC SINGLE PUNCTUATION..RUNIC CROSS PUNCTUATION
16EE..16F0;N # Nl [3] RUNIC ARLAUG SYMBOL..RUNIC BELGTHOR SYMBOL
16F1..16F8;N # Lo [8] RUNIC LETTER K..RUNIC LETTER FRANKS CASKET AESC
1700..170C;N # Lo [13] TAGALOG LETTER A..TAGALOG LETTER YA
170E..1711;N # Lo [4] TAGALOG LETTER LA..TAGALOG LETTER HA
1700..1711;N # Lo [18] TAGALOG LETTER A..TAGALOG LETTER HA
1712..1714;N # Mn [3] TAGALOG VOWEL SIGN I..TAGALOG SIGN VIRAMA
1715;N # Mc TAGALOG SIGN PAMUDPOD
171F;N # Lo TAGALOG LETTER ARCHAIC RA
1720..1731;N # Lo [18] HANUNOO LETTER A..HANUNOO LETTER HA
1732..1734;N # Mn [3] HANUNOO VOWEL SIGN I..HANUNOO SIGN PAMUDPOD
1732..1733;N # Mn [2] HANUNOO VOWEL SIGN I..HANUNOO VOWEL SIGN U
1734;N # Mc HANUNOO SIGN PAMUDPOD
1735..1736;N # Po [2] PHILIPPINE SINGLE PUNCTUATION..PHILIPPINE DOUBLE PUNCTUATION
1740..1751;N # Lo [18] BUHID LETTER A..BUHID LETTER HA
1752..1753;N # Mn [2] BUHID VOWEL SIGN I..BUHID VOWEL SIGN U
@ -740,6 +750,7 @@
1807..180A;N # Po [4] MONGOLIAN SIBE SYLLABLE BOUNDARY MARKER..MONGOLIAN NIRUGU
180B..180D;N # Mn [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
180E;N # Cf MONGOLIAN VOWEL SEPARATOR
180F;N # Mn MONGOLIAN FREE VARIATION SELECTOR FOUR
1810..1819;N # Nd [10] MONGOLIAN DIGIT ZERO..MONGOLIAN DIGIT NINE
1820..1842;N # Lo [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
1843;N # Lm MONGOLIAN LETTER TODO LONG VOWEL SIGN
@ -795,6 +806,7 @@
1AA8..1AAD;N # Po [6] TAI THAM SIGN KAAN..TAI THAM SIGN CAANG
1AB0..1ABD;N # Mn [14] COMBINING DOUBLED CIRCUMFLEX ACCENT..COMBINING PARENTHESES BELOW
1ABE;N # Me COMBINING PARENTHESES OVERLAY
1ABF..1ACE;N # Mn [16] COMBINING LATIN SMALL LETTER W BELOW..COMBINING LATIN SMALL LETTER INSULAR T
1B00..1B03;N # Mn [4] BALINESE SIGN ULU RICEM..BALINESE SIGN SURANG
1B04;N # Mc BALINESE SIGN BISAH
1B05..1B33;N # Lo [47] BALINESE LETTER AKARA..BALINESE LETTER HA
@ -806,12 +818,13 @@
1B3D..1B41;N # Mc [5] BALINESE VOWEL SIGN LA LENGA TEDUNG..BALINESE VOWEL SIGN TALING REPA TEDUNG
1B42;N # Mn BALINESE VOWEL SIGN PEPET
1B43..1B44;N # Mc [2] BALINESE VOWEL SIGN PEPET TEDUNG..BALINESE ADEG ADEG
1B45..1B4B;N # Lo [7] BALINESE LETTER KAF SASAK..BALINESE LETTER ASYURA SASAK
1B45..1B4C;N # Lo [8] BALINESE LETTER KAF SASAK..BALINESE LETTER ARCHAIC JNYA
1B50..1B59;N # Nd [10] BALINESE DIGIT ZERO..BALINESE DIGIT NINE
1B5A..1B60;N # Po [7] BALINESE PANTI..BALINESE PAMENENG
1B61..1B6A;N # So [10] BALINESE MUSICAL SYMBOL DONG..BALINESE MUSICAL SYMBOL DANG GEDE
1B6B..1B73;N # Mn [9] BALINESE MUSICAL SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG
1B74..1B7C;N # So [9] BALINESE MUSICAL SYMBOL RIGHT-HAND OPEN DUG..BALINESE MUSICAL SYMBOL LEFT-HAND OPEN PING
1B7D..1B7E;N # Po [2] BALINESE PANTI LANTANG..BALINESE PAMADA LANTANG
1B80..1B81;N # Mn [2] SUNDANESE SIGN PANYECEK..SUNDANESE SIGN PANGLAYAR
1B82;N # Mc SUNDANESE SIGN PANGWISAD
1B83..1BA0;N # Lo [30] SUNDANESE LETTER A..SUNDANESE LETTER HA
@ -870,8 +883,7 @@
1D79..1D7F;N # Ll [7] LATIN SMALL LETTER INSULAR G..LATIN SMALL LETTER UPSILON WITH STROKE
1D80..1D9A;N # Ll [27] LATIN SMALL LETTER B WITH PALATAL HOOK..LATIN SMALL LETTER EZH WITH RETROFLEX HOOK
1D9B..1DBF;N # Lm [37] MODIFIER LETTER SMALL TURNED ALPHA..MODIFIER LETTER SMALL THETA
1DC0..1DF9;N # Mn [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
1DFB..1DFF;N # Mn [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
1DC0..1DFF;N # Mn [64] COMBINING DOTTED GRAVE ACCENT..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
1E00..1EFF;N # L& [256] LATIN CAPITAL LETTER A WITH RING BELOW..LATIN SMALL LETTER Y WITH LOOP
1F00..1F15;N # L& [22] GREEK SMALL LETTER ALPHA WITH PSILI..GREEK SMALL LETTER EPSILON WITH DASIA AND OXIA
1F18..1F1D;N # Lu [6] GREEK CAPITAL LETTER EPSILON WITH PSILI..GREEK CAPITAL LETTER EPSILON WITH DASIA AND OXIA
@ -963,7 +975,7 @@
20A9;H # Sc WON SIGN
20AA..20AB;N # Sc [2] NEW SHEQEL SIGN..DONG SIGN
20AC;A # Sc EURO SIGN
20AD..20BF;N # Sc [19] KIP SIGN..BITCOIN SIGN
20AD..20C0;N # Sc [20] KIP SIGN..SOM SIGN
20D0..20DC;N # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
20DD..20E0;N # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH
20E1;N # Mn COMBINING LEFT RIGHT ARROW ABOVE
@ -1335,9 +1347,8 @@
2B56..2B59;A # So [4] HEAVY OVAL WITH OVAL INSIDE..HEAVY CIRCLED SALTIRE
2B5A..2B73;N # So [26] SLANTED NORTH ARROW WITH HOOKED HEAD..DOWNWARDS TRIANGLE-HEADED ARROW TO BAR
2B76..2B95;N # So [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
2B98..2BFF;N # So [104] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..HELLSCHREIBER PAUSE SYMBOL
2C00..2C2E;N # Lu [47] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE
2C30..2C5E;N # Ll [47] GLAGOLITIC SMALL LETTER AZU..GLAGOLITIC SMALL LETTER LATINATE MYSLITE
2B97..2BFF;N # So [105] SYMBOL FOR TYPE A ELECTRONICS..HELLSCHREIBER PAUSE SYMBOL
2C00..2C5F;N # L& [96] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC SMALL LETTER CAUDATE CHRIVI
2C60..2C7B;N # L& [28] LATIN CAPITAL LETTER L WITH DOUBLE BAR..LATIN LETTER SMALL CAPITAL TURNED E
2C7C..2C7D;N # Lm [2] LATIN SUBSCRIPT SMALL LETTER J..MODIFIER LETTER CAPITAL V
2C7E..2C7F;N # Lu [2] LATIN CAPITAL LETTER S WITH SWASH TAIL..LATIN CAPITAL LETTER Z WITH SWASH TAIL
@ -1404,6 +1415,17 @@
2E41;N # Po REVERSED COMMA
2E42;N # Ps DOUBLE LOW-REVERSED-9 QUOTATION MARK
2E43..2E4F;N # Po [13] DASH WITH LEFT UPTURN..CORNISH VERSE DIVIDER
2E50..2E51;N # So [2] CROSS PATTY WITH RIGHT CROSSBAR..CROSS PATTY WITH LEFT CROSSBAR
2E52..2E54;N # Po [3] TIRONIAN SIGN CAPITAL ET..MEDIEVAL QUESTION MARK
2E55;N # Ps LEFT SQUARE BRACKET WITH STROKE
2E56;N # Pe RIGHT SQUARE BRACKET WITH STROKE
2E57;N # Ps LEFT SQUARE BRACKET WITH DOUBLE STROKE
2E58;N # Pe RIGHT SQUARE BRACKET WITH DOUBLE STROKE
2E59;N # Ps TOP HALF LEFT PARENTHESIS
2E5A;N # Pe TOP HALF RIGHT PARENTHESIS
2E5B;N # Ps BOTTOM HALF LEFT PARENTHESIS
2E5C;N # Pe BOTTOM HALF RIGHT PARENTHESIS
2E5D;N # Pd OBLIQUE HYPHEN
2E80..2E99;W # So [26] CJK RADICAL REPEAT..CJK RADICAL RAP
2E9B..2EF3;W # So [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE
2F00..2FD5;W # So [214] KANGXI RADICAL ONE..KANGXI RADICAL FLUTE
@ -1464,7 +1486,7 @@
3190..3191;W # So [2] IDEOGRAPHIC ANNOTATION LINKING MARK..IDEOGRAPHIC ANNOTATION REVERSE MARK
3192..3195;W # No [4] IDEOGRAPHIC ANNOTATION ONE MARK..IDEOGRAPHIC ANNOTATION FOUR MARK
3196..319F;W # So [10] IDEOGRAPHIC ANNOTATION TOP MARK..IDEOGRAPHIC ANNOTATION MAN MARK
31A0..31BA;W # Lo [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY
31A0..31BF;W # Lo [32] BOPOMOFO LETTER BU..BOPOMOFO LETTER AH
31C0..31E3;W # So [36] CJK STROKE T..CJK STROKE Q
31F0..31FF;W # Lo [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
3200..321E;W # So [31] PARENTHESIZED HANGUL KIYEOK..PARENTHESIZED KOREAN CHARACTER O HU
@ -1479,11 +1501,9 @@
32B1..32BF;W # No [15] CIRCLED NUMBER THIRTY SIX..CIRCLED NUMBER FIFTY
32C0..32FF;W # So [64] IDEOGRAPHIC TELEGRAPH SYMBOL FOR JANUARY..SQUARE ERA NAME REIWA
3300..33FF;W # So [256] SQUARE APAATO..SQUARE GAL
3400..4DB5;W # Lo [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
4DB6..4DBF;W # Cn [10] <reserved-4DB6>..<reserved-4DBF>
3400..4DBF;W # Lo [6592] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DBF
4DC0..4DFF;N # So [64] HEXAGRAM FOR THE CREATIVE HEAVEN..HEXAGRAM FOR BEFORE COMPLETION
4E00..9FEF;W # Lo [20976] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEF
9FF0..9FFF;W # Cn [16] <reserved-9FF0>..<reserved-9FFF>
4E00..9FFF;W # Lo [20992] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FFF
A000..A014;W # Lo [21] YI SYLLABLE IT..YI SYLLABLE E
A015;W # Lm YI SYLLABLE WU
A016..A48C;W # Lo [1143] YI SYLLABLE BIT..YI SYLLABLE YYR
@ -1522,8 +1542,12 @@ A788;N # Lm MODIFIER LETTER LOW CIRCUMFLEX ACCENT
A789..A78A;N # Sk [2] MODIFIER LETTER COLON..MODIFIER LETTER SHORT EQUALS SIGN
A78B..A78E;N # L& [4] LATIN CAPITAL LETTER SALTILLO..LATIN SMALL LETTER L WITH RETROFLEX HOOK AND BELT
A78F;N # Lo LATIN LETTER SINOLOGICAL DOT
A790..A7BF;N # L& [48] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN SMALL LETTER GLOTTAL U
A7C2..A7C6;N # L& [5] LATIN CAPITAL LETTER ANGLICANA W..LATIN CAPITAL LETTER Z WITH PALATAL HOOK
A790..A7CA;N # L& [59] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN SMALL LETTER S WITH SHORT STROKE OVERLAY
A7D0..A7D1;N # L& [2] LATIN CAPITAL LETTER CLOSED INSULAR G..LATIN SMALL LETTER CLOSED INSULAR G
A7D3;N # Ll LATIN SMALL LETTER DOUBLE THORN
A7D5..A7D9;N # L& [5] LATIN SMALL LETTER DOUBLE WYNN..LATIN SMALL LETTER SIGMOID S
A7F2..A7F4;N # Lm [3] MODIFIER LETTER CAPITAL C..MODIFIER LETTER CAPITAL Q
A7F5..A7F6;N # L& [2] LATIN CAPITAL LETTER REVERSED HALF H..LATIN SMALL LETTER REVERSED HALF H
A7F7;N # Lo LATIN EPIGRAPHIC LETTER SIDEWAYS I
A7F8..A7F9;N # Lm [2] MODIFIER LETTER CAPITAL H WITH STROKE..MODIFIER LETTER SMALL LIGATURE OE
A7FA;N # Ll LATIN LETTER SMALL CAPITAL TURNED M
@ -1539,6 +1563,7 @@ A823..A824;N # Mc [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN
A825..A826;N # Mn [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
A827;N # Mc SYLOTI NAGRI VOWEL SIGN OO
A828..A82B;N # So [4] SYLOTI NAGRI POETRY MARK-1..SYLOTI NAGRI POETRY MARK-4
A82C;N # Mn SYLOTI NAGRI SIGN ALTERNATE HASANTA
A830..A835;N # No [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTION THREE SIXTEENTHS
A836..A837;N # So [2] NORTH INDIC QUARTER MARK..NORTH INDIC PLACEHOLDER MARK
A838;N # Sc NORTH INDIC RUPEE MARK
@ -1639,7 +1664,9 @@ AB28..AB2E;N # Lo [7] ETHIOPIC SYLLABLE BBA..ETHIOPIC SYLLABLE BBO
AB30..AB5A;N # Ll [43] LATIN SMALL LETTER BARRED ALPHA..LATIN SMALL LETTER Y WITH SHORT RIGHT LEG
AB5B;N # Sk MODIFIER BREVE WITH INVERTED BREVE
AB5C..AB5F;N # Lm [4] MODIFIER LETTER SMALL HENG..MODIFIER LETTER SMALL U WITH LEFT HOOK
AB60..AB67;N # Ll [8] LATIN SMALL LETTER SAKHA YAT..LATIN SMALL LETTER TS DIGRAPH WITH RETROFLEX HOOK
AB60..AB68;N # Ll [9] LATIN SMALL LETTER SAKHA YAT..LATIN SMALL LETTER TURNED R WITH MIDDLE TILDE
AB69;N # Lm MODIFIER LETTER SMALL TURNED W
AB6A..AB6B;N # Sk [2] MODIFIER LETTER LEFT TACK..MODIFIER LETTER RIGHT TACK
AB70..ABBF;N # Ll [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETTER YA
ABC0..ABE2;N # Lo [35] MEETEI MAYEK LETTER KOK..MEETEI MAYEK LETTER I LONSUM
ABE3..ABE4;N # Mc [2] MEETEI MAYEK VOWEL SIGN ONAP..MEETEI MAYEK VOWEL SIGN INAP
@ -1675,15 +1702,17 @@ FB40..FB41;N # Lo [2] HEBREW LETTER NUN WITH DAGESH..HEBREW LETTER SAMEK
FB43..FB44;N # Lo [2] HEBREW LETTER FINAL PE WITH DAGESH..HEBREW LETTER PE WITH DAGESH
FB46..FB4F;N # Lo [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATURE ALEF LAMED
FB50..FBB1;N # Lo [98] ARABIC LETTER ALEF WASLA ISOLATED FORM..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM
FBB2..FBC1;N # Sk [16] ARABIC SYMBOL DOT ABOVE..ARABIC SYMBOL SMALL TAH BELOW
FBB2..FBC2;N # Sk [17] ARABIC SYMBOL DOT ABOVE..ARABIC SYMBOL WASLA ABOVE
FBD3..FD3D;N # Lo [363] ARABIC LETTER NG ISOLATED FORM..ARABIC LIGATURE ALEF WITH FATHATAN ISOLATED FORM
FD3E;N # Pe ORNATE LEFT PARENTHESIS
FD3F;N # Ps ORNATE RIGHT PARENTHESIS
FD40..FD4F;N # So [16] ARABIC LIGATURE RAHIMAHU ALLAAH..ARABIC LIGATURE RAHIMAHUM ALLAAH
FD50..FD8F;N # Lo [64] ARABIC LIGATURE TEH WITH JEEM WITH MEEM INITIAL FORM..ARABIC LIGATURE MEEM WITH KHAH WITH MEEM INITIAL FORM
FD92..FDC7;N # Lo [54] ARABIC LIGATURE MEEM WITH JEEM WITH KHAH INITIAL FORM..ARABIC LIGATURE NOON WITH JEEM WITH YEH FINAL FORM
FDCF;N # So ARABIC LIGATURE SALAAMUHU ALAYNAA
FDF0..FDFB;N # Lo [12] ARABIC LIGATURE SALLA USED AS KORANIC STOP SIGN ISOLATED FORM..ARABIC LIGATURE JALLAJALALOUHOU
FDFC;N # Sc RIAL SIGN
FDFD;N # So ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
FDFD..FDFF;N # So [3] ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM..ARABIC LIGATURE AZZA WA JALL
FE00..FE0F;A # Mn [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
FE10..FE16;W # Po [7] PRESENTATION FORM FOR VERTICAL COMMA..PRESENTATION FORM FOR VERTICAL QUESTION MARK
FE17;W # Ps PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
@ -1800,7 +1829,7 @@ FFFD;A # So REPLACEMENT CHARACTER
10179..10189;N # So [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN
1018A..1018B;N # No [2] GREEK ZERO SIGN..GREEK ONE QUARTER SIGN
1018C..1018E;N # So [3] GREEK SINUSOID SIGN..NOMISMA SIGN
10190..1019B;N # So [12] ROMAN SEXTANS SIGN..ROMAN CENTURIAL SIGN
10190..1019C;N # So [13] ROMAN SEXTANS SIGN..ASCIA SYMBOL
101A0;N # So GREEK SYMBOL TAU RHO
101D0..101FC;N # So [45] PHAISTOS DISC SIGN PEDESTRIAN..PHAISTOS DISC SIGN WAVY BAND
101FD;N # Mn PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE
@ -1832,9 +1861,20 @@ FFFD;A # So REPLACEMENT CHARACTER
10500..10527;N # Lo [40] ELBASAN LETTER A..ELBASAN LETTER KHE
10530..10563;N # Lo [52] CAUCASIAN ALBANIAN LETTER ALT..CAUCASIAN ALBANIAN LETTER KIW
1056F;N # Po CAUCASIAN ALBANIAN CITATION MARK
10570..1057A;N # Lu [11] VITHKUQI CAPITAL LETTER A..VITHKUQI CAPITAL LETTER GA
1057C..1058A;N # Lu [15] VITHKUQI CAPITAL LETTER HA..VITHKUQI CAPITAL LETTER RE
1058C..10592;N # Lu [7] VITHKUQI CAPITAL LETTER SE..VITHKUQI CAPITAL LETTER XE
10594..10595;N # Lu [2] VITHKUQI CAPITAL LETTER Y..VITHKUQI CAPITAL LETTER ZE
10597..105A1;N # Ll [11] VITHKUQI SMALL LETTER A..VITHKUQI SMALL LETTER GA
105A3..105B1;N # Ll [15] VITHKUQI SMALL LETTER HA..VITHKUQI SMALL LETTER RE
105B3..105B9;N # Ll [7] VITHKUQI SMALL LETTER SE..VITHKUQI SMALL LETTER XE
105BB..105BC;N # Ll [2] VITHKUQI SMALL LETTER Y..VITHKUQI SMALL LETTER ZE
10600..10736;N # Lo [311] LINEAR A SIGN AB001..LINEAR A SIGN A664
10740..10755;N # Lo [22] LINEAR A SIGN A701 A..LINEAR A SIGN A732 JE
10760..10767;N # Lo [8] LINEAR A SIGN A800..LINEAR A SIGN A807
10780..10785;N # Lm [6] MODIFIER LETTER SMALL CAPITAL AA..MODIFIER LETTER SMALL B WITH HOOK
10787..107B0;N # Lm [42] MODIFIER LETTER SMALL DZ DIGRAPH..MODIFIER LETTER SMALL V WITH RIGHT HOOK
107B2..107BA;N # Lm [9] MODIFIER LETTER SMALL CAPITAL Y..MODIFIER LETTER SMALL S WITH CURL
10800..10805;N # Lo [6] CYPRIOT SYLLABLE A..CYPRIOT SYLLABLE JA
10808;N # Lo CYPRIOT SYLLABLE JO
1080A..10835;N # Lo [44] CYPRIOT SYLLABLE KA..CYPRIOT SYLLABLE WO
@ -1902,6 +1942,10 @@ FFFD;A # So REPLACEMENT CHARACTER
10D24..10D27;N # Mn [4] HANIFI ROHINGYA SIGN HARBAHAY..HANIFI ROHINGYA SIGN TASSI
10D30..10D39;N # Nd [10] HANIFI ROHINGYA DIGIT ZERO..HANIFI ROHINGYA DIGIT NINE
10E60..10E7E;N # No [31] RUMI DIGIT ONE..RUMI FRACTION TWO THIRDS
10E80..10EA9;N # Lo [42] YEZIDI LETTER ELIF..YEZIDI LETTER ET
10EAB..10EAC;N # Mn [2] YEZIDI COMBINING HAMZA MARK..YEZIDI COMBINING MADDA MARK
10EAD;N # Pd YEZIDI HYPHENATION MARK
10EB0..10EB1;N # Lo [2] YEZIDI LETTER LAM WITH DOT ABOVE..YEZIDI LETTER YOT WITH CIRCUMFLEX ABOVE
10F00..10F1C;N # Lo [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL
10F1D..10F26;N # No [10] OLD SOGDIAN NUMBER ONE..OLD SOGDIAN FRACTION ONE HALF
10F27;N # Lo OLD SOGDIAN LIGATURE AYIN-DALETH
@ -1909,6 +1953,11 @@ FFFD;A # So REPLACEMENT CHARACTER
10F46..10F50;N # Mn [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW
10F51..10F54;N # No [4] SOGDIAN NUMBER ONE..SOGDIAN NUMBER ONE HUNDRED
10F55..10F59;N # Po [5] SOGDIAN PUNCTUATION TWO VERTICAL BARS..SOGDIAN PUNCTUATION HALF CIRCLE WITH DOT
10F70..10F81;N # Lo [18] OLD UYGHUR LETTER ALEPH..OLD UYGHUR LETTER LESH
10F82..10F85;N # Mn [4] OLD UYGHUR COMBINING DOT ABOVE..OLD UYGHUR COMBINING TWO DOTS BELOW
10F86..10F89;N # Po [4] OLD UYGHUR PUNCTUATION BAR..OLD UYGHUR PUNCTUATION FOUR DOTS
10FB0..10FC4;N # Lo [21] CHORASMIAN LETTER ALEPH..CHORASMIAN LETTER TAW
10FC5..10FCB;N # No [7] CHORASMIAN NUMBER ONE..CHORASMIAN NUMBER ONE HUNDRED
10FE0..10FF6;N # Lo [23] ELYMAIC LETTER ALEPH..ELYMAIC LIGATURE ZAYIN-YODH
11000;N # Mc BRAHMI SIGN CANDRABINDU
11001;N # Mn BRAHMI SIGN ANUSVARA
@ -1918,6 +1967,10 @@ FFFD;A # So REPLACEMENT CHARACTER
11047..1104D;N # Po [7] BRAHMI DANDA..BRAHMI PUNCTUATION LOTUS
11052..11065;N # No [20] BRAHMI NUMBER ONE..BRAHMI NUMBER ONE THOUSAND
11066..1106F;N # Nd [10] BRAHMI DIGIT ZERO..BRAHMI DIGIT NINE
11070;N # Mn BRAHMI SIGN OLD TAMIL VIRAMA
11071..11072;N # Lo [2] BRAHMI LETTER OLD TAMIL SHORT E..BRAHMI LETTER OLD TAMIL SHORT O
11073..11074;N # Mn [2] BRAHMI VOWEL SIGN OLD TAMIL SHORT E..BRAHMI VOWEL SIGN OLD TAMIL SHORT O
11075;N # Lo BRAHMI LETTER OLD TAMIL LLA
1107F;N # Mn BRAHMI NUMBER JOINER
11080..11081;N # Mn [2] KAITHI SIGN CANDRABINDU..KAITHI SIGN ANUSVARA
11082;N # Mc KAITHI SIGN VISARGA
@ -1929,6 +1982,7 @@ FFFD;A # So REPLACEMENT CHARACTER
110BB..110BC;N # Po [2] KAITHI ABBREVIATION SIGN..KAITHI ENUMERATION SIGN
110BD;N # Cf KAITHI NUMBER SIGN
110BE..110C1;N # Po [4] KAITHI SECTION MARK..KAITHI DOUBLE DANDA
110C2;N # Mn KAITHI VOWEL SIGN VOCALIC R
110CD;N # Cf KAITHI NUMBER SIGN ABOVE
110D0..110E8;N # Lo [25] SORA SOMPENG LETTER SAH..SORA SOMPENG LETTER MAE
110F0..110F9;N # Nd [10] SORA SOMPENG DIGIT ZERO..SORA SOMPENG DIGIT NINE
@ -1941,6 +1995,7 @@ FFFD;A # So REPLACEMENT CHARACTER
11140..11143;N # Po [4] CHAKMA SECTION MARK..CHAKMA QUESTION MARK
11144;N # Lo CHAKMA LETTER LHAA
11145..11146;N # Mc [2] CHAKMA VOWEL SIGN AA..CHAKMA VOWEL SIGN EI
11147;N # Lo CHAKMA LETTER VAA
11150..11172;N # Lo [35] MAHAJANI LETTER A..MAHAJANI LETTER RRA
11173;N # Mn MAHAJANI SIGN NUKTA
11174..11175;N # Po [2] MAHAJANI ABBREVIATION SIGN..MAHAJANI SECTION MARK
@ -1955,6 +2010,8 @@ FFFD;A # So REPLACEMENT CHARACTER
111C5..111C8;N # Po [4] SHARADA DANDA..SHARADA SEPARATOR
111C9..111CC;N # Mn [4] SHARADA SANDHI MARK..SHARADA EXTRA SHORT VOWEL MARK
111CD;N # Po SHARADA SUTRA MARK
111CE;N # Mc SHARADA VOWEL SIGN PRISHTHAMATRA E
111CF;N # Mn SHARADA SIGN INVERTED CANDRABINDU
111D0..111D9;N # Nd [10] SHARADA DIGIT ZERO..SHARADA DIGIT NINE
111DA;N # Lo SHARADA EKAM
111DB;N # Po SHARADA SIGN SIDDHAM
@ -2013,10 +2070,10 @@ FFFD;A # So REPLACEMENT CHARACTER
11447..1144A;N # Lo [4] NEWA SIGN AVAGRAHA..NEWA SIDDHI
1144B..1144F;N # Po [5] NEWA DANDA..NEWA ABBREVIATION SIGN
11450..11459;N # Nd [10] NEWA DIGIT ZERO..NEWA DIGIT NINE
1145B;N # Po NEWA PLACEHOLDER MARK
1145A..1145B;N # Po [2] NEWA DOUBLE COMMA..NEWA PLACEHOLDER MARK
1145D;N # Po NEWA INSERTION SIGN
1145E;N # Mn NEWA SANDHI MARK
1145F;N # Lo NEWA LETTER VEDIC ANUSVARA
1145F..11461;N # Lo [3] NEWA LETTER VEDIC ANUSVARA..NEWA SIGN UPADHMANIYA
11480..114AF;N # Lo [48] TIRHUTA ANJI..TIRHUTA LETTER HA
114B0..114B2;N # Mc [3] TIRHUTA VOWEL SIGN AA..TIRHUTA VOWEL SIGN II
114B3..114B8;N # Mn [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
@ -2060,6 +2117,7 @@ FFFD;A # So REPLACEMENT CHARACTER
116B6;N # Mc TAKRI SIGN VIRAMA
116B7;N # Mn TAKRI SIGN NUKTA
116B8;N # Lo TAKRI LETTER ARCHAIC KHA
116B9;N # Po TAKRI ABBREVIATION SIGN
116C0..116C9;N # Nd [10] TAKRI DIGIT ZERO..TAKRI DIGIT NINE
11700..1171A;N # Lo [27] AHOM LETTER KA..AHOM LETTER ALTERNATE BA
1171D..1171F;N # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
@ -2071,6 +2129,7 @@ FFFD;A # So REPLACEMENT CHARACTER
1173A..1173B;N # No [2] AHOM NUMBER TEN..AHOM NUMBER TWENTY
1173C..1173E;N # Po [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI
1173F;N # So AHOM SYMBOL VI
11740..11746;N # Lo [7] AHOM LETTER CA..AHOM LETTER LLA
11800..1182B;N # Lo [44] DOGRA LETTER A..DOGRA LETTER RRA
1182C..1182E;N # Mc [3] DOGRA VOWEL SIGN AA..DOGRA VOWEL SIGN II
1182F..11837;N # Mn [9] DOGRA VOWEL SIGN U..DOGRA SIGN ANUSVARA
@ -2081,6 +2140,23 @@ FFFD;A # So REPLACEMENT CHARACTER
118E0..118E9;N # Nd [10] WARANG CITI DIGIT ZERO..WARANG CITI DIGIT NINE
118EA..118F2;N # No [9] WARANG CITI NUMBER TEN..WARANG CITI NUMBER NINETY
118FF;N # Lo WARANG CITI OM
11900..11906;N # Lo [7] DIVES AKURU LETTER A..DIVES AKURU LETTER E
11909;N # Lo DIVES AKURU LETTER O
1190C..11913;N # Lo [8] DIVES AKURU LETTER KA..DIVES AKURU LETTER JA
11915..11916;N # Lo [2] DIVES AKURU LETTER NYA..DIVES AKURU LETTER TTA
11918..1192F;N # Lo [24] DIVES AKURU LETTER DDA..DIVES AKURU LETTER ZA
11930..11935;N # Mc [6] DIVES AKURU VOWEL SIGN AA..DIVES AKURU VOWEL SIGN E
11937..11938;N # Mc [2] DIVES AKURU VOWEL SIGN AI..DIVES AKURU VOWEL SIGN O
1193B..1193C;N # Mn [2] DIVES AKURU SIGN ANUSVARA..DIVES AKURU SIGN CANDRABINDU
1193D;N # Mc DIVES AKURU SIGN HALANTA
1193E;N # Mn DIVES AKURU VIRAMA
1193F;N # Lo DIVES AKURU PREFIXED NASAL SIGN
11940;N # Mc DIVES AKURU MEDIAL YA
11941;N # Lo DIVES AKURU INITIAL RA
11942;N # Mc DIVES AKURU MEDIAL RA
11943;N # Mn DIVES AKURU SIGN NUKTA
11944..11946;N # Po [3] DIVES AKURU DOUBLE DANDA..DIVES AKURU END OF TEXT MARK
11950..11959;N # Nd [10] DIVES AKURU DIGIT ZERO..DIVES AKURU DIGIT NINE
119A0..119A7;N # Lo [8] NANDINAGARI LETTER A..NANDINAGARI LETTER VOCALIC RR
119AA..119D0;N # Lo [39] NANDINAGARI LETTER E..NANDINAGARI LETTER RRA
119D1..119D3;N # Mc [3] NANDINAGARI VOWEL SIGN AA..NANDINAGARI VOWEL SIGN II
@ -2112,6 +2188,7 @@ FFFD;A # So REPLACEMENT CHARACTER
11A9A..11A9C;N # Po [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
11A9D;N # Lo SOYOMBO MARK PLUTA
11A9E..11AA2;N # Po [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2
11AB0..11ABF;N # Lo [16] CANADIAN SYLLABICS NATTILIK HI..CANADIAN SYLLABICS SPA
11AC0..11AF8;N # Lo [57] PAU CIN HAU LETTER PA..PAU CIN HAU GLOTTAL STOP FINAL
11C00..11C08;N # Lo [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L
11C0A..11C2E;N # Lo [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA
@ -2158,6 +2235,7 @@ FFFD;A # So REPLACEMENT CHARACTER
11EF3..11EF4;N # Mn [2] MAKASAR VOWEL SIGN I..MAKASAR VOWEL SIGN U
11EF5..11EF6;N # Mc [2] MAKASAR VOWEL SIGN E..MAKASAR VOWEL SIGN O
11EF7..11EF8;N # Po [2] MAKASAR PASSIMBANG..MAKASAR END OF SECTION
11FB0;N # Lo LISU LETTER YHA
11FC0..11FD4;N # No [21] TAMIL FRACTION ONE THREE-HUNDRED-AND-TWENTIETH..TAMIL FRACTION DOWNSCALING FACTOR KIIZH
11FD5..11FDC;N # So [8] TAMIL SIGN NEL..TAMIL SIGN MUKKURUNI
11FDD..11FE0;N # Sc [4] TAMIL SIGN KAACU..TAMIL SIGN VARAAKAN
@ -2167,6 +2245,8 @@ FFFD;A # So REPLACEMENT CHARACTER
12400..1246E;N # Nl [111] CUNEIFORM NUMERIC SIGN TWO ASH..CUNEIFORM NUMERIC SIGN NINE U VARIANT FORM
12470..12474;N # Po [5] CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER..CUNEIFORM PUNCTUATION SIGN DIAGONAL QUADCOLON
12480..12543;N # Lo [196] CUNEIFORM SIGN AB TIMES NUN TENU..CUNEIFORM SIGN ZU5 TIMES THREE DISH TENU
12F90..12FF0;N # Lo [97] CYPRO-MINOAN SIGN CM001..CYPRO-MINOAN SIGN CM114
12FF1..12FF2;N # Po [2] CYPRO-MINOAN SIGN CM301..CYPRO-MINOAN SIGN CM302
13000..1342E;N # Lo [1071] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032
13430..13438;N # Cf [9] EGYPTIAN HIEROGLYPH VERTICAL JOINER..EGYPTIAN HIEROGLYPH END SEGMENT
14400..14646;N # Lo [583] ANATOLIAN HIEROGLYPH A001..ANATOLIAN HIEROGLYPH A530
@ -2174,6 +2254,8 @@ FFFD;A # So REPLACEMENT CHARACTER
16A40..16A5E;N # Lo [31] MRO LETTER TA..MRO LETTER TEK
16A60..16A69;N # Nd [10] MRO DIGIT ZERO..MRO DIGIT NINE
16A6E..16A6F;N # Po [2] MRO DANDA..MRO DOUBLE DANDA
16A70..16ABE;N # Lo [79] TANGSA LETTER OZ..TANGSA LETTER ZA
16AC0..16AC9;N # Nd [10] TANGSA DIGIT ZERO..TANGSA DIGIT NINE
16AD0..16AED;N # Lo [30] BASSA VAH LETTER ENNI..BASSA VAH LETTER I
16AF0..16AF4;N # Mn [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
16AF5;N # Po BASSA VAH FULL STOP
@ -2200,10 +2282,17 @@ FFFD;A # So REPLACEMENT CHARACTER
16FE0..16FE1;W # Lm [2] TANGUT ITERATION MARK..NUSHU ITERATION MARK
16FE2;W # Po OLD CHINESE HOOK MARK
16FE3;W # Lm OLD CHINESE ITERATION MARK
16FE4;W # Mn KHITAN SMALL SCRIPT FILLER
16FF0..16FF1;W # Mc [2] VIETNAMESE ALTERNATE READING MARK CA..VIETNAMESE ALTERNATE READING MARK NHAY
17000..187F7;W # Lo [6136] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187F7
18800..18AF2;W # Lo [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755
18800..18AFF;W # Lo [768] TANGUT COMPONENT-001..TANGUT COMPONENT-768
18B00..18CD5;W # Lo [470] KHITAN SMALL SCRIPT CHARACTER-18B00..KHITAN SMALL SCRIPT CHARACTER-18CD5
18D00..18D08;W # Lo [9] TANGUT IDEOGRAPH-18D00..TANGUT IDEOGRAPH-18D08
1AFF0..1AFF3;W # Lm [4] KATAKANA LETTER MINNAN TONE-2..KATAKANA LETTER MINNAN TONE-5
1AFF5..1AFFB;W # Lm [7] KATAKANA LETTER MINNAN TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-5
1AFFD..1AFFE;W # Lm [2] KATAKANA LETTER MINNAN NASALIZED TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-8
1B000..1B0FF;W # Lo [256] KATAKANA LETTER ARCHAIC E..HENTAIGANA LETTER RE-2
1B100..1B11E;W # Lo [31] HENTAIGANA LETTER RE-3..HENTAIGANA LETTER N-MU-MO-2
1B100..1B122;W # Lo [35] HENTAIGANA LETTER RE-3..KATAKANA LETTER ARCHAIC WU
1B150..1B152;W # Lo [3] HIRAGANA LETTER SMALL WI..HIRAGANA LETTER SMALL WO
1B164..1B167;W # Lo [4] KATAKANA LETTER SMALL WI..KATAKANA LETTER SMALL N
1B170..1B2FB;W # Lo [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB
@ -2215,6 +2304,9 @@ FFFD;A # So REPLACEMENT CHARACTER
1BC9D..1BC9E;N # Mn [2] DUPLOYAN THICK LETTER SELECTOR..DUPLOYAN DOUBLE MARK
1BC9F;N # Po DUPLOYAN PUNCTUATION CHINOOK FULL STOP
1BCA0..1BCA3;N # Cf [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP
1CF00..1CF2D;N # Mn [46] ZNAMENNY COMBINING MARK GORAZDO NIZKO S KRYZHEM ON LEFT..ZNAMENNY COMBINING MARK KRYZH ON LEFT
1CF30..1CF46;N # Mn [23] ZNAMENNY COMBINING TONAL RANGE MARK MRACHNO..ZNAMENNY PRIZNAK MODIFIER ROG
1CF50..1CFC3;N # So [116] ZNAMENNY NEUME KRYUK..ZNAMENNY NEUME PAUK
1D000..1D0F5;N # So [246] BYZANTINE MUSICAL SYMBOL PSILI..BYZANTINE MUSICAL SYMBOL GORGON NEO KATO
1D100..1D126;N # So [39] MUSICAL SYMBOL SINGLE BARLINE..MUSICAL SYMBOL DRUM CLEF-2
1D129..1D164;N # So [60] MUSICAL SYMBOL MULTIPLE MEASURE REST..MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
@ -2228,7 +2320,7 @@ FFFD;A # So REPLACEMENT CHARACTER
1D185..1D18B;N # Mn [7] MUSICAL SYMBOL COMBINING DOIT..MUSICAL SYMBOL COMBINING TRIPLE TONGUE
1D18C..1D1A9;N # So [30] MUSICAL SYMBOL RINFORZANDO..MUSICAL SYMBOL DEGREE SLASH
1D1AA..1D1AD;N # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO
1D1AE..1D1E8;N # So [59] MUSICAL SYMBOL PEDAL MARK..MUSICAL SYMBOL KIEVAN FLAT SIGN
1D1AE..1D1EA;N # So [61] MUSICAL SYMBOL PEDAL MARK..MUSICAL SYMBOL KORON
1D200..1D241;N # So [66] GREEK VOCAL NOTATION SYMBOL-1..GREEK INSTRUMENTAL NOTATION SYMBOL-54
1D242..1D244;N # Mn [3] COMBINING GREEK MUSICAL TRISEME..COMBINING GREEK MUSICAL PENTASEME
1D245;N # So GREEK MUSICAL LEIMMA
@ -2288,6 +2380,9 @@ FFFD;A # So REPLACEMENT CHARACTER
1DA87..1DA8B;N # Po [5] SIGNWRITING COMMA..SIGNWRITING PARENTHESIS
1DA9B..1DA9F;N # Mn [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6
1DAA1..1DAAF;N # Mn [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16
1DF00..1DF09;N # Ll [10] LATIN SMALL LETTER FENG DIGRAPH WITH TRILL..LATIN SMALL LETTER T WITH HOOK AND RETROFLEX HOOK
1DF0A;N # Lo LATIN LETTER RETROFLEX CLICK WITH RETROFLEX HOOK
1DF0B..1DF1E;N # Ll [20] LATIN SMALL LETTER ESH WITH DOUBLE BAR..LATIN SMALL LETTER S WITH CURL
1E000..1E006;N # Mn [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
1E008..1E018;N # Mn [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
1E01B..1E021;N # Mn [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
@ -2299,10 +2394,16 @@ FFFD;A # So REPLACEMENT CHARACTER
1E140..1E149;N # Nd [10] NYIAKENG PUACHUE HMONG DIGIT ZERO..NYIAKENG PUACHUE HMONG DIGIT NINE
1E14E;N # Lo NYIAKENG PUACHUE HMONG LOGOGRAM NYAJ
1E14F;N # So NYIAKENG PUACHUE HMONG CIRCLED CA
1E290..1E2AD;N # Lo [30] TOTO LETTER PA..TOTO LETTER A
1E2AE;N # Mn TOTO SIGN RISING TONE
1E2C0..1E2EB;N # Lo [44] WANCHO LETTER AA..WANCHO LETTER YIH
1E2EC..1E2EF;N # Mn [4] WANCHO TONE TUP..WANCHO TONE KOINI
1E2F0..1E2F9;N # Nd [10] WANCHO DIGIT ZERO..WANCHO DIGIT NINE
1E2FF;N # Sc WANCHO NGUN SIGN
1E7E0..1E7E6;N # Lo [7] ETHIOPIC SYLLABLE HHYA..ETHIOPIC SYLLABLE HHYO
1E7E8..1E7EB;N # Lo [4] ETHIOPIC SYLLABLE GURAGE HHWA..ETHIOPIC SYLLABLE HHWE
1E7ED..1E7EE;N # Lo [2] ETHIOPIC SYLLABLE GURAGE MWI..ETHIOPIC SYLLABLE GURAGE MWEE
1E7F0..1E7FE;N # Lo [15] ETHIOPIC SYLLABLE GURAGE QWI..ETHIOPIC SYLLABLE GURAGE PWEE
1E800..1E8C4;N # Lo [197] MENDE KIKAKUI SYLLABLE M001 KI..MENDE KIKAKUI SYLLABLE M060 NYON
1E8C7..1E8CF;N # No [9] MENDE KIKAKUI DIGIT ONE..MENDE KIKAKUI DIGIT NINE
1E8D0..1E8D6;N # Mn [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
@ -2364,15 +2465,17 @@ FFFD;A # So REPLACEMENT CHARACTER
1F0D1..1F0F5;N # So [37] PLAYING CARD ACE OF CLUBS..PLAYING CARD TRUMP-21
1F100..1F10A;A # No [11] DIGIT ZERO FULL STOP..DIGIT NINE COMMA
1F10B..1F10C;N # No [2] DINGBAT CIRCLED SANS-SERIF DIGIT ZERO..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
1F10D..1F10F;N # So [3] CIRCLED ZERO WITH SLASH..CIRCLED DOLLAR SIGN WITH OVERLAID BACKSLASH
1F110..1F12D;A # So [30] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED CD
1F12E..1F12F;N # So [2] CIRCLED WZ..COPYLEFT SYMBOL
1F130..1F169;A # So [58] SQUARED LATIN CAPITAL LETTER A..NEGATIVE CIRCLED LATIN CAPITAL LETTER Z
1F16A..1F16C;N # So [3] RAISED MC SIGN..RAISED MR SIGN
1F16A..1F16F;N # So [6] RAISED MC SIGN..CIRCLED HUMAN FIGURE
1F170..1F18D;A # So [30] NEGATIVE SQUARED LATIN CAPITAL LETTER A..NEGATIVE SQUARED SA
1F18E;W # So NEGATIVE SQUARED AB
1F18F..1F190;A # So [2] NEGATIVE SQUARED WC..SQUARE DJ
1F191..1F19A;W # So [10] SQUARED CL..SQUARED VS
1F19B..1F1AC;A # So [18] SQUARED THREE D..SQUARED VOD
1F1AD;N # So MASK WORK SYMBOL
1F1E6..1F1FF;N # So [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
1F200..1F202;W # So [3] SQUARE HIRAGANA HOKA..SQUARED KATAKANA SA
1F210..1F23B;W # So [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D
@ -2424,36 +2527,46 @@ FFFD;A # So REPLACEMENT CHARACTER
1F6CD..1F6CF;N # So [3] SHOPPING BAGS..BED
1F6D0..1F6D2;W # So [3] PLACE OF WORSHIP..SHOPPING TROLLEY
1F6D3..1F6D4;N # So [2] STUPA..PAGODA
1F6D5;W # So HINDU TEMPLE
1F6D5..1F6D7;W # So [3] HINDU TEMPLE..ELEVATOR
1F6DD..1F6DF;W # So [3] PLAYGROUND SLIDE..RING BUOY
1F6E0..1F6EA;N # So [11] HAMMER AND WRENCH..NORTHEAST-POINTING AIRPLANE
1F6EB..1F6EC;W # So [2] AIRPLANE DEPARTURE..AIRPLANE ARRIVING
1F6F0..1F6F3;N # So [4] SATELLITE..PASSENGER SHIP
1F6F4..1F6FA;W # So [7] SCOOTER..AUTO RICKSHAW
1F6F4..1F6FC;W # So [9] SCOOTER..ROLLER SKATE
1F700..1F773;N # So [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
1F780..1F7D8;N # So [89] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..NEGATIVE CIRCLED SQUARE
1F7E0..1F7EB;W # So [12] LARGE ORANGE CIRCLE..LARGE BROWN SQUARE
1F7F0;W # So HEAVY EQUALS SIGN
1F800..1F80B;N # So [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
1F810..1F847;N # So [56] LEFTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD..DOWNWARDS HEAVY ARROW
1F850..1F859;N # So [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
1F860..1F887;N # So [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW
1F890..1F8AD;N # So [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
1F8B0..1F8B1;N # So [2] ARROW POINTING UPWARDS THEN NORTH WEST..ARROW POINTING RIGHTWARDS THEN CURVING SOUTH WEST
1F900..1F90B;N # So [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
1F90D..1F971;W # So [101] WHITE HEART..YAWNING FACE
1F973..1F976;W # So [4] FACE WITH PARTY HORN AND PARTY HAT..FREEZING FACE
1F97A..1F9A2;W # So [41] FACE WITH PLEADING EYES..SWAN
1F9A5..1F9AA;W # So [6] SLOTH..OYSTER
1F9AE..1F9CA;W # So [29] GUIDE DOG..ICE CUBE
1F9CD..1F9FF;W # So [51] STANDING PERSON..NAZAR AMULET
1F90C..1F93A;W # So [47] PINCHED FINGERS..FENCER
1F93B;N # So MODERN PENTATHLON
1F93C..1F945;W # So [10] WRESTLERS..GOAL NET
1F946;N # So RIFLE
1F947..1F9FF;W # So [185] FIRST PLACE MEDAL..NAZAR AMULET
1FA00..1FA53;N # So [84] NEUTRAL CHESS KING..BLACK CHESS KNIGHT-BISHOP
1FA60..1FA6D;N # So [14] XIANGQI RED GENERAL..XIANGQI BLACK SOLDIER
1FA70..1FA73;W # So [4] BALLET SHOES..SHORTS
1FA78..1FA7A;W # So [3] DROP OF BLOOD..STETHOSCOPE
1FA80..1FA82;W # So [3] YO-YO..PARACHUTE
1FA90..1FA95;W # So [6] RINGED PLANET..BANJO
20000..2A6D6;W # Lo [42711] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6D6
2A6D7..2A6FF;W # Cn [41] <reserved-2A6D7>..<reserved-2A6FF>
2A700..2B734;W # Lo [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734
2B735..2B73F;W # Cn [11] <reserved-2B735>..<reserved-2B73F>
1FA70..1FA74;W # So [5] BALLET SHOES..THONG SANDAL
1FA78..1FA7C;W # So [5] DROP OF BLOOD..CRUTCH
1FA80..1FA86;W # So [7] YO-YO..NESTING DOLLS
1FA90..1FAAC;W # So [29] RINGED PLANET..HAMSA
1FAB0..1FABA;W # So [11] FLY..NEST WITH EGGS
1FAC0..1FAC5;W # So [6] ANATOMICAL HEART..PERSON WITH CROWN
1FAD0..1FAD9;W # So [10] BLUEBERRIES..JAR
1FAE0..1FAE7;W # So [8] MELTING FACE..BUBBLES
1FAF0..1FAF6;W # So [7] HAND WITH INDEX FINGER AND THUMB CROSSED..HEART HANDS
1FB00..1FB92;N # So [147] BLOCK SEXTANT-1..UPPER HALF INVERSE MEDIUM SHADE AND LOWER HALF BLOCK
1FB94..1FBCA;N # So [55] LEFT HALF INVERSE MEDIUM SHADE AND RIGHT HALF BLOCK..WHITE UP-POINTING CHEVRON
1FBF0..1FBF9;N # Nd [10] SEGMENTED DIGIT ZERO..SEGMENTED DIGIT NINE
20000..2A6DF;W # Lo [42720] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6DF
2A6E0..2A6FF;W # Cn [32] <reserved-2A6E0>..<reserved-2A6FF>
2A700..2B738;W # Lo [4153] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B738
2B739..2B73F;W # Cn [7] <reserved-2B739>..<reserved-2B73F>
2B740..2B81D;W # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B81E..2B81F;W # Cn [2] <reserved-2B81E>..<reserved-2B81F>
2B820..2CEA1;W # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
@ -2463,7 +2576,8 @@ FFFD;A # So REPLACEMENT CHARACTER
2F800..2FA1D;W # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
2FA1E..2FA1F;W # Cn [2] <reserved-2FA1E>..<reserved-2FA1F>
2FA20..2FFFD;W # Cn [1502] <reserved-2FA20>..<reserved-2FFFD>
30000..3FFFD;W # Cn [65534] <reserved-30000>..<reserved-3FFFD>
30000..3134A;W # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
3134B..3FFFD;W # Cn [60595] <reserved-3134B>..<reserved-3FFFD>
E0001;N # Cf LANGUAGE TAG
E0020..E007F;N # Cf [96] TAG SPACE..CANCEL TAG
E0100..E01EF;A # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

File diff suppressed because it is too large Load diff

15
libc/unicode/update.sh Executable file
View file

@ -0,0 +1,15 @@
#!/bin/sh
[ -d libc/unicode ] || exit
[ -x o//examples/curl.com ] || make -j8 o//examples/curl.com || exit
mkdir -p o/tmp/ || exit
shineget() {
echo $2
o//examples/curl.com $2 >o/tmp/$$ || exit
mv o/tmp/$$ $1 || exit
}
shineget libc/unicode/blocks.txt https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt
shineget libc/unicode/unicodedata.txt https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
shineget libc/unicode/eastasianwidth.txt https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
shineget libc/unicode/SpecialCasing.txt https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt