Mirror of https://github.com/jart/cosmopolitan.git, synced 2025-05-28 16:22:29 +00:00

python-3.6.zip added from Github
README.cosmo contains the necessary links.

Commit 0c4c56ff39 (parent 75fc601ff5)
4219 changed files with 1968626 additions and 0 deletions

third_party/python/Modules/cjkcodecs/README (vendored, normal file, +79 lines)

To generate or modify mapping headers
-------------------------------------
Mapping headers are imported from CJKCodecs in pre-generated form.
If you need to tweak or add to them, see the tools/ subdirectory of
the CJKCodecs distribution.



Notes on implementation characteristics of each codec
-----------------------------------------------------

1) Big5 codec

The big5 codec maps the following characters the same way cp950 does,
rather than following Unicode.org's mapping, which maps them to 0xFFFD.

    BIG5      Unicode     Description

    0xA15A    0x2574      SPACING UNDERSCORE
    0xA1C3    0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5    0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE    0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240    0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC    0x5341      HANGZHOU NUMERAL TEN
    0xA2CE    0x5345      HANGZHOU NUMERAL THIRTY

Because U+5341, U+5345, U+FF0F and U+FF3C are already mapped to other
Big5 codes, round-trip conversion is not guaranteed for these entries.
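To see which rows of the Big5 table lose the round trip, the decode direction can be modeled as a plain dictionary. This is a minimal sketch that assumes only what the note says: each of the four listed code points is encoded to some other, pre-existing Big5 code, which the sketch does not name. `BIG5_CP950_ROWS`, `ALREADY_MAPPED` and `non_roundtrip_codes` are hypothetical names, not part of the codec.

```python
# Decode-direction rows from the Big5 table: Big5 code -> Unicode code point.
BIG5_CP950_ROWS = {
    0xA15A: 0x2574,  # SPACING UNDERSCORE
    0xA1C3: 0xFFE3,  # SPACING HEAVY OVERSCORE
    0xA1C5: 0x02CD,  # SPACING HEAVY UNDERSCORE
    0xA1FE: 0xFF0F,  # LT DIAG UP RIGHT TO LOW LEFT
    0xA240: 0xFF3C,  # LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC: 0x5341,  # HANGZHOU NUMERAL TEN
    0xA2CE: 0x5345,  # HANGZHOU NUMERAL THIRTY
}

# Code points the note says already belong to other Big5 codes.
# Assumption: the encoder always emits that other code for them.
ALREADY_MAPPED = {0x5341, 0x5345, 0xFF0F, 0xFF3C}


def non_roundtrip_codes(rows, already_mapped):
    """Big5 codes whose decoded code point would be encoded to a different code."""
    return sorted(code for code, cp in rows.items() if cp in already_mapped)


if __name__ == "__main__":
    for code in non_roundtrip_codes(BIG5_CP950_ROWS, ALREADY_MAPPED):
        print(f"0x{code:04X} -> U+{BIG5_CP950_ROWS[code]:04X}: no round trip")
```

With the table as given, the model flags 0xA1FE, 0xA240, 0xA2CC and 0xA2CE; the other three rows are not affected by the note.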


2) cp932 codec

To match the mapping Windows actually uses, the cp932 codec maps the
following code points in addition to the official cp932 mapping.

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED
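The extra single-byte entries can be checked from Python in both directions. This sketch assumes the interpreter's built-in `cp932` codec is the CJKCodecs-based one described here; `CP932_EXTRAS` and `check_cp932_extras` are hypothetical names. Per the table, each byte should decode to the listed code point and encode back to the same byte.

```python
# Windows-compatibility single bytes from the cp932 table: byte -> code point.
CP932_EXTRAS = {0x80: 0x0080, 0xA0: 0xF8F0, 0xFD: 0xF8F1, 0xFE: 0xF8F2, 0xFF: 0xF8F3}


def check_cp932_extras(encoding="cp932"):
    """Return (byte, problem) pairs; an empty list means every row holds."""
    problems = []
    for byte, code_point in CP932_EXTRAS.items():
        raw = bytes([byte])
        text = raw.decode(encoding)
        if text != chr(code_point):
            problems.append((byte, f"decoded to {text!r}"))
        elif text.encode(encoding) != raw:
            # The extras should also encode back to the original single byte.
            problems.append((byte, "did not round-trip"))
    return problems


if __name__ == "__main__":
    print(check_cp932_extras() or "all cp932 extras round-trip")
```

The 0xF8F0-0xF8F3 targets are Private Use code points, so they cannot collide with any standard mapping. That is why these bytes can round-trip safely.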


3) euc-jisx0213 codec

The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to U+FF3C
instead of U+005C as in Unicode.org's mapping. Because euc-jisx0213
already has REVERSE SOLIDUS at 0x5C, and 0x2140 (0xA1C0 in EUC form)
is displayed as a full-width character, mapping it to U+FF3C makes
more sense.
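This mapping can be observed from Python. The sketch assumes the interpreter's built-in `euc_jisx0213` codec is built from these sources; `euc_bytes` is a hypothetical helper. It forms the EUC bytes for 0x2140, appends an ASCII 0x5C, and decodes both. Per the note, the pair should decode to two distinct characters.

```python
def euc_bytes(jis_code):
    """Convert a 94x94 JIS code such as 0x2140 to its two-byte EUC form."""
    # EUC sets the high bit of both bytes: 0x2140 | 0x8080 == 0xA1C0.
    return (jis_code | 0x8080).to_bytes(2, "big")


if __name__ == "__main__":
    raw = euc_bytes(0x2140) + b"\\"  # the double-byte code, then ASCII 0x5C
    for ch in raw.decode("euc_jisx0213"):
        print(f"U+{ord(ch):04X}")
```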

The euc-jisx0213 codec can also decode JIS X 0212 codes in code set 2.
Because JIS X 0212 and JIS X 0213 Plane 2 do not overlap, this does not
break conformance to either standard (and JIS X 0213 Plane 2 was
designed to be used this way). When encoding, the codec tries to encode
kanji characters in this order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212
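The encoding order amounts to a first-match search over three tables. The sketch below models only that control flow. The tables are hypothetical placeholders keyed by Private Use code points, not real JIS X 0213 or JIS X 0212 data, and `encode_kanji` is an invented name rather than a function of the codec.

```python
from typing import Dict, Optional, Tuple

# Hypothetical placeholder tables (code point -> 94x94 code). These are not
# real mappings; the real tables hold thousands of kanji.
PLANE1: Dict[int, int] = {0xE000: 0x2121}
PLANE2: Dict[int, int] = {0xE001: 0x2121}
JISX0212: Dict[int, int] = {0xE001: 0x2221, 0xE002: 0x2222}

# Search order from the README: Plane 1, then Plane 2, then JIS X 0212.
ENCODING_ORDER = (
    ("JIS X 0213 Plane 1", PLANE1),
    ("JIS X 0213 Plane 2", PLANE2),
    ("JIS X 0212", JISX0212),
)


def encode_kanji(code_point: int) -> Optional[Tuple[str, int]]:
    """Return (character set, code) from the first table containing code_point."""
    for charset, table in ENCODING_ORDER:
        code = table.get(code_point)
        if code is not None:
            return charset, code
    return None  # not encodable in any of the three sets


if __name__ == "__main__":
    # 0xE001 is in both Plane 2 and JIS X 0212; Plane 2 wins because it is tried first.
    for cp in (0xE000, 0xE001, 0xE002, 0xE003):
        print(f"U+{cp:04X}", encode_kanji(cp))
```

Because JIS X 0212 is tried last, it acts only as a fallback for characters that JIS X 0213 cannot represent.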


4) euc-jp codec

The euc-jp codec is a compatibility variant in these respects:
- U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP 0xA1C0 (both ways).
- U+00A5 YEN SIGN is mapped to EUC-JP 0x5C (encoding only).
- U+203E OVERLINE is mapped to EUC-JP 0x7E (encoding only).
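The difference between the two-way and the encoding-only entries shows up when a character is encoded and then decoded again. This sketch assumes the interpreter's built-in `euc_jp` codec is built from these sources; `euc_jp_round_trip` is a hypothetical helper.

```python
def euc_jp_round_trip(ch):
    """Encode one character with the euc_jp codec, then decode the result."""
    raw = ch.encode("euc_jp")
    return raw, raw.decode("euc_jp")


if __name__ == "__main__":
    for ch, name in (("\uff3c", "FULLWIDTH REVERSE SOLIDUS"),
                     ("\u00a5", "YEN SIGN"),
                     ("\u203e", "OVERLINE")):
        raw, back = euc_jp_round_trip(ch)
        # Encoding-only entries decode to ASCII U+005C and U+007E instead.
        status = "round-trips" if back == ch else f"comes back as U+{ord(back):04X}"
        print(f"{name}: {raw.hex()} {status}")
```

The encoding-only entries are lossy by design: 0x5C and 0x7E still decode to ASCII REVERSE SOLIDUS and TILDE.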


5) shift-jis codec

For compatibility, the shift-jis codec maps the 0x20-0x7E range directly
to U+0020-U+007E instead of using JIS X 0201. The differences are:
- U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5C.
- U+007E TILDE is mapped to SHIFT-JIS 0x7E.
- U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 0x815F.
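The three listed mappings can be checked in both directions from Python. This sketch assumes the interpreter's built-in `shift_jis` codec is built from these sources; `SJIS_COMPAT` and `mismatches` are hypothetical names.

```python
# (character, Shift-JIS bytes) pairs from the list above.
SJIS_COMPAT = [
    ("\\", b"\x5c"),          # U+005C REVERSE SOLIDUS stays single-byte ASCII
    ("~", b"\x7e"),           # U+007E TILDE stays single-byte ASCII
    ("\uff3c", b"\x81\x5f"),  # U+FF3C FULLWIDTH REVERSE SOLIDUS, double-byte
]


def mismatches(encoding="shift_jis"):
    """Return the pairs that do not encode and decode exactly as listed."""
    bad = []
    for ch, raw in SJIS_COMPAT:
        if ch.encode(encoding) != raw or raw.decode(encoding) != ch:
            bad.append((ch, raw))
    return bad


if __name__ == "__main__":
    print(mismatches() or "all listed shift-jis mappings hold")
```

A strict JIS X 0201 decoder would turn 0x5C into YEN SIGN and 0x7E into OVERLINE. An empty result from `mismatches()` therefore confirms the direct ASCII mapping.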