mirror of
				https://github.com/jart/cosmopolitan.git
				synced 2025-10-25 10:40:57 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			79 lines
		
	
	
	
		
			2.7 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
To generate or modify mapping headers
-------------------------------------
Mapping headers are imported from CJKCodecs in pre-generated form.
If you need to tweak or add to them, please look at the tools/
subdirectory of the CJKCodecs distribution.


Notes on implementation characteristics of each codec
-----------------------------------------------------

1) Big5 codec

  The big5 codec maps the following characters the way cp950 does,
  rather than following Unicode.org's mapping, which maps them to
  0xFFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

  Because Unicode code points 0x5341, 0x5345, 0xFF0F and 0xFF3C are
  already mapped to other big5 codes, round-trip compatibility is not
  guaranteed for them.


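  The table and the round-trip caveat above can be checked with a
  short Python sketch. It assumes that Python's built-in "big5" codec
  follows these notes; the names TABLE, NO_ROUNDTRIP, decode_big5 and
  roundtrip are made up for this illustration.

```python
# Sketch only: assumes Python's built-in "big5" codec implements the
# mapping described in these notes.
TABLE = {
    0xA15A: 0x2574,
    0xA1C3: 0xFFE3,
    0xA1C5: 0x02CD,
    0xA1FE: 0xFF0F,
    0xA240: 0xFF3C,
    0xA2CC: 0x5341,
    0xA2CE: 0x5345,
}

# The four codes whose Unicode targets already have another big5 code.
NO_ROUNDTRIP = (0xA1FE, 0xA240, 0xA2CC, 0xA2CE)


def decode_big5(code):
    """Decode one two-byte Big5 code and return its Unicode code point."""
    return ord(code.to_bytes(2, "big").decode("big5"))


def roundtrip(code):
    """Decode a Big5 code, re-encode the result, and return the new code."""
    text = code.to_bytes(2, "big").decode("big5")
    return int.from_bytes(text.encode("big5"), "big")


for big5, expected in TABLE.items():
    print(f"0x{big5:04X} -> U+{decode_big5(big5):04X} "
          f"(table says U+{expected:04X})")

for big5 in NO_ROUNDTRIP:
    print(f"0x{big5:04X} re-encodes to 0x{roundtrip(big5):04X}")
```

  If the codec behaves as described, the second loop prints a code
  that differs from the one that was decoded.
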
2) cp932 codec

  To conform to Windows's actual mapping, the cp932 codec maps the
  following codepoints in addition to the official cp932 mapping.

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED


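  A minimal Python sketch of the extra single-byte mappings listed
  above, assuming Python's built-in "cp932" codec implements them.
  EXTRA and decode_byte are illustrative names, not part of any API.

```python
# Sketch only: assumes Python's built-in "cp932" codec implements the
# additional single-byte mappings listed in these notes.
EXTRA = {
    0x80: 0x0080,
    0xA0: 0xF8F0,
    0xFD: 0xF8F1,
    0xFE: 0xF8F2,
    0xFF: 0xF8F3,
}


def decode_byte(value):
    """Decode a single cp932 byte and return its Unicode code point."""
    return ord(bytes([value]).decode("cp932"))


for byte, expected in EXTRA.items():
    print(f"0x{byte:02X} -> U+{decode_byte(byte):04X} "
          f"(table says U+{expected:04X})")
```
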
3) euc-jisx0213 codec

  The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to
  Unicode U+FF3C instead of U+005C as in unicode.org's mapping.
  Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and A1C0
  is displayed as a full-width character, mapping it to U+FF3C makes
  more sense.

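  The two backslashes can be told apart with a short Python sketch,
  assuming Python's built-in "euc_jisx0213" codec behaves as described
  here. The variable names are illustrative only.

```python
# Sketch only: assumes Python's built-in "euc_jisx0213" codec behaves
# as described in these notes.
# JIS X 0213 Plane 1 code 0x2140 is written in EUC form by setting the
# high bit of both bytes: 0x2140 | 0x8080 == 0xA1C0.
euc_bytes = (0x2140 | 0x8080).to_bytes(2, "big")

fullwidth = euc_bytes.decode("euc_jisx0213")      # two-byte backslash
ascii_backslash = b"\x5c".decode("euc_jisx0213")  # one-byte backslash

print(euc_bytes.hex(), f"-> U+{ord(fullwidth):04X}")
print("5c  ", f"-> U+{ord(ascii_backslash):04X}")
```
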
  The euc-jisx0213 codec can also decode JIS X 0212 codes on code
  set 3 (the three-byte form prefixed by 0x8F). Because JIS X 0212
  and JIS X 0213 Plane 2 do not overlap each other, this does not
  break standard conformance (JIS X 0213 Plane 2 was designed to be
  used this way). When encoding, the codec tries to encode kanji
  characters in this order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212


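  Which form the encoder picked for a character can be read off the
  encoded bytes. The sketch below assumes Python's built-in
  "euc_jisx0213" codec; code_set is a hypothetical helper, and no
  particular JIS X 0212-only kanji is assumed.

```python
# Sketch only: assumes Python's built-in "euc_jisx0213" codec.
def code_set(ch):
    """Name the EUC form the codec used to encode a single character."""
    data = ch.encode("euc_jisx0213")
    if len(data) == 1:
        return "ASCII"
    if data[0] == 0x8E:
        return "half-width katakana (0x8E prefix)"
    if data[0] == 0x8F:
        # Three-byte form: JIS X 0213 Plane 2, or JIS X 0212 as the
        # last resort in the order given above.
        return "JIS X 0213 Plane 2 or JIS X 0212 (0x8F prefix)"
    return "JIS X 0213 Plane 1"


# An ASCII letter, a JIS X 0208 kanji (U+4E9C), and a half-width
# katakana letter (U+FF71).
for ch in "A\u4e9c\uff71":
    print(f"U+{ord(ch):04X}", code_set(ch))
```

  A kanji that is reported under the 0x8F prefix was not found in
  Plane 1; telling Plane 2 from JIS X 0212 would need the row number
  of the second byte, which this sketch does not attempt.
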
4) euc-jp codec

  The euc-jp codec deviates from the standard mapping, for
  compatibility, on these points:
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0. (both ways)
   - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way)
   - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way)


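  The difference between the two-way and one-way mappings above can
  be seen in a short Python sketch, assuming Python's built-in
  "euc_jp" codec implements them. The variable names are illustrative.

```python
# Sketch only: assumes Python's built-in "euc_jp" codec implements the
# compatibility mappings listed in these notes.

# Both ways: U+FF3C <-> EUC-JP A1C0.
fullwidth_encoded = "\uff3c".encode("euc_jp")
fullwidth_decoded = b"\xa1\xc0".decode("euc_jp")

# One way: U+00A5 and U+203E encode to single bytes, but those bytes
# decode back to the ASCII characters, not to the originals.
yen_encoded = "\u00a5".encode("euc_jp")
overline_encoded = "\u203e".encode("euc_jp")
yen_back = yen_encoded.decode("euc_jp")
overline_back = overline_encoded.decode("euc_jp")

print("U+FF3C ->", fullwidth_encoded.hex(),
      f"-> U+{ord(fullwidth_decoded):04X}")
print("U+00A5 ->", yen_encoded.hex(), f"-> U+{ord(yen_back):04X}")
print("U+203E ->", overline_encoded.hex(),
      f"-> U+{ord(overline_back):04X}")
```
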
5) shift-jis codec

  For compatibility, the shift-jis codec maps the 0x20-0x7e area
  directly to U+20-U+7E instead of using JIS X 0201. The differences
  are:
   - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
   - U+007E TILDE is mapped to SHIFT-JIS 0x7e.
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.

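  The three differences above can be checked with a short Python
  sketch, assuming Python's built-in "shift_jis" codec implements the
  direct ASCII mapping. The variable names are illustrative.

```python
# Sketch only: assumes Python's built-in "shift_jis" codec implements
# the direct ASCII mapping described in these notes.
backslash = b"\x5c".decode("shift_jis")
tilde = b"\x7e".decode("shift_jis")
fullwidth = b"\x81\x5f".decode("shift_jis")
fullwidth_encoded = "\uff3c".encode("shift_jis")

print(f"0x5c   -> U+{ord(backslash):04X}")
print(f"0x7e   -> U+{ord(tilde):04X}")
print(f"0x815f -> U+{ord(fullwidth):04X}")
print("U+FF3C ->", fullwidth_encoded.hex())
```
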