From 0378e38d4a42399780b59ece0c1cbcffa3d6b488 Mon Sep 17 00:00:00 2001
From: dchandler
- FIXME: describe when the converter treats a space as a tsheg and when a space is Tibetan whitespace. Describe how a tsheg does not appear after {KA} and {GA} with most vowels, describe the handling of {NGA,} as {NGA ,}. Talk about dzongkha vs. tibetan when it comes to a tsheg at the end of a string of tsheg bars. Describe treatment of final line break or lack thereof. Warn users to watch out for lines that end with {-}. Describe treatment of {.} in certain contexts as U+0F0C. Etc. + The converters will insert a tsheg in some places where no ACIP + { } appears; this happens after {PA} and {DANG,} below: +
++GA PA + +GA PHA + +DAM, +LHAG + +GA CA, + +GA ++ +
+ Note that a space appears after {PHA}, and a comma appears after + {CA}, but {PA} has nothing between it and a line break. The + converters are smart enough to insert a tsheg regardless. +
+ ++ Also missing from the above ACIP, but inserted automatically by the + converters, is Tibetan whitespace; the converter sees + {DAM, LHAG} instead of {DAM,LHAG} above. +
+ ++ If such automatic corrections are not desired, try using a Unicode + escape before the line break instead of {PA} + or {,}. +
+ ++ The converters also treat {NGA,} as a typo for {NGA ,} + (actually, {NGA\u0F0C,} since one wouldn't want a line break to + occur after the tsheg and cause a shad to begin a + line; see the section on formatting Tibetan texts in the Tibetan! + 5.1 documentation) because Tibetan typesetting requires that NGA + not appear directly before a shad. (Perhaps {NGA,} + would look too much like {KA}.) +
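The {NGA,} rewrite described above can be sketched as a plain text substitution. This is only an illustration, not the converters' actual code: the function name `fix_nga_shad` is hypothetical, and the sketch matches the literal substring {NGA,} without parsing syllables.

```python
# Hypothetical helper, not part of the converters: rewrites {NGA,} so that a
# non-breaking tsheg (written as the Unicode escape \u0F0C) sits between NGA
# and the shad.
NON_BREAKING_TSHEG_ESCAPE = "\\u0F0C"  # six literal characters: \ u 0 F 0 C


def fix_nga_shad(acip: str) -> str:
    # Assumption: a literal substring match is enough for illustration; a real
    # converter would work on parsed syllables rather than raw text.
    return acip.replace("NGA,", "NGA" + NON_BREAKING_TSHEG_ESCAPE + ",")


if __name__ == "__main__":
    print(fix_nga_shad("NGA,"))
```

Input that already reads {NGA ,} is left untouched by this sketch, because the substring {NGA,} does not occur in it.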
+ ++ The converters embody the rule that a shad does not appear + after GA or KA unless a shabs kyu vowel is on the GA or + KA. For example, the space in {MA ,HA} is a tsheg, + and the space in {KU ,HA} is a tsheg, but the space in + {GA ,HA} is Tibetan whitespace. +
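A minimal sketch of the GA/KA rule, reproducing the three examples above. The function name `space_before_shad` is hypothetical, and the sketch handles only bare one-consonant syllables; treating the vowels I, E, and O like A is an assumption drawn from the rule as stated, not from the converters' source.

```python
import re

# Assumption: only a bare KA or GA syllable (one consonant plus one vowel) is
# considered, and every vowel except U (shabs kyu) makes the space whitespace.
_KA_GA_WITHOUT_SHABS_KYU = re.compile(r"[KG][AIEO]")


def space_before_shad(syllable: str) -> str:
    # Classify the space in {<syllable> ,...} as "tsheg" or "whitespace".
    if _KA_GA_WITHOUT_SHABS_KYU.fullmatch(syllable):
        return "whitespace"
    return "tsheg"


if __name__ == "__main__":
    for syllable in ("MA", "KU", "GA"):
        print(syllable, space_before_shad(syllable))
```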
+ ++ If you find that the converters put a tsheg where it does not + belong, miss a tsheg, or put whitespace where it does not belong, + please contact the + developers. +
+ ++ Though the ACIP standard does not mention it, it appears that some + ACIP Release IV texts use a period (i.e., {.}) to indicate a + non-breaking tsheg (i.e., U+0F0C). Search for {NGO.,}, + {....,DAM}, etc. Unless {,}, {.}, or a letter (i.e., a through + z) follows the {.}, it is only grudgingly interpreted as a + non-breaking tsheg -- a warning is generated, too. FIXME: Is + this right? Allow for treating {.} as an outright error. +
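The warning condition for {.} can be sketched as a look-ahead of one character. The function name `period_triggers_warning` is hypothetical, and two points are assumptions rather than documented behavior: letters of either case are accepted, and a {.} at the very end of the input is treated as having no acceptable follower.

```python
import string


def period_triggers_warning(acip: str, index: int) -> bool:
    # Return True when the {.} at acip[index] would be read as U+0F0C only
    # with a warning, i.e. when it is not followed by {,}, {.}, or a letter.
    if acip[index] != ".":
        raise ValueError("acip[index] must be a period")
    follower = acip[index + 1 : index + 2]  # empty string at end of input
    if follower == "":
        return True  # assumption: nothing follows, so warn
    # Assumption: ASCII letters of either case count as "a letter".
    return not (follower in ",." or follower in string.ascii_letters)


if __name__ == "__main__":
    print(period_triggers_warning("NGO.,", 3))
    print(period_triggers_warning("NGO. ", 3))
```

In this sketch every period in {....,DAM} passes without a warning, since each one is followed by another period or by a comma.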
+ ++ Note that the treatment of the very last line in an input text is + circumspect. +
+ +