Better docs w.r.t. the lexer's handling of ACIP spaces etc.
This commit is contained in:
parent
8561623b5e
commit
0378e38d4a
1 changed files with 92 additions and 1 deletions
|
@ -804,7 +804,81 @@ TIBETAN FONT AND NEEDS TO BE REDONE BY DOUBLE INPUT]"
|
|||
</p>
|
||||
|
||||
<p>
|
||||
FIXME: describe when the converter treats a space as a <i>tsheg</i> and when a space is Tibetan whitespace. Describe how a <i>tsheg</i> does not appear after {KA} and {GA} with most vowels, and describe the handling of {NGA,} as {NGA ,}. Talk about Dzongkha vs. Tibetan when it comes to a <i>tsheg</i> at the end of a string of <i>tsheg bar</i>s. Describe treatment of the final line break or lack thereof. Warn users to watch out for lines that end with {-}. Describe treatment of {.} in certain contexts as U+0F0C. Etc.
|
||||
The converters will insert a <i>tsheg</i> in some places where no ACIP
|
||||
{ } appears; this happens after {PA} and {DAM,} below:
|
||||
</p>
|
||||
<pre>
|
||||
GA PA
|
||||
|
||||
GA PHA
|
||||
|
||||
DAM,
|
||||
LHAG
|
||||
|
||||
GA CA,
|
||||
|
||||
GA
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Note that a space appears after {PHA}, and a comma appears after
|
||||
{CA}, but {PA} has nothing between it and a line break. The
|
||||
converters are smart enough to insert a <i>tsheg</i> regardless.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Also missing from the above ACIP, but inserted automatically by the
|
||||
converters, is Tibetan whitespace; the converter sees
|
||||
{DAM, LHAG} instead of {DAM,LHAG} above.
|
||||
</p>
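<p>
The line-break behavior described above can be sketched as follows. This
is only an illustration, not the converters' actual code: the names
<code>pad_line_end</code> and <code>join_lines</code> are hypothetical,
and the sketch assumes that both corrections amount to supplying an ACIP
{ } when a line ends in a letter or in {,}. It makes no attempt to handle
lines that end with {-} or the final line of a text.
</p>

```python
def pad_line_end(line: str) -> str:
    """Return `line` without its line break, plus an ACIP space if needed.

    Assumption for illustration: a line that ends in a letter (as in
    {GA PA}) or in a comma (as in {DAM,}) is treated as though an ACIP
    space followed it. A line that already ends in a space is unchanged.
    """
    stripped = line.rstrip("\r\n")
    if not stripped:
        return stripped  # blank line: nothing to pad
    last = stripped[-1]
    # Simplification: any alphabetic character counts as an ACIP letter.
    if last.isalpha() or last == ",":
        return stripped + " "
    return stripped


def join_lines(lines):
    """Concatenate lines after padding each one (hypothetical helper)."""
    return "".join(pad_line_end(line) for line in lines)
```

<p>
With this sketch, <code>join_lines(["DAM,\n", "LHAG\n"])</code> yields
the text that the converter is described as seeing, {DAM, LHAG}, followed
by one more ACIP space after {LHAG}.
</p>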
|
||||
|
||||
<p>
|
||||
If such automatic corrections are not desired, try using a Unicode
|
||||
<a href="#escapes">escape</a> before the line break instead of {PA}
|
||||
or {,}.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The converters also treat {NGA,} as a typo for {NGA ,} because
Tibetan typesetting requires that NGA not appear directly before a
<i>shad</i>. (Perhaps {NGA,} would look too much like {KA}.)
Strictly speaking, the result is {NGA\u0F0C,}, which uses a
non-breaking <i>tsheg</i>, since one wouldn't want a line break to
occur after the <i>tsheg</i> and cause a <i>shad</i> to begin a
line; see the section on formatting Tibetan texts in the <i>Tibetan!
5.1</i> documentation.
|
||||
</p>
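<p>
As a rough sketch of the {NGA,} correction (not the converters' real
implementation), the rewrite can be expressed as a plain string
substitution on the ACIP input. The function name
<code>fix_nga_shad</code> is hypothetical, and the sketch only handles
the literal sequence {NGA,} discussed above.
</p>

```python
# The Unicode escape is kept as literal text, as in {NGA\u0F0C,}.
NON_BREAKING_TSHEG_ESCAPE = "\\u0F0C"


def fix_nga_shad(acip: str) -> str:
    """Rewrite {NGA,} as {NGA\\u0F0C,} (hypothetical helper).

    Only the exact sequence NGA followed directly by a comma is touched;
    {NGA ,} already has a space and is left alone.
    """
    return acip.replace("NGA,", "NGA" + NON_BREAKING_TSHEG_ESCAPE + ",")
```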
|
||||
|
||||
<p>
|
||||
The converters embody the rule that a <i>tsheg</i> does not appear
between GA or KA and a following <i>shad</i> unless a <i>shabs kyu</i> vowel is on the GA or
|
||||
KA. For example, the space in {MA ,HA} is a <i>tsheg</i>,
|
||||
and the space in {KU ,HA} is a <i>tsheg</i>, but the space in
|
||||
{GA ,HA} is Tibetan whitespace.
|
||||
</p>
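<p>
The three examples above can be reproduced with a small classifier. This
is a minimal sketch under stated assumptions, not the converters' logic:
the name <code>classify_space_before_shad</code> is hypothetical, and the
sketch only recognizes a bare {K} or {G} followed by a single vowel
letter, treating {U} as the <i>shabs kyu</i>.
</p>

```python
import re

# Assumption: only single-consonant tsheg bars such as KA, GI, or KU are
# considered; stacks and suffixes are outside the scope of this sketch.
_KA_OR_GA = re.compile(r"^[KG]([AIUEO])$")


def classify_space_before_shad(tsheg_bar: str) -> str:
    """Say how the space in '<tsheg_bar> ,' is interpreted.

    Returns "whitespace" for KA or GA without a shabs kyu (U) vowel,
    and "tsheg" for everything else.
    """
    match = _KA_OR_GA.match(tsheg_bar)
    if match and match.group(1) != "U":
        return "whitespace"
    return "tsheg"
```

<p>
For instance, <code>classify_space_before_shad("GA")</code> gives
<code>"whitespace"</code>, while <code>"MA"</code> and <code>"KU"</code>
both give <code>"tsheg"</code>.
</p>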
|
||||
|
||||
<p>
|
||||
If you find that the converters put a <i>tsheg</i> where it does not
|
||||
belong, miss a <i>tsheg</i>, or put whitespace where it does not belong,
|
||||
please contact <a
|
||||
href="mailto:thdltools-devel@lists.sourceforge.net">the
|
||||
developers</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Though the ACIP standard does not mention it, it appears that some
|
||||
ACIP Release IV texts use a period (i.e., {.}) to indicate a
|
||||
non-breaking tsheg (i.e., U+0F0C). Search for {NGO.,},
|
||||
{....,DAM}, etc. Unless {,}, {.}, or a letter (i.e., a through
|
||||
z) follows the {.}, it is only grudgingly interpreted as a
|
||||
non-breaking tsheg -- a warning is generated, too. FIXME: Is
|
||||
this right? Allow for treating {.} as an outright error.<!--
|
||||
DLC FIXME -->
|
||||
</p>
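<p>
The rule for {.} might be sketched like this. The function name
<code>classify_period</code> is hypothetical, and accepting letters in
either case is an assumption made for this illustration; the behavior is
itself marked FIXME above, so treat this as a reading of the description
rather than a specification.
</p>

```python
NON_BREAKING_TSHEG = "\u0F0C"


def classify_period(text: str, index: int):
    """Interpret the ACIP period at text[index].

    Returns (character, warn). The period always becomes U+0F0C here;
    `warn` is True unless the next character is a comma, another period,
    or an ASCII letter (assumption: upper or lower case).
    """
    if text[index] != ".":
        raise ValueError("text[index] is not an ACIP period")
    following = text[index + 1:index + 2]  # "" at the end of the text
    expected = following in (",", ".") or (
        following.isascii() and following.isalpha()
    )
    return NON_BREAKING_TSHEG, not expected
```

<p>
Under these assumptions, the period in {NGO.,} is accepted silently,
whereas a period followed by a space or by the end of the text still
becomes U+0F0C but sets the warning flag.
</p>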
|
||||
|
||||
<p>
|
||||
Note that the treatment of the very last line in an input text is
|
||||
suspect.<!-- DLC FIXME -->
|
||||
</p>
|
||||
|
||||
|
||||
<!-- <h1>DLC</h1>
|
||||
|
@ -1397,6 +1471,18 @@ Nativeness</h2>
|
|||
a change in font size.) [<a
|
||||
href="http://sourceforge.net/tracker/index.php?func=detail&aid=855519&group_id=61934&atid=502515">855519</a>]
|
||||
</li>
|
||||
<li>
|
||||
A folio marker {@0B1} can appear; it gives an error at present.
|
||||
</li>
|
||||
<li>
|
||||
The treatment of the very last line in an input text may be buggy
|
||||
with regard to treatment of ACIP spaces, etc.<!-- DLC -->
|
||||
</li>
|
||||
<li>
|
||||
The treatment of {:} directly before a line break is likely
|
||||
incorrect; a <i>tsheg</i> is currently inserted after the
|
||||
visarga.<!-- DLC FIXME -->
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
@ -1486,6 +1572,11 @@ Nativeness</h2>
|
|||
The converter should warn for each occurrence of the vowels {'E},
|
||||
{'O}, {'EE}, or {'OO}.
|
||||
</li>
|
||||
<li>
|
||||
Default <a href="#sub">substitution</a> rules should handle
|
||||
{KAsh}, which seems to always mean {K+sh} in ACIP Release IV
|
||||
texts.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
|