Better docs w.r.t. the lexer's handling of ACIP spaces etc.
parent 8561623b5e
commit 0378e38d4a
1 changed files with 92 additions and 1 deletions
@@ -804,7 +804,81 @@ TIBETAN FONT AND NEEDS TO BE REDONE BY DOUBLE INPUT]"
</p>

<p>
FIXME: describe when the converter treats a space as a <i>tsheg</i> and when a space is Tibetan whitespace. Describe how a tsheg does not appear after {KA} and {GA} with most vowels, describe the handling of {NGA,} as {NGA ,}. Talk about dzongkha vs. tibetan when it comes to a <i>tsheg</i> at the end of a string of <i>tsheg bar</i>s. Describe treatment of final line break or lack thereof. Warn users to watch out for lines that end with {-}. Describe treatment of {.} in certain contexts as U+0F0C. Etc.
The converters will insert a <i>tsheg</i> or Tibetan whitespace in some
places where no ACIP { } appears; this happens after {PA} and {DAM,} below:
</p>
<pre>
GA PA

GA PHA

DAM,
LHAG

GA CA,

GA
</pre>
<p>
Note that a space appears after {PHA}, and a comma appears after
{CA}, but {PA} has nothing between it and a line break. The
converters are smart enough to insert a <i>tsheg</i> regardless.
</p>
<p>
Also missing from the above ACIP, but inserted automatically by the
converters, is Tibetan whitespace; the converters see
{DAM, LHAG} instead of {DAM,LHAG} above.
</p>
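<p>
The two line-break behaviors described above can be sketched in a few
lines of Python. This is only an illustration, not the converters' actual
code: the function name <code>implied_at_line_end</code> is hypothetical,
and the generalization from {PA} to "any line ending in a letter" and from
{DAM,} to "any line ending in a comma" is an assumption. The sketch
ignores every other rule (for example, lines ending in {-} and the
treatment of the very last line).
</p>

```python
from typing import Optional


def implied_at_line_end(line: str) -> Optional[str]:
    """Guess what is implied at the end of one ACIP line (hypothetical helper).

    Returns "tsheg", "whitespace", or None when this sketch implies nothing.
    """
    if line == "" or line.endswith(" "):
        # Blank line, or the ACIP space is already written out explicitly.
        return None
    if line.endswith(","):
        # Assumption: {DAM,} + line break is read like {DAM, }.
        return "whitespace"
    if line[-1].isalpha():
        # Assumption: {PA} + line break is read like {PA }.
        return "tsheg"
    # Anything else (e.g. a trailing {-}) is outside the scope of this sketch.
    return None


for acip_line in ["GA PA", "GA PHA ", "DAM,"]:
    print(repr(acip_line), "->", implied_at_line_end(acip_line))
```

<p>
Returning a label rather than rewriting the text keeps the sketch neutral
about how the real lexer represents the inserted <i>tsheg</i> or whitespace.
</p>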
<p>
If such automatic corrections are not desired, try using a Unicode
<a href="#escapes">escape</a> before the line break instead of {PA}
or {,}.
</p>
<p>
The converters also treat {NGA,} as a typo for {NGA ,} because Tibetan
typesetting requires that NGA not appear directly before a <i>shad</i>.
(Perhaps {NGA,} would look too much like {KA}.) In fact, the result is
{NGA\u0F0C,}, since one wouldn't want a line break to occur after the
<i>tsheg</i> and cause a <i>shad</i> to begin a line; see the section on
formatting Tibetan texts in the <i>Tibetan! 5.1</i> documentation.
</p>
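<p>
As a rough illustration of the {NGA,} rule, the rewrite can be expressed as
a plain string substitution. This is a sketch under assumptions, not the
converters' implementation: the function name
<code>fix_nga_before_shad</code> is hypothetical, and matching the literal
text "NGA," (including when it ends a longer stack such as {SNGA,}) is an
assumption about scope.
</p>

```python
# The ACIP-level escape text, i.e. a literal backslash followed by u0F0C,
# exactly as it is written in {NGA\u0F0C,}.
NON_BREAKING_TSHEG_ESCAPE = "\\u0F0C"


def fix_nga_before_shad(acip: str) -> str:
    """Rewrite every {NGA,} as {NGA\\u0F0C,} (hypothetical helper).

    Input that already has a space, such as {NGA ,}, is left untouched.
    """
    return acip.replace("NGA,", "NGA" + NON_BREAKING_TSHEG_ESCAPE + ",")


print(fix_nga_before_shad("NGA,"))
print(fix_nga_before_shad("KA,"))
```

<p>
A plain <code>str.replace</code> is used instead of a regular expression
because the replacement text contains a backslash, which
<code>re.sub</code> would try to interpret as an escape.
</p>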
<p>
The converters embody the rule that a <i>tsheg</i> does not appear
between GA or KA and a following <i>shad</i> unless a <i>shabs kyu</i>
vowel is on the GA or KA. For example, the space in {MA ,HA} is a
<i>tsheg</i>, and the space in {KU ,HA} is a <i>tsheg</i>, but the
space in {GA ,HA} is Tibetan whitespace.
</p>
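<p>
The three examples above ({MA ,HA}, {KU ,HA}, and {GA ,HA}) can be captured
by a small classifier. This is a minimal sketch, not the converters' real
logic: the name <code>space_before_shad</code> is hypothetical, the input is
assumed to be a single simple <i>tsheg bar</i> in uppercase ACIP, and
treating the vowels I, E, and O the same way as A is an assumption.
</p>

```python
import re

# A tsheg bar that ends in the letter KA or GA carrying a vowel other than U.
# The lookbehind keeps {NGA} from being mistaken for {GA}; {KHA} never
# matches because its K is followed by H rather than a vowel.
_KA_OR_GA_WITHOUT_SHABS_KYU = re.compile(r"(?:K|(?<!N)G)[AEIO]$")


def space_before_shad(tsheg_bar: str) -> str:
    """Classify the ACIP space in {<tsheg_bar> ,} (hypothetical helper)."""
    if _KA_OR_GA_WITHOUT_SHABS_KYU.search(tsheg_bar):
        return "whitespace"
    return "tsheg"


for bar in ["MA", "KU", "GA"]:
    print(bar, "->", space_before_shad(bar))
```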
<p>
If you find that the converters put a <i>tsheg</i> where it does not
belong, miss a <i>tsheg</i>, or put whitespace where it does not belong,
please contact <a
href="mailto:thdltools-devel@lists.sourceforge.net">the
developers</a>.
</p>
<p>
Though the ACIP standard does not mention it, it appears that some
ACIP Release IV texts use a period (i.e., {.}) to indicate a
non-breaking tsheg (i.e., U+0F0C). Search for {NGO.,},
{....,DAM}, etc. Unless {,}, {.}, or a letter (i.e., a through
z) follows the {.}, it is only grudgingly interpreted as a
non-breaking tsheg; a warning is generated, too. FIXME: Is
this right? Allow for treating {.} as an outright error.<!--
DLC FIXME -->
</p>
<p>
Note that the treatment of the very last line in an input text is
suspect.<!-- DLC FIXME -->
</p>
<!-- <h1>DLC</h1>
@@ -1397,6 +1471,18 @@ Nativeness</h2>
a change in font size.) [<a
href="http://sourceforge.net/tracker/index.php?func=detail&aid=855519&group_id=61934&atid=502515">855519</a>]
</li>
<li>
A folio marker {@0B1} can appear; it gives an error at present.
</li>
<li>
The treatment of the very last line in an input text may be buggy
with regard to ACIP spaces, etc.<!-- DLC -->
</li>
<li>
The treatment of {:} directly before a line break is likely
incorrect; at present, a <i>tsheg</i> is inserted after the
visarga.<!-- DLC FIXME -->
</li>
</ul>

@@ -1486,6 +1572,11 @@ Nativeness</h2>
The converter should warn for each occurrence of the vowels {'E},
{'O}, {'EE}, or {'OO}.
</li>
<li>
Default <a href="#sub">substitution</a> rules should handle
{KAsh}, which seems to always mean {K+sh} in ACIP Release IV
texts.
</li>
</ul>