Better docs w.r.t. the lexer's handling of ACIP spaces etc.

2003-12-10 06:57:12 +00:00 · 2003-12-10 06:57:12 +00:00 · 0378e38d4a
commit 0378e38d4a
parent 8561623b5e
1 changed files with 92 additions and 1 deletions
--- a/htdocs/ACIP_To_Tibetan_Converter.html
+++ b/htdocs/ACIP_To_Tibetan_Converter.html
@@ -804,7 +804,81 @@ TIBETAN FONT AND NEEDS TO BE REDONE BY DOUBLE INPUT]"
 </p>

 <p>
-  FIXME: describe when the converter treats a space as a <i>tsheg</i> and when a space is Tibetan whitespace.&nbsp; Describe how a tsheg does not appear after {KA} and {GA} with most vowels, describe the handling of {NGA,} as {NGA&nbsp;,}.&nbsp; Talk about dzongkha vs. tibetan when it comes to a <i>tsheg</i> at the end of a string of <i>tsheg bar</i>s.&nbsp; Describe treatment of final line break or lack thereof.&nbsp; Warn users to watch out for lines that end with {-}.&nbsp; Describe treatment of {.} in certain contexts as U+0F0C.&nbsp; Etc.
+  The converters will insert a <i>tsheg</i> or Tibetan whitespace in
+  some places where no ACIP {&nbsp;} appears; this happens after {PA}
+  and {DAM,} below:
+</p>
+<pre>
+GA PA
+
+GA PHA 
+
+DAM,
+LHAG
+
+GA CA,
+
+GA 
+</pre>
+
+<p>
+  Note that a space appears after {PHA}, and a comma appears after
+  {CA}, but {PA} has nothing between it and a line break.&nbsp; The
+  converters are smart enough to insert a <i>tsheg</i> regardless.
+</p>
+
+<p>
+  Also missing from the above ACIP, but inserted automatically by the
+  converters, is Tibetan whitespace; the converter sees
+  {DAM,&nbsp;LHAG} instead of {DAM,LHAG} above.
+</p>
+
+<p>
+  If such automatic corrections are not desired, try using a Unicode
+  <a href="#escapes">escape</a> before the line break instead of {PA}
+  or {,}.
+</p>
+
+<p>
+  The converters also treat {NGA,} as a typo for {NGA&nbsp;,}
+  because Tibetan typesetting requires that NGA not appear directly
+  before a <i>shad</i>.&nbsp; (Perhaps {NGA,} would look too much like
+  {KA}.)&nbsp; The <i>tsheg</i> actually used is the non-breaking one,
+  i.e., {NGA\u0F0C,}, since one wouldn't want a line break to occur
+  after the <i>tsheg</i> and cause a <i>shad</i> to begin a line; see
+  the section on formatting Tibetan texts in the <i>Tibetan!
+  5.1</i> documentation.
+</p>
+
+<p>
+  The converters embody the rule that a <i>tsheg</i> does not appear
+  after GA or KA unless a <i>shabs kyu</i> vowel is on the GA or
+  KA.&nbsp; For example, the space in {MA&nbsp;,HA} is a <i>tsheg</i>,
+  and the space in {KU&nbsp;,HA} is a <i>tsheg</i>, but the space in
+  {GA&nbsp;,HA} is Tibetan whitespace.
+</p>
+
+<p>
+  If you find that the converters put a <i>tsheg</i> where it does not
+  belong, miss a <i>tsheg</i>, or put whitespace where it does not belong,
+  please contact <a
+  href="mailto:thdltools-devel@lists.sourceforge.net">the
+  developers</a>.
+</p>
+
+<p>
+  Though the ACIP standard does not mention it, it appears that some
+  ACIP Release IV texts use a period (i.e., {.}) to indicate a
+  non-breaking tsheg (i.e., U+0F0C).&nbsp; Search for {NGO.,},
+  {....,DAM}, etc.&nbsp; Unless {,}, {.}, or a letter (i.e., a through
+  z) follows the {.}, it is only grudgingly interpreted as a
+  non-breaking tsheg -- a warning is generated, too.&nbsp; FIXME: Is
+  this right?&nbsp; Allow for treating {.} as an outright error.<!--
+  DLC FIXME -->
+</p>
+
+<p>
+  Note that the treatment of the very last line in an input text is
+  suspect.<!-- DLC FIXME -->
+</p>


 <!-- <h1>DLC</h1>
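
The lexer rules documented in the hunk above can be sketched in a few lines. The Python below is only an illustration and is not the converter's own code. Every function name is invented here. The GA/KA test assumes the syllable is exactly G or K plus one of the vowels A, I, E, or O (U, the shabs kyu, is excluded), the letter test for {.} assumes upper and lower case are treated alike, and {NGA,} is matched literally.

```python
import re

# The escape spelling used in the documentation for the non-breaking tsheg.
NON_BREAKING_TSHEG = "\\u0F0C"

# Assumption: a GA or KA without shabs kyu is G or K plus one of A, I, E, O.
# {U} is the shabs kyu and is deliberately left out.
_GA_KA_WITHOUT_SHABS_KYU = re.compile(r"^[GK][AIEO]$")


def space_before_shad(syllable: str) -> str:
    """Classify the ACIP space in {<syllable> ,} as 'tsheg' or 'whitespace'."""
    if _GA_KA_WITHOUT_SHABS_KYU.match(syllable):
        return "whitespace"
    return "tsheg"


def fix_nga_shad(acip: str) -> str:
    """Treat {NGA,} as a typo and put a non-breaking tsheg before the shad."""
    return acip.replace("NGA,", "NGA" + NON_BREAKING_TSHEG + ",")


def period_needs_warning(following: str) -> bool:
    """Return True when {.} is followed by something other than {,}, {.}, or a letter.

    The period is read as U+0F0C either way; this only decides whether a
    warning accompanies it.
    """
    if following in (",", "."):
        return False
    is_letter = len(following) == 1 and following.isascii() and following.isalpha()
    return not is_letter
```

With these definitions, `space_before_shad("MA")` and `space_before_shad("KU")` both give `"tsheg"`, while `space_before_shad("GA")` gives `"whitespace"`, matching the three examples in the documentation. A period at the very end of the input (an empty `following`) is reported as needing a warning, which is a choice made for this sketch only.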
@@ -1397,6 +1471,18 @@ Nativeness</h2>
    a change in font size.)&nbsp; [<a
    href="http://sourceforge.net/tracker/index.php?func=detail&aid=855519&group_id=61934&atid=502515">855519</a>]
  </li>
+  <li>
+    A folio marker {@0B1} can appear; it gives an error at present.
+  </li>
+  <li>
+    The treatment of the very last line in an input text may be buggy
+    with regard to treatment of ACIP spaces, etc.<!-- DLC -->
+  </li>
+  <li>
+    The treatment of {:} directly before a line break is likely
+    incorrect; a <i>tsheg</i> is inserted right now after the
+    visarga.<!-- DLC FIXME -->
+  </li>
 </ul>


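
The line-break behavior behind the {:} bug listed above, and behind the {PA} and {DAM,} examples earlier in this commit, can be pictured as follows. This is one plausible reading and not the converter's actual logic: the hypothetical `join_acip_lines` assumes that a line break is turned into an ACIP space whenever the line does not already end in one, that empty lines contribute nothing, and that text after the final line break is left alone. Whether the supplied space then becomes a tsheg or Tibetan whitespace is left to the later lexer rules.

```python
def join_acip_lines(text: str) -> str:
    """Join ACIP lines, supplying a space where a line break stands in for one."""
    # Every piece except the last was followed by a line break in the input.
    *terminated, last = text.split("\n")
    pieces = []
    for line in terminated:
        # Assumption: a non-empty line that does not already end in a space
        # gets one in place of its line break.
        if line and not line.endswith(" "):
            line += " "
        pieces.append(line)
    # Text after the final line break is passed through unchanged.
    pieces.append(last)
    return "".join(pieces)
```

Under these assumptions `join_acip_lines("DAM,\nLHAG")` yields `"DAM, LHAG"`, the form the documentation says the converter sees, and a line ending in {:} also gains a space, which mirrors the reported bug instead of fixing it.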
@@ -1486,6 +1572,11 @@ Nativeness</h2>
    The converter should warn for each occurrence of the vowels {'E},
    {'O}, {'EE}, or {'OO}.
  </li>
+  <li>
+    Default <a href="#sub">substitution</a> rules should handle
+    {KAsh}, which seems to always mean {K+sh} in ACIP Release IV
+    texts.
+  </li>
 </ul>