Documents the new error and warning messages. When using short-form
messages, refer to this document to figure out what the converter thinks is questionable.
2004-05-16 18:07:17 +00:00
commit c9127ba341
parent 7bf0bcfa25
1 changed file with 181 additions and 67 deletions
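The documentation added in this commit describes a system property, `thdl.acip.to.tibetan.warning.severity.XXX`, for changing the severity of a warning. A minimal sketch of how such a property could be set and read follows. The class `WarningSeverityDemo`, its methods, the exact-case matching, and the `Most` fallback are assumptions made for illustration, not the converter's actual code; only the property prefix and the four values (`DISABLED`, `Some`, `Most`, `All`) come from the documentation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the converter's source.
final class WarningSeverityDemo {
    // Property prefix as given in the documentation.
    static final String PREFIX = "thdl.acip.to.tibetan.warning.severity.";

    // The four values the documentation lists (matched case-sensitively
    // here, which is an assumption).
    static final List<String> LEGAL =
        Arrays.asList("DISABLED", "Some", "Most", "All");

    // Builds the property name for one warning number, e.g. 501.
    static String propertyName(int warningNumber) {
        return PREFIX + warningNumber;
    }

    // Returns the override for a warning, or the fallback if the property
    // is unset or holds a value outside the documented four.
    static String severityFor(int warningNumber, String fallback) {
        String value = System.getProperty(propertyName(warningNumber));
        return (value != null && LEGAL.contains(value)) ? value : fallback;
    }

    public static void main(String[] args) {
        // Same effect as the JVM flag
        // -Dthdl.acip.to.tibetan.warning.severity.501=DISABLED
        System.setProperty(propertyName(501), "DISABLED");
        System.out.println("501 -> " + severityFor(501, "Most"));
        System.out.println("508 -> " + severityFor(508, "Most"));
    }
}
```

On the command line, the same override would be passed with the JVM's standard `-D` flag, for example `-Dthdl.acip.to.tibetan.warning.severity.501=DISABLED`.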
--- a/htdocs/ACIP_To_Tibetan_Converter.html
+++ b/htdocs/ACIP_To_Tibetan_Converter.html
@@ -269,6 +269,15 @@
  <tt>[#WARNING</tt>.
 </p>

+<p>
+  Some warning or error messages refer to lexical errors, that is,
+  errors that occur when <a href="#lex">breaking an input text up
+  into <i>tsheg bar</i>s</a>.&nbsp; Others are parsing errors, that
+  is, errors that occur during the <a href="#parse">interpretation of
+  ACIP <i>tsheg bar</i>s</a>.&nbsp; It helps to understand both these
+  processes.
+</p>
+
 <p>
  There are four warning levels: 'None', 'Some', 'Most', and
  'All'.&nbsp; Choose 'None' if you don't want any warnings to appear
@@ -284,89 +293,177 @@
 </p>

 <p>
-  The following are some (but not all) error and warning messages,
-  accompanied by further explication:
+  It is possible to alter the severity of a warning at runtime.&nbsp;
+  It is not possible to make an error into a warning, however, and it is
+  not possible to make a warning into an error (though that might be
+  useful [vote for RFE <a
+  href="http://sourceforge.net/tracker/index.php?func=detail&aid=954903&group_id=61934&atid=502518">#954903</a>
+  if you want it]).&nbsp; To change the severity of a warning, set the
+  system property <tt>thdl.acip.to.tibetan.warning.severity.XXX</tt>,
+  where XXX is the warning number, e.g. 501, to your choice of
+  <tt>DISABLED</tt>, <tt>Some</tt>, <tt>Most</tt>, or
+  <tt>All</tt>.&nbsp; Alternatively, alter <tt>options.txt</tt>, a
+  file found inside the top level of the JAR file, as the comments
+  indicate.&nbsp; These instructions are for experts; please contact
+  <a href="mailto:thdltools-devel@lists.sourceforge.net">the
+  developers</a> if you need help.
+</p>
+
+<p>
+  One may choose to have ACIP-&gt;Tibetan ERRORS appear in long (i.e.,
+  verbose) form or in short (i.e., terse) form.&nbsp; When short
+  forms appear, they are embedded in the output like <tt>[#ERROR 130:
+  {X}]</tt>.&nbsp; The long forms are as follows:
+</p>
+
+<p><tt>101: There's not even a unique, non-illegal parse for {X}</tt></p>
+
+<p><tt>102: Found an open bracket, 'X', within a [#COMMENT]-style comment.  Brackets may not appear in comments.</tt></p>
+
+<p><tt>103: Found a truly unmatched close bracket, 'X'.</tt></p>
+
+<p><tt>104: Found a closing bracket, 'X', without a matching open bracket.  Perhaps a [#COMMENT] incorrectly written as [COMMENT], or a [*CORRECTION] written incorrectly as [CORRECTION], caused this.</tt></p>
+
+<p><tt>105: Found a truly unmatched open bracket, '[' or '{', prior to this current illegal open bracket, 'X'.</tt></p>
+
+<p><tt>106: Found an illegal open bracket (in context, this is 'X').  Perhaps there is a [#COMMENT] written incorrectly as [COMMENT], or a [*CORRECTION] written incorrectly as [CORRECTION], or an unmatched open bracket?</tt></p>
+
+<p><tt>107: Found an illegal at sign, @ (in context, this is X).  This folio marker has a period, '.', at the end of it, which is illegal.</tt></p>
+
+<p><tt>108: Found an illegal at sign, @ (in context, this is X).  This folio marker is not followed by whitespace, as is expected.</tt></p>
+
+<p><tt>109: Found an illegal at sign, @ (in context, this is X).  @012B is an example of a legal folio marker.</tt></p>
+
+<p><tt>110: Found //, which could be legal (the Unicode would be \u0F3C\u0F3D), but is likely in an illegal construct like //NYA\\.</tt></p>
+
+<p><tt>111: Found an illegal open parenthesis, '('.  Nesting of parentheses is not allowed.</tt></p>
+
+<p><tt>112: Unexpected closing parenthesis, ')', found.</tt></p>
+
+<p><tt>113: The ACIP {?}, found alone, may intend U+0F08, but it may intend a question mark, i.e. '?', in the output.  It may even mean that the original text could not be deciphered with certainty, like the ACIP {[?]} does.</tt></p>
+
+<p><tt>114: Found an illegal, unprintable character.</tt></p>
+
+<p><tt>115: Found a backslash, \, which the ACIP Tibetan Input Code standard says represents a Sanskrit virama.  In practice, though, this is so often misused (to represent U+0F3D) that {\} always generates this error.  If you want a Sanskrit virama, change the input document to use {\u0F84} instead of {\}.  If you want U+0F3D, use {/NYA/} or {/NYA\u0F3D}.</tt></p>
+
+<p><tt>116: Found an illegal character, 'X', with ordinal (in decimal) Y.</tt></p>
+
+<p><tt>117: Unexpected end of input; truly unmatched open bracket found.</tt></p>
+
+<p><tt>118: Unmatched open bracket found.  A comment does not terminate.</tt></p>
+
+<p><tt>119: Unmatched open bracket found.  A correction does not terminate.</tt></p>
+
+<p><tt>120: Slashes are supposed to occur in pairs, but the input had an unmatched '/' character.</tt></p>
+
+<p><tt>121: Parentheses are supposed to occur in pairs, but the input had an unmatched parenthesis, '('.</tt></p>
+
+<p><tt>122: Warning, empty tsheg bar found while converting from ACIP!</tt></p>
+
+<p><tt>123: Cannot convert ACIP {X} because it contains a number but also a non-number.</tt></p>
+
+<p><tt>124: Cannot convert ACIP {X} because {V}, wa-zur, appears without being subscribed to a consonant.</tt></p>
+
+<p><tt>125: Cannot convert ACIP {X} because we would be required to assume that {A} is a consonant, when it is not clear if it is a consonant or a vowel.</tt></p>
+
+<p><tt>126: Cannot convert ACIP {X} because it ends with a '+'.</tt></p>
+
+<p><tt>127: Cannot convert ACIP {X} because it ends with a '-'.</tt></p>
+
+<a name="128"><p><tt>128: Cannot convert ACIP {X} because A: is a "vowel" without an associated consonant.</tt></p></a>
+
+<p><tt>129: Cannot convert ACIP {X} because + is not an ACIP consonant.</tt></p>
+
+<p><tt>130: The tsheg bar ("syllable") {X} is essentially nothing.</tt></p>
+
+<a name="131"><p><tt>131: The ACIP caret, {^}, must precede a tsheg bar.</tt></p></a>
+
+<a name="132"><p><tt>132: The ACIP {X} must be glued to the end of a tsheg bar, but this one was not.</tt></p></a>
+
+<p><tt>133: Cannot convert the ACIP {X} to Tibetan because it is unclear what the result should be.  The correct output would likely require special mark-up.</tt></p>
+
+<p><tt>134: The tsheg bar ("syllable") {X} has no legal parses.</tt></p>
+
+<p><tt>135: The Unicode escape 'X' with ordinal (in decimal) Y is specified by the Extended Wylie Transliteration Scheme (EWTS), but is in the private-use area (PUA) of Unicode and will thus not be written out into the output lest you think other tools will be able to understand this non-standard construction.</tt></p>
+
+<p><tt>136: The Unicode escape with ordinal (in decimal) Y does not match up with any TibetanMachineWeb glyph.</tt></p>
+
+<p><tt>137: The ACIP {X} cannot be represented with the TibetanMachine or TibetanMachineWeb fonts because no such glyph exists in these fonts.  The TibetanMachineWeb font has only a limited number of ready-made, precomposed glyphs, and {X} is not one of them.</tt></p>
+
+<p><tt>138: The Unicode escape 'X' with ordinal (in decimal) Y is in the Tibetan range of Unicode (i.e., [U+0F00, U+0FFF]), but is a reserved code in that area.</tt></p>
+
+<hr>
+
+
+<p>
+  Just as with ERRORS, one may choose to have WARNINGS appear in
+  either short or long form.&nbsp; The long forms of warnings are as
+  follows:
+</p>
+
+<a name="501"><p><tt>501: Using X, but only because the tool's knowledge of prefix rules (see the documentation) says that XX is not a legal Tibetan tsheg bar ("syllable")</tt></p></a>
+
+<p><tt>502: The last stack does not have a vowel in {X}; this may indicate a typo, because Sanskrit, which this probably is (because it's not legal Tibetan), should have a vowel after each stack.</tt></p>
+
+<p><tt>503: Though {X} is unambiguous, it would be more computer-friendly if '+' signs were used to stack things because there are two (or more) ways to interpret this ACIP if you're not careful.</tt></p>
+
+<a name="504"><p><tt>504: The ACIP {X} is treated by this converter as U+0F35, but sometimes might represent U+0F14 in practice.  To avoid seeing this warning again, change the input to use {\u0F35} instead of {X}.</tt></p></a>
+
+<p><tt>505: There is a useless disambiguator in {X}.</tt></p>
+
+<p><tt>506: There is a stack of three or more consonants in {X} that uses at least one '+' but does not use a '+' between each consonant.</tt></p>
+
+<p><tt>507: There is a chance that the ACIP {X} was intended to represent more consonants than we parsed it as representing -- GHNYA, e.g., means GH+NYA, but you can imagine seeing GH+N+YA and typing GHNYA for it too.</tt></p>
+
+<a name="508"><p><tt>508: The ACIP {X} has been interpreted as two stacks, not one, but you may wish to confirm that the original text had two stacks as it would be an easy mistake to make to see one stack (because there is such a stack used in Sanskrit transliteration for this particular sequence) and forget to input it with '+' characters.</tt></p></a>
+
+<a name="509"><p><tt>509: The ACIP {X} has an initial sequence that has been interpreted as two stacks, a prefix and a root stack, not one nonnative stack, but you may wish to confirm that the original text had two stacks as it would be an easy mistake to make to see one stack (because there is such a stack used in Sanskrit transliteration for this particular sequence) and forget to input it with '+' characters.</tt></p></a>
+
+<p><tt>510: A non-breaking tsheg, 'X', appeared, but not like "...," or ".," or ".dA" or ".DA".</tt></p>
+
+<p><tt>511: The ACIP {X} cannot be represented with the TibetanMachine or TibetanMachineWeb fonts because no such glyph exists in these fonts.  The TibetanMachineWeb font has only a limited number of ready-made, precomposed glyphs, and {X} is not one of them.</tt></p>
+
+<p><tt>512: There is a chance that the ACIP {X} was intended to represent more consonants than we parsed it as representing -- GHNYA, e.g., means GH+NYA, but you can imagine seeing GH+N+YA and typing GHNYA for it too.  In fact, there are glyphs in the Tibetan Machine font for N+N+Y, N+G+H, G+N+Y, G+H+N+Y, T+N+Y, T+S+TH, T+S+N, T+S+N+Y, TS+NY, TS+N+Y, H+N+Y, M+N+Y, T+S+M, T+S+M+Y, T+S+Y, T+S+R, T+S+V, N+T+S, T+S, S+H, R+T+S, R+T+S+N, R+T+S+N+Y, and N+Y, indicating the importance of these easily mistyped stacks, so the possibility is very real.</tt></p>
+
+<hr>
+
+<p>
+  The above messages are perhaps not verbose enough to help you figure
+  out what the converter thinks is wrong or questionable, so below is
+  further explanation of a few error and warning messages:
 </p>

 <ul>
  <li>
-    <tt>[#ERROR CONVERTING ACIP DOCUMENT: The Unicode escape with
-    ordinal 3912 does not match up with any TibetanMachineWeb
-    glyph.]</tt> appears for the input {\u0F48} because there is no
-    character at the Unicode codepoint U+0F48 (decimal 3912).
-  </li>
-  <li>
-    <tt>[#ERROR The ACIP {G+N+NA} cannot be represented with the
-    TibetanMachine or TibetanMachineWeb fonts because no such glyph
-    exists in these fonts.]</tt> appears because the Tibetan Machine
-    Web font has only a limited number of ready-made, precomposed
-    glyphs, and {G+N+NA} is not one of them.&nbsp; You'll only see
-    this error in an ACIP-&gt;TMW conversion, not an ACIP-&gt;Unicode
-    conversion.
-  </li>
-  <li>
-    <tt>[#ERROR CONVERTING ACIP DOCUMENT: This converter cannot
-    convert the ACIP {x} to Tibetan because it is unclear what the
-    result should be.]</tt> appears because the appropriate output for
-    this likely requires special mark-up.
-  </li>
-  <li>
-    <tt>[#ERROR CONVERTING ACIP DOCUMENT: Lexical error: The ACIP {^}
-    must precede a tsheg bar.]</tt> appears for
+    Error <a href="#131">131</a> appears for
    {^&nbsp;&nbsp;GONG&nbsp;SA}, for example, because only
    {^GONG&nbsp;SA} and {^&nbsp;GONG&nbsp;SA} are supported in this
    implementation.
  </li>
  <li>
-    <tt>[#ERROR CONVERTING ACIP DOCUMENT: The tsheg bar ("syllable") :
-    has these errors: Cannot convert ACIP A: because A: is a "vowel"
-    without an associated consonant]</tt> appears for the input {:}
-    because {:} cannot appear alone.&nbsp; (Sloppily, this message
-    exposes you to the internals of the converter, where {:} is
-    thought of as {A:} in some contexts.)
+    Error <a href="#128">128</a> appears for the input {:} because {:}
+    cannot appear alone.&nbsp; (Sloppily, this message exposes you to
+    the internals of the converter, where {:} is thought of as {A:} in
+    some contexts.)
  </li>
  <li>
-    <tt>[#ERROR CONVERTING ACIP DOCUMENT: Lexical error: The ACIP x
-    must be glued to the end of a tsheg bar, but this one was
-    not]</tt> appears because {%}, {o}, and {x} are really only to be
-    applied to whole <i>tsheg bar</i>s, and should not occur alone.
+    Error <a href="#132">132</a> appears because {%}, {o}, and {x} are
+    really only to be applied to whole <i>tsheg bar</i>s, and should
+    not occur alone.
  </li>
  <li>
-    <tt>[#WARNING CONVERTING ACIP DOCUMENT: The ACIP DGYA has been
-    interpreted as two stacks, not one, but you may wish to confirm
-    that the original text had two stacks as it would be an easy
-    mistake to make to see one stack and forget to input it with '+'
-    characters.]</tt> appears because it helps evince the impact of <a
-    href="#prefix">prefix rules</a>, a subtle point with regards to
-    ACIP because they are implied, but not discussed explicitly in
-    depth, by the ACIP standard.
+    Each of warnings <a href="#501">501</a>, <a href="#508">508</a>
+    and <a href="#509">509</a> appears because it helps evince the
+    impact of <a href="#prefix">prefix rules</a>, a subtle point with
+    regard to ACIP because they are implied, but not discussed
+    explicitly in depth, by the ACIP standard.
  </li>
  <li>
-    <tt>[#WARNING CONVERTING ACIP DOCUMENT: Warning: We're going with
-    {B+NA}, but only because our knowledge of prefix rules says that
-    {B}{NA} is not a legal Tibetan tsheg bar ("syllable")]</tt>
-    appears for the same reason as above.
-  </li>
-  <li>
-    <tt>[#WARNING CONVERTING ACIP DOCUMENT: Lexical warning: The ACIP
-    {%} is treated by this converter as U+0F35, but sometimes might
-    represent U+0F14 in practice.  To avoid seeing this warning again,
-    change the input to use {\u0F35} instead of {%}.]</tt> appears
-    because some ACIP transliteration out there does use {%} to mean
-    U+0F14.
+    Warning <a href="#504">504</a> appears because some ACIP
+    transliteration out there does use {%} to mean U+0F14.
  </li>
 </ul>

-<p>
-  When warning or error messages refer to a 'Lexical error', that is
-  an error that occurs when <a href="#lex">breaking an input text up
-  into <i>tsheg bar</i>s</a>.&nbsp; To fully understand all warning
-  and error messages, a thorough understanding of <a href="#lex">that
-  process</a> and of the <a href="#parse">interpretation of ACIP
-  <i>tsheg bar</i>s</a> is required.
-</p>
-
-
 <a name="colors"></a><h2>Coloration</h2>

 <p>
@@ -1524,6 +1621,23 @@ Nativeness</h2>
 </p>

 <ul>
+  <li>
+    The Unicode U+0F43 is equivalent to the sequence U+0F42 followed
+    by U+0FB7.&nbsp; There are several distinct but similar
+    cases.&nbsp; The converter should have an option that allows for
+    producing one form or the other instead.&nbsp; (In practice, doing
+    Unicode normalization on the output is probably going to give you
+    results just as good, and having a separate normalizer facilitates
+    code reuse.)&nbsp; See issue <a
+    href="http://sourceforge.net/tracker/index.php?func=detail&aid=946063&group_id=61934&atid=502518">946043</a>.
+  </li>
+  <li>
+    The fact that stacks G+N+Y and M+N+Y exist in the TMW font means
+    that the ACIP snippets {GNY} and {MNY} should, in some cases,
+    trigger warning 512.&nbsp; They do not do so at present, and
+    warning 507 is not given either.&nbsp; See issue <a
+    href="http://sourceforge.net/tracker/index.php?func=detail&aid=946058&group_id=61934&atid=502518">946058</a>.
+  </li>
  <li>
    At present, an error in ACIP-&gt;TMW conversion is given when ACIP
    like {RTSNY} or {NNY} is seen; this is because no glyph R+TS+NY or
@@ -1531,8 +1645,8 @@ Nativeness</h2>
    is a significant possibility that R+T+S+N+Y or N+N+Y was intended
    because TMW does have glyphs for both.&nbsp; It would be best to
    give an error or stern warning when creating the Unicode for RTSNY
-    etc., and to give an error that is more specific when creating TMW
-    for RTSNY etc.&nbsp; See issue <a
+    etc., and to allow for giving an error when creating TMW for RTSNY
+    etc. (whereas right now warning 512 is given).&nbsp; See issue <a
    href="http://sourceforge.net/tracker/index.php?func=detail&aid=936998&group_id=61934&atid=502518">936998</a>.
  </li>
  <li>