Documents the new error and warning messages. When using short-form
messages, refer to this document to figure out what the converter thinks is
questionable.
dchandler 2004-05-16 18:07:17 +00:00
parent 7bf0bcfa25
commit c9127ba341
1 changed files with 181 additions and 67 deletions

@ -269,6 +269,15 @@
<tt>[#WARNING</tt>.
</p>
<p>
Some warning or error messages refer to lexical errors, that is,
errors that occur when <a href="#lex">breaking an input text up
into <i>tsheg bar</i>s</a>.&nbsp; Others are parsing errors, that
is, errors that occur during the <a href="#parse">interpretation of
ACIP <i>tsheg bar</i>s</a>.&nbsp; It helps to understand both these
processes.
</p>
<p>
There are four warning levels: 'None', 'Some', 'Most', and
'All'.&nbsp; Choose 'None' if you don't want any warnings to appear
@ -284,89 +293,177 @@
</p>
<p>
The following are some (but not all) error and warning messages,
accompanied by further explication:
It is possible to alter the severity of a warning at runtime.&nbsp;
It is not possible to make an error a warning, however, and it is
not possible to make a warning into an error (though that might be
useful [vote for RFE <a
href="http://sourceforge.net/tracker/index.php?func=detail&aid=954903&group_id=61934&atid=502518">#954903</a>
if you want it]).&nbsp; To change the severity of a warning, set the
system property <tt>thdl.acip.to.tibetan.warning.severity.XXX</tt>,
where XXX is the warning number, e.g. 501, to your choice of
<tt>DISABLED</tt>, <tt>Some</tt>, <tt>Most</tt>, or
<tt>All</tt>.&nbsp; Alternatively, alter <tt>options.txt</tt>, a
file found inside the top level of the JAR file, as the comments
indicate.&nbsp; These instructions are for experts; please contact
<a href="mailto:thdltools-devel@lists.sourceforge.net">the
developers</a> if you need help.
</p>
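<p>
As a minimal sketch of the system-property route, the Java below
builds the property name for a warning number and sets it.&nbsp; The
property prefix and the four values come from the paragraph above;
the class and method names are hypothetical, and it is an assumption
that the converter reads the property after it has been set.
</p>

```java
public class WarningSeverityDemo {
    // Property prefix as given in the documentation above.
    static final String PREFIX = "thdl.acip.to.tibetan.warning.severity.";

    /** Builds the property name for one warning number, e.g. 501. */
    static String keyFor(int warningNumber) {
        return PREFIX + warningNumber;
    }

    /** Sets the severity, accepting only the four documented values. */
    static void setSeverity(int warningNumber, String severity) {
        switch (severity) {
            case "DISABLED":
            case "Some":
            case "Most":
            case "All":
                System.setProperty(keyFor(warningNumber), severity);
                break;
            default:
                throw new IllegalArgumentException("Unknown severity: " + severity);
        }
    }

    public static void main(String[] args) {
        // Assumption: this runs before the converter looks the property up.
        setSeverity(501, "DISABLED");
        System.out.println(keyFor(501) + "=" + System.getProperty(keyFor(501)));
    }
}
```

<p>
The same property can be passed on the command line with the JVM's
standard <tt>-D</tt> flag, e.g.
<tt>-Dthdl.acip.to.tibetan.warning.severity.501=DISABLED</tt>.
</p>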
<p>
One may choose to have ACIP-&gt;Tibetan ERRORS appear in long (i.e.,
verbose) form or in short (i.e., terse) form.&nbsp; When short
forms appear, they are embedded in the output like <tt>[#ERROR 130:
{X}]</tt>.&nbsp; The long forms are as follows:
</p>
<p><tt>101: There's not even a unique, non-illegal parse for {X}</tt></p>
<p><tt>102: Found an open bracket, 'X', within a [#COMMENT]-style comment. Brackets may not appear in comments.</tt></p>
<p><tt>103: Found a truly unmatched close bracket, 'X'.</tt></p>
<p><tt>104: Found a closing bracket, 'X', without a matching open bracket. Perhaps a [#COMMENT] incorrectly written as [COMMENT], or a [*CORRECTION] written incorrectly as [CORRECTION], caused this.</tt></p>
<p><tt>105: Found a truly unmatched open bracket, '[' or '{', prior to this current illegal open bracket, 'X'.</tt></p>
<p><tt>106: Found an illegal open bracket (in context, this is 'X'). Perhaps there is a [#COMMENT] written incorrectly as [COMMENT], or a [*CORRECTION] written incorrectly as [CORRECTION], or an unmatched open bracket?</tt></p>
<p><tt>107: Found an illegal at sign, @ (in context, this is X). This folio marker has a period, '.', at the end of it, which is illegal.</tt></p>
<p><tt>108: Found an illegal at sign, @ (in context, this is X). This folio marker is not followed by whitespace, as is expected.</tt></p>
<p><tt>109: Found an illegal at sign, @ (in context, this is X). @012B is an example of a legal folio marker.</tt></p>
<p><tt>110: Found //, which could be legal (the Unicode would be \u0F3C\u0F3D), but is likely in an illegal construct like //NYA\\.</tt></p>
<p><tt>111: Found an illegal open parenthesis, '('. Nesting of parentheses is not allowed.</tt></p>
<p><tt>112: Unexpected closing parenthesis, ')', found.</tt></p>
<p><tt>113: The ACIP {?}, found alone, may intend U+0F08, but it may intend a question mark, i.e. '?', in the output. It may even mean that the original text could not be deciphered with certainty, like the ACIP {[?]} does.</tt></p>
<p><tt>114: Found an illegal, unprintable character.</tt></p>
<p><tt>115: Found a backslash, \, which the ACIP Tibetan Input Code standard says represents a Sanskrit virama. In practice, though, this is so often misused (to represent U+0F3D) that {\} always generates this error. If you want a Sanskrit virama, change the input document to use {\u0F84} instead of {\}. If you want U+0F3D, use {/NYA/} or {/NYA\u0F3D}.</tt></p>
<p><tt>116: Found an illegal character, 'X', with ordinal (in decimal) Y.</tt></p>
<p><tt>117: Unexpected end of input; truly unmatched open bracket found.</tt></p>
<p><tt>118: Unmatched open bracket found. A comment does not terminate.</tt></p>
<p><tt>119: Unmatched open bracket found. A correction does not terminate.</tt></p>
<p><tt>120: Slashes are supposed to occur in pairs, but the input had an unmatched '/' character.</tt></p>
<p><tt>121: Parentheses are supposed to occur in pairs, but the input had an unmatched parenthesis, '('.</tt></p>
<p><tt>122: Warning, empty tsheg bar found while converting from ACIP!</tt></p>
<p><tt>123: Cannot convert ACIP {X} because it contains a number but also a non-number.</tt></p>
<p><tt>124: Cannot convert ACIP {X} because {V}, wa-zur, appears without being subscribed to a consonant.</tt></p>
<p><tt>125: Cannot convert ACIP {X} because we would be required to assume that {A} is a consonant, when it is not clear if it is a consonant or a vowel.</tt></p>
<p><tt>126: Cannot convert ACIP {X} because it ends with a '+'.</tt></p>
<p><tt>127: Cannot convert ACIP {X} because it ends with a '-'.</tt></p>
<a name="128"><p><tt>128: Cannot convert ACIP {X} because A: is a "vowel" without an associated consonant.</tt></p></a>
<p><tt>129: Cannot convert ACIP {X} because + is not an ACIP consonant.</tt></p>
<p><tt>130: The tsheg bar ("syllable") {X} is essentially nothing.</tt></p>
<a name="131"><p><tt>131: The ACIP caret, {^}, must precede a tsheg bar.</tt></p></a>
<a name="132"><p><tt>132: The ACIP {X} must be glued to the end of a tsheg bar, but this one was not.</tt></p></a>
<p><tt>133: Cannot convert the ACIP {X} to Tibetan because it is unclear what the result should be. The correct output would likely require special mark-up.</tt></p>
<p><tt>134: The tsheg bar ("syllable") {X} has no legal parses.</tt></p>
<p><tt>135: The Unicode escape 'X' with ordinal (in decimal) Y is specified by the Extended Wylie Transliteration Scheme (EWTS), but is in the private-use area (PUA) of Unicode and will thus not be written out into the output lest you think other tools will be able to understand this non-standard construction.</tt></p>
<p><tt>136: The Unicode escape with ordinal (in decimal) Y does not match up with any TibetanMachineWeb glyph.</tt></p>
<p><tt>137: The ACIP {X} cannot be represented with the TibetanMachine or TibetanMachineWeb fonts because no such glyph exists in these fonts. The TibetanMachineWeb font has only a limited number of ready-made, precomposed glyphs, and {X} is not one of them.</tt></p>
<p><tt>138: The Unicode escape 'X' with ordinal (in decimal) Y is in the Tibetan range of Unicode (i.e., [U+0F00, U+0FFF]), but is a reserved code in that area.</tt></p>
<hr>
<p>
Just as with ERRORS, one may choose to have WARNINGS appear in
either short or long form.&nbsp; The long forms of warnings are as
follows:
</p>
<a name="501"><p><tt>501: Using X, but only because the tool's knowledge of prefix rules (see the documentation) says that XX is not a legal Tibetan tsheg bar ("syllable")</tt></p></a>
<p><tt>502: The last stack does not have a vowel in {X}; this may indicate a typo, because Sanskrit, which this probably is (because it's not legal Tibetan), should have a vowel after each stack.</tt></p>
<p><tt>503: Though {X} is unambiguous, it would be more computer-friendly if '+' signs were used to stack things because there are two (or more) ways to interpret this ACIP if you're not careful.</tt></p>
<a name="504"><p><tt>504: The ACIP {X} is treated by this converter as U+0F35, but sometimes might represent U+0F14 in practice. To avoid seeing this warning again, change the input to use {\u0F35} instead of {X}.</tt></p></a>
<p><tt>505: There is a useless disambiguator in {X}.</tt></p>
<p><tt>506: There is a stack of three or more consonants in {X} that uses at least one '+' but does not use a '+' between each consonant.</tt></p>
<p><tt>507: There is a chance that the ACIP {X} was intended to represent more consonants than we parsed it as representing -- GHNYA, e.g., means GH+NYA, but you can imagine seeing GH+N+YA and typing GHNYA for it too.</tt></p>
<a name="508"><p><tt>508: The ACIP {X} has been interpreted as two stacks, not one, but you may wish to confirm that the original text had two stacks as it would be an easy mistake to make to see one stack (because there is such a stack used in Sanskrit transliteration for this particular sequence) and forget to input it with '+' characters.</tt></p></a>
<a name="509"><p><tt>509: The ACIP {X} has an initial sequence that has been interpreted as two stacks, a prefix and a root stack, not one nonnative stack, but you may wish to confirm that the original text had two stacks as it would be an easy mistake to make to see one stack (because there is such a stack used in Sanskrit transliteration for this particular sequence) and forget to input it with '+' characters.</tt></p></a>
<p><tt>510: A non-breaking tsheg, 'X', appeared, but not like "...," or ".," or ".dA" or ".DA".</tt></p>
<p><tt>511: The ACIP {X} cannot be represented with the TibetanMachine or TibetanMachineWeb fonts because no such glyph exists in these fonts. The TibetanMachineWeb font has only a limited number of ready-made, precomposed glyphs, and {X} is not one of them.</tt></p>
<p><tt>512: There is a chance that the ACIP {X} was intended to represent more consonants than we parsed it as representing -- GHNYA, e.g., means GH+NYA, but you can imagine seeing GH+N+YA and typing GHNYA for it too. In fact, there are glyphs in the Tibetan Machine font for N+N+Y, N+G+H, G+N+Y, G+H+N+Y, T+N+Y, T+S+TH, T+S+N, T+S+N+Y, TS+NY, TS+N+Y, H+N+Y, M+N+Y, T+S+M, T+S+M+Y, T+S+Y, T+S+R, T+S+V, N+T+S, T+S, S+H, R+T+S, R+T+S+N, R+T+S+N+Y, and N+Y, indicating the importance of these easily mistyped stacks, so the possibility is very real.</tt></p>
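<p>
When short forms are in use, the message numbers can be pulled out
of the converter's output and looked up in the lists above.&nbsp;
The following is a rough sketch, not part of the converter: the
class and method names are hypothetical, and it assumes that short
warnings have the same shape as the short error <tt>[#ERROR 130:
{X}]</tt> shown earlier, only with <tt>WARNING</tt> in place of
<tt>ERROR</tt>.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortFormScanner {
    // Matches the "[#ERROR 130:" prefix; the WARNING variant is an assumption.
    private static final Pattern SHORT_FORM =
        Pattern.compile("\\[#(ERROR|WARNING) (\\d+):");

    /** Returns the message numbers found in the text, in order of appearance. */
    static List<Integer> messageNumbers(String converterOutput) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = SHORT_FORM.matcher(converterOutput);
        while (m.find()) {
            numbers.add(Integer.parseInt(m.group(2)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String sample = "KA [#ERROR 130: {X}] KHA [#WARNING 504: {%}]";
        System.out.println(messageNumbers(sample));
    }
}
```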
<hr>
<p>
The above messages are perhaps not verbose enough to help you figure
out what the converter thinks is wrong or questionable, so below is
further explanation of a few error and warning messages:
</p>
<ul>
<li>
<tt>[#ERROR CONVERTING ACIP DOCUMENT: The Unicode escape with
ordinal 3912 does not match up with any TibetanMachineWeb
glyph.]</tt> appears for the input {\u0F48} because there is no
character at the Unicode codepoint U+0F48 (decimal 3912).
</li>
<li>
<tt>[#ERROR The ACIP {G+N+NA} cannot be represented with the
TibetanMachine or TibetanMachineWeb fonts because no such glyph
exists in these fonts.]</tt> appears because the Tibetan Machine
Web font has only a limited number of ready-made, precomposed
glyphs, and {G+N+NA} is not one of them.&nbsp; You'll only see
this error in an ACIP-&gt;TMW conversion, not an ACIP-&gt;Unicode
conversion.
</li>
<li>
<tt>[#ERROR CONVERTING ACIP DOCUMENT: This converter cannot
convert the ACIP {x} to Tibetan because it is unclear what the
result should be.]</tt> appears because the appropriate output for
this likely requires special mark-up.
</li>
<li>
<tt>[#ERROR CONVERTING ACIP DOCUMENT: Lexical error: The ACIP {^}
must precede a tsheg bar.]</tt> appears for
Error <a href="#131">131</a> appears for
{^&nbsp;&nbsp;GONG&nbsp;SA}, for example, because only
{^GONG&nbsp;SA} and {^&nbsp;GONG&nbsp;SA} are supported in this
implementation.
</li>
<li>
<tt>[#ERROR CONVERTING ACIP DOCUMENT: The tsheg bar ("syllable") :
has these errors: Cannot convert ACIP A: because A: is a "vowel"
without an associated consonant]</tt> appears for the input {:}
because {:} cannot appear alone.&nbsp; (Sloppily, this message
exposes you to the internals of the converter, where {:} is
thought of as {A:} in some contexts.)
Error <a href="#128">128</a> appears for the input {:} because {:}
cannot appear alone.&nbsp; (Sloppily, this message exposes you to
the internals of the converter, where {:} is thought of as {A:} in
some contexts.)
</li>
<li>
<tt>[#ERROR CONVERTING ACIP DOCUMENT: Lexical error: The ACIP x
must be glued to the end of a tsheg bar, but this one was
not]</tt> appears because {%}, {o}, and {x} are really only to be
applied to whole <i>tsheg bar</i>s, and should not occur alone.
Error <a href="#132">132</a> appears because {%}, {o}, and {x} are
really only to be applied to whole <i>tsheg bar</i>s, and should
not occur alone.
</li>
<li>
<tt>[#WARNING CONVERTING ACIP DOCUMENT: The ACIP DGYA has been
interpreted as two stacks, not one, but you may wish to confirm
that the original text had two stacks as it would be an easy
mistake to make to see one stack and forget to input it with '+'
characters.]</tt> appears because it helps evince the impact of <a
href="#prefix">prefix rules</a>, a subtle point with regards to
ACIP because they are implied, but not discussed explicitly in
depth, by the ACIP standard.
Each of warnings <a href="#501">501</a>, <a href="#508">508</a>
and <a href="#509">509</a> appears because it helps evince the
impact of <a href="#prefix">prefix rules</a>, a subtle point with
regard to ACIP because they are implied, but not discussed
explicitly in depth, by the ACIP standard.
</li>
<li>
<tt>[#WARNING CONVERTING ACIP DOCUMENT: Warning: We're going with
{B+NA}, but only because our knowledge of prefix rules says that
{B}{NA} is not a legal Tibetan tsheg bar ("syllable")]</tt>
appears for the same reason as above.
</li>
<li>
<tt>[#WARNING CONVERTING ACIP DOCUMENT: Lexical warning: The ACIP
{%} is treated by this converter as U+0F35, but sometimes might
represent U+0F14 in practice. To avoid seeing this warning again,
change the input to use {\u0F35} instead of {%}.]</tt> appears
because some ACIP transliteration out there does use {%} to mean
U+0F14.
Warning <a href="#504">504</a> appears because some ACIP
transliteration out there does use {%} to mean U+0F14.
</li>
</ul>
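<p>
The first item above relates {\u0F48} to decimal 3912.&nbsp; The
sketch below reproduces that arithmetic and shows one way to tell a
private-use code point from a reserved code point in the Tibetan
range [U+0F00, U+0FFF], using only the Unicode data bundled with
Java.&nbsp; It is an illustration with hypothetical names, not the
converter's own logic, and its answers depend on the Unicode version
of the Java runtime.
</p>

```java
public class EscapeCheck {
    /** Roughly classifies a code point along the lines of the escape-related errors. */
    static String classify(int codePoint) {
        if (Character.getType(codePoint) == Character.PRIVATE_USE) {
            return "private-use";
        }
        boolean inTibetanRange = codePoint >= 0x0F00 && codePoint <= 0x0FFF;
        if (inTibetanRange && !Character.isDefined(codePoint)) {
            // No character is assigned here in the runtime's Unicode data.
            return "reserved-tibetan";
        }
        return "other";
    }

    public static void main(String[] args) {
        int codePoint = Integer.parseInt("0F48", 16); // hex 0F48 is decimal 3912
        System.out.println(codePoint + " " + classify(codePoint));
    }
}
```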
<p>
When warning or error messages refer to a 'Lexical error', that is
an error that occurs when <a href="#lex">breaking an input text up
into <i>tsheg bar</i>s</a>.&nbsp; To fully understand all warning
and error messages, a thorough understanding of <a href="#lex">that
process</a> and of the <a href="#parse">interpretation of ACIP
<i>tsheg bar</i>s</a> is required.
</p>
<a name="colors"></a><h2>Coloration</h2>
<p>
@ -1524,6 +1621,23 @@ Nativeness</h2>
</p>
<ul>
<li>
The Unicode U+0F43 is equivalent to the sequence U+0F42 followed
by U+0FB7.&nbsp; There are several distinct but similar
cases.&nbsp; The converter should have an option that allows for
producing one form or the other instead.&nbsp; (In practice, doing
Unicode normalization on the output is probably going to give you
results just as good, and having a separate normalizer facilitates
code reuse.)&nbsp; See issue <a
href="http://sourceforge.net/tracker/index.php?func=detail&aid=946063&group_id=61934&atid=502518">946043</a>.
</li>
<li>
The fact that stacks G+N+Y and M+N+Y exist in the TMW font means
that the ACIP snippets {GNY} and {MNY} should, in some cases,
trigger warning 512.&nbsp; They do not do so at present, and
warning 507 is not given either.&nbsp; See issue <a
href="http://sourceforge.net/tracker/index.php?func=detail&aid=946058&group_id=61934&atid=502518">946058</a>.
</li>
<li>
At present, an error in ACIP-&gt;TMW conversion is given when ACIP
like {RTSNY} or {NNY} is seen; this is because no glyph R+TS+NY or
@ -1531,8 +1645,8 @@ Nativeness</h2>
is a significant possibility that R+T+S+N+Y or N+N+Y was intended
because TMW does have glyphs for both.&nbsp; It would be best to
give an error or stern warning when creating the Unicode for RTSNY
etc., and to give an error that is more specific when creating TMW
for RTSNY etc.&nbsp; See issue <a
etc., and to allow for giving an error when creating TMW for RTSNY
etc. (whereas right now warning 512 is given).&nbsp; See issue <a
href="http://sourceforge.net/tracker/index.php?func=detail&aid=936998&group_id=61934&atid=502518">936998</a>.
</li>
<li>