diff --git a/htdocs/ACIP_To_Tibetan_Converter.html b/htdocs/ACIP_To_Tibetan_Converter.html
index 1ebfd2f..28fc01f 100644
--- a/htdocs/ACIP_To_Tibetan_Converter.html
+++ b/htdocs/ACIP_To_Tibetan_Converter.html
@@ -269,6 +269,15 @@ [#WARNING.

+

+ Some warning or error messages refer to lexical errors, that is,
+ errors that occur when breaking an input text up
+ into tsheg bars.  Others are parsing errors, that
+ is, errors that occur during the interpretation of
+ ACIP tsheg bars.  It helps to understand both of these
+ processes.
+

+

There are four warning levels: 'None', 'Some', 'Most', and 'All'.  Choose 'None' if you don't want any warnings to appear
@@ -284,89 +293,177 @@

- The following are some (but not all) error and warning messages,
- accompanied by further explication:
+ It is possible to alter the severity of a warning at runtime.
+ It is not possible to make an error a warning, however, and it is
+ not possible to make a warning into an error (though that might be
+ useful [vote for RFE #954903
+ if you want it]).  To change the severity of a warning, set the
+ system property thdl.acip.to.tibetan.warning.severity.XXX,
+ where XXX is the warning number, e.g. 501, to your choice of
+ DISABLED, Some, Most, or
+ All.  Alternatively, alter options.txt, a
+ file found inside the top level of the JAR file, as the comments
+ indicate.  These instructions are for experts; please contact
+ the
+ developers if you need help.
+
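A minimal Java sketch of the system-property approach described above. Only the property prefix and the four values (DISABLED, Some, Most, All) come from the text. The class and method names are hypothetical, and the sketch assumes the property is set before the converter reads it, which this page does not confirm.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the converter itself.
class WarningSeverityDemo {
    // Property prefix as given in the documentation text.
    static final String PREFIX = "thdl.acip.to.tibetan.warning.severity.";

    // The four values the documentation lists as valid choices.
    static final List<String> ALLOWED = Arrays.asList("DISABLED", "Some", "Most", "All");

    // Builds the property name for a warning number such as 501.
    static String propertyName(int warningNumber) {
        return PREFIX + warningNumber;
    }

    // Sets the severity, rejecting values the documentation does not list.
    static void setSeverity(int warningNumber, String severity) {
        if (!ALLOWED.contains(severity)) {
            throw new IllegalArgumentException("Unknown severity: " + severity);
        }
        System.setProperty(propertyName(warningNumber), severity);
    }

    public static void main(String[] args) {
        setSeverity(501, "DISABLED");
        String name = propertyName(501);
        System.out.println(name + "=" + System.getProperty(name));
    }
}
```

The same property can presumably be supplied on the command line with the JVM's standard -D option, e.g. -Dthdl.acip.to.tibetan.warning.severity.501=DISABLED, without writing any code.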

+ +

+ One may choose to have ACIP->Tibetan ERRORS appear in long (i.e.,
+ verbose) form or in short (i.e., terse) form.  When short
+ forms appear, they are embedded in the output like [#ERROR 130:
+ {X}].  The long forms are as follows:
+
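Because short-form errors are embedded in the converted output, a post-processing step can locate them by pattern. The following Java sketch assumes every short form starts with the literal text "[#ERROR ", followed by a decimal number and a colon, as in the one example given above. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner for short-form error markers in converter output.
class ShortErrorScanner {
    // Assumed shape, based on the single example "[#ERROR 130: {X}]".
    static final Pattern SHORT_ERROR = Pattern.compile("\\[#ERROR (\\d+):");

    // Returns the error numbers in the order they occur in the output.
    static List<Integer> errorNumbers(String output) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = SHORT_ERROR.matcher(output);
        while (m.find()) {
            numbers.add(Integer.parseInt(m.group(1)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String sample = "text [#ERROR 130: {X}] more text [#ERROR 101: {Y}]";
        System.out.println(errorNumbers(sample));
    }
}
```

Each number found this way can be looked up in the list of long forms to see the full message.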

+ +

101: There's not even a unique, non-illegal parse for {X}

+ +

102: Found an open bracket, 'X', within a [#COMMENT]-style comment. Brackets may not appear in comments.

+ +

103: Found a truly unmatched close bracket, 'X'.

+ +

104: Found a closing bracket, 'X', without a matching open bracket. Perhaps a [#COMMENT] incorrectly written as [COMMENT], or a [*CORRECTION] written incorrectly as [CORRECTION], caused this.

+ +

105: Found a truly unmatched open bracket, '[' or '{', prior to this current illegal open bracket, 'X'.

+ +

106: Found an illegal open bracket (in context, this is 'X'). Perhaps there is a [#COMMENT] written incorrectly as [COMMENT], or a [*CORRECTION] written incorrectly as [CORRECTION], or an unmatched open bracket?

+ +

107: Found an illegal at sign, @ (in context, this is X). This folio marker has a period, '.', at the end of it, which is illegal.

+ +

108: Found an illegal at sign, @ (in context, this is X). This folio marker is not followed by whitespace, as is expected.

+ +

109: Found an illegal at sign, @ (in context, this is X). @012B is an example of a legal folio marker.

+ +

110: Found //, which could be legal (the Unicode would be \u0F3C\u0F3D), but is likely in an illegal construct like //NYA\\.

+ +

111: Found an illegal open parenthesis, '('. Nesting of parentheses is not allowed.

+ +

112: Unexpected closing parenthesis, ')', found.

+ +

113: The ACIP {?}, found alone, may intend U+0F08, but it may intend a question mark, i.e. '?', in the output. It may even mean that the original text could not be deciphered with certainty, like the ACIP {[?]} does.

+ +

114: Found an illegal, unprintable character.

+ +

115: Found a backslash, \, which the ACIP Tibetan Input Code standard says represents a Sanskrit virama. In practice, though, this is so often misused (to represent U+0F3D) that {\} always generates this error. If you want a Sanskrit virama, change the input document to use {\u0F84} instead of {\}. If you want U+0F3D, use {/NYA/} or {/NYA\u0F3D}.

+ +

116: Found an illegal character, 'X', with ordinal (in decimal) Y.

+ +

117: Unexpected end of input; truly unmatched open bracket found.

+ +

118: Unmatched open bracket found. A comment does not terminate.

+ +

119: Unmatched open bracket found. A correction does not terminate.

+ +

120: Slashes are supposed to occur in pairs, but the input had an unmatched '/' character.

+ +

121: Parentheses are supposed to occur in pairs, but the input had an unmatched parenthesis, '('.

+ +

122: Warning, empty tsheg bar found while converting from ACIP!

+ +

123: Cannot convert ACIP {X} because it contains a number but also a non-number.

+ +

124: Cannot convert ACIP {X} because {V}, wa-zur, appears without being subscribed to a consonant.

+ +

125: Cannot convert ACIP {X} because we would be required to assume that {A} is a consonant, when it is not clear if it is a consonant or a vowel.

+ +

126: Cannot convert ACIP {X} because it ends with a '+'.

+ +

127: Cannot convert ACIP {X} because it ends with a '-'.

+ +

128: Cannot convert ACIP {X} because A: is a "vowel" without an associated consonant.

+ +

129: Cannot convert ACIP {X} because + is not an ACIP consonant.

+ +

130: The tsheg bar ("syllable") {X} is essentially nothing.

+ +

131: The ACIP caret, {^}, must precede a tsheg bar.

+ +

132: The ACIP {X} must be glued to the end of a tsheg bar, but this one was not.

+ +

133: Cannot convert the ACIP {X} to Tibetan because it is unclear what the result should be. The correct output would likely require special mark-up.

+ +

134: The tsheg bar ("syllable") {X} has no legal parses.

+ +

135: The Unicode escape 'X' with ordinal (in decimal) Y is specified by the Extended Wylie Transliteration Scheme (EWTS), but is in the private-use area (PUA) of Unicode and will thus not be written out into the output lest you think other tools will be able to understand this non-standard construction.

+ +

136: The Unicode escape with ordinal (in decimal) Y does not match up with any TibetanMachineWeb glyph.

+ +

137: The ACIP {X} cannot be represented with the TibetanMachine or TibetanMachineWeb fonts because no such glyph exists in these fonts. The TibetanMachineWeb font has only a limited number of ready-made, precomposed glyphs, and {X} is not one of them.

+ +

138: The Unicode escape 'X' with ordinal (in decimal) Y is in the Tibetan range of Unicode (i.e., [U+0F00, U+0FFF]), but is a reserved code in that area.

+ +
+ + +

+ Just as with ERRORS, one may choose to have WARNINGS appear in
+ either short or long form.  The long forms of warnings are as
+ follows:
+

+ +

501: Using X, but only because the tool's knowledge of prefix rules (see the documentation) says that XX is not a legal Tibetan tsheg bar ("syllable")

+ +

502: The last stack does not have a vowel in {X}; this may indicate a typo, because Sanskrit, which this probably is (because it's not legal Tibetan), should have a vowel after each stack.

+ +

503: Though {X} is unambiguous, it would be more computer-friendly if '+' signs were used to stack things because there are two (or more) ways to interpret this ACIP if you're not careful.

+ +

504: The ACIP {X} is treated by this converter as U+0F35, but sometimes might represent U+0F14 in practice. To avoid seeing this warning again, change the input to use {\u0F35} instead of {X}.

+ +

505: There is a useless disambiguator in {X}.

+ +

506: There is a stack of three or more consonants in {X} that uses at least one '+' but does not use a '+' between each consonant.

+ +

507: There is a chance that the ACIP {X} was intended to represent more consonants than we parsed it as representing -- GHNYA, e.g., means GH+NYA, but you can imagine seeing GH+N+YA and typing GHNYA for it too.

+ +

508: The ACIP {X} has been interpreted as two stacks, not one, but you may wish to confirm that the original text had two stacks as it would be an easy mistake to make to see one stack (because there is such a stack used in Sanskrit transliteration for this particular sequence) and forget to input it with '+' characters.

+ +

509: The ACIP {X} has an initial sequence that has been interpreted as two stacks, a prefix and a root stack, not one nonnative stack, but you may wish to confirm that the original text had two stacks as it would be an easy mistake to make to see one stack (because there is such a stack used in Sanskrit transliteration for this particular sequence) and forget to input it with '+' characters.

+ +

510: A non-breaking tsheg, 'X', appeared, but not like "...," or ".," or ".dA" or ".DA".

+ +

511: The ACIP {X} cannot be represented with the TibetanMachine or TibetanMachineWeb fonts because no such glyph exists in these fonts. The TibetanMachineWeb font has only a limited number of ready-made, precomposed glyphs, and {X} is not one of them.

+ +

512: There is a chance that the ACIP {X} was intended to represent more consonants than we parsed it as representing -- GHNYA, e.g., means GH+NYA, but you can imagine seeing GH+N+YA and typing GHNYA for it too. In fact, there are glyphs in the Tibetan Machine font for N+N+Y, N+G+H, G+N+Y, G+H+N+Y, T+N+Y, T+S+TH, T+S+N, T+S+N+Y, TS+NY, TS+N+Y, H+N+Y, M+N+Y, T+S+M, T+S+M+Y, T+S+Y, T+S+R, T+S+V, N+T+S, T+S, S+H, R+T+S, R+T+S+N, R+T+S+N+Y, and N+Y, indicating the importance of these easily mistyped stacks, so the possibility is very real.

+ +
+ +

+ The above messages are perhaps not verbose enough to help you figure
+ out what the converter thinks is wrong or questionable, so below is
+ further explanation of a few error and warning messages:

-

- When warning or error messages refer to a 'Lexical error', that is
- an error that occurs when breaking an input text up
- into tsheg bars.  To fully understand all warning
- and error messages, a thorough understanding of that
- process and of the interpretation of ACIP
- tsheg bars is required.
-

- -

Coloration

@@ -1524,6 +1621,23 @@ Nativeness