Tibetan Machine Web Converter

In the same JAR file as Jskad, power users will find a command-line utility that converts Tibetan documents from one digital representation to another.  The converter embodies the same technology as Jskad itself, but often works even when Jskad fails due to Java's presently poor support for viewing RTF documents.  This command-line utility converts a Tibetan Machine Web-encoded (TMW-encoded) Rich Text Format (RTF) file to any of these three output formats:

  • RTF files in Unicode
  • RTF files with the appropriate THDL Extended Wylie (Wylie) used instead of TMW
  • RTF files in Tibetan Machine (used in legacy systems)

In addition, this converter can convert Tibetan Machine RTF files to Tibetan Machine Web RTF files, and takes precautions to ensure that only a 100% perfect conversion is done in both directions (TM->TMW and TMW->TM).  One such precaution is that two independent teams (Garrett and Garson, Chandler) turned the Tibetan Machine Web documentation into TM<->TMW tables.  These tables were compared, giving full confidence that the tables are as accurate as the documentation (which has a few flaws itself).  That documentation has not been extensively verified against the actual fonts, however.  Another precaution is that any unknown characters cause the conversion to fail, and the result is a document containing merely the unknown characters.  (There are some known, illegal glyphs created by Tibet Doc, and the converter handles the ones it knows of and treats the rest as unknown.)

This converter is smart enough to solve the "curly-brace problem", wherein Tahoma '{', '}', and '\' characters appear instead of the TMW stacks they are supposed to represent.  This problem originates with certain versions of Microsoft Word's Rich Text Format writing capabilities.

Further, this converter gives a polite error message when a given .rtf file simply cannot be read by the version of Java used.

Perhaps most importantly, the converter has a --find-some-non-tmw mode of operation that gives you, the user, confidence that RTF reading and writing idiosyncrasies are not going to interfere with a flawless conversion.  It does so by printing out the first occurrence of a given character in a non-TMW font.  Here is some example output:

java -cp "c:\my thdl tools\Jskad.jar" \
     org.thdl.tib.input.TibetanConverter \
        --find-some-non-tmw \
        "Dalai Lama Fifth History 01.rtf"

Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39
Non-TMW character ' ' [decimal 32] in the font TimesNewRoman appears first at location 45
Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66
Non-TMW character '{' [decimal 123] in the font Tahoma appears first at location 219
Non-TMW character '\' [decimal 92] in the font Tahoma appears first at location 1237
Non-TMW character newline [decimal 10] in the font Times New Roman appears first at location 9754

Given the above output, you can be sure that a flawless conversion (barring the appearance of known bugs) will result.  This is because the only text in the input file besides Tibetan is whitespace and the Tahoma characters '{', '}', and '\'.  These Tahoma characters are understood by the tool; they are symptoms of the "curly-brace problem".  To perform the conversion, run:

java -cp "c:\my thdl tools\Jskad.jar" \
     org.thdl.tib.input.TibetanConverter \
        --to-wylie \
        "Dalai Lama Fifth History 01.rtf" \
        > "Dalai Lama Fifth History 01 in THDL Extended Wylie.rtf"

(Note that the '>' causes the output to be directed to the file named thereafter; this is quite handy.)
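If you have many files to check, the reasoning above can be automated.  The following is only a minimal sketch, not part of Jskad: it assumes the report lines have exactly the format shown in the sample output above, and it assumes that tab and carriage return count as harmless whitespace alongside newline and space.  The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonTmwReport {
    // Assumed line format, copied from the sample --find-some-non-tmw output.
    private static final Pattern LINE = Pattern.compile(
        "Non-TMW character (.+) \\[decimal (\\d+)\\] in the font (.+)"
        + " appears first at location (\\d+)");

    /** Returns the report lines that are neither whitespace nor Tahoma '{', '}', '\'. */
    static List<String> unexpectedLines(List<String> reportLines) {
        List<String> unexpected = new ArrayList<>();
        for (String line : reportLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            Matcher m = LINE.matcher(trimmed);
            if (!m.matches()) {
                unexpected.add(line); // unrecognized line: flag it for a human
                continue;
            }
            int code = Integer.parseInt(m.group(2));
            String font = m.group(3);
            // Assumption: tab (9) and carriage return (13) are as harmless as 10 and 32.
            boolean whitespace = code == 9 || code == 10 || code == 13 || code == 32;
            // The "curly-brace problem" characters, which the converter understands.
            boolean curlyBrace = font.equals("Tahoma")
                && (code == 123 || code == 125 || code == 92);
            if (!whitespace && !curlyBrace) {
                unexpected.add(line);
            }
        }
        return unexpected;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39",
            "Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66",
            "Non-TMW character '\\' [decimal 92] in the font Tahoma appears first at location 1237");
        // An empty list means only whitespace and curly-brace characters were reported.
        System.out.println(unexpectedLines(sample));
    }
}
```

An empty result corresponds to the situation described above; any line that is returned deserves a look before you trust the conversion.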

Failed Conversions

In this section, you'll learn how to tell if a conversion has succeeded in full, run into minor problems, or failed altogether.

TMW to Wylie

Note that some TMW glyphs have no transliteration in Extended Wylie.  When you encounter such a glyph, you'll find \tmwXYYY in your output, where X tells you which TMW font the troublesome glyph comes from and YYY is the decimal number of the glyph in that font (which is a number between 000 and 255 inclusive, usually between 33 and 126).  The following are values corresponding to X:

  • When X is 0, the TibetanMachineWeb font contains the glyph.
  • When X is 1, the TibetanMachineWeb1 font contains the glyph.
  • When X is 2, the TibetanMachineWeb2 font contains the glyph.
  • When X is 3, the TibetanMachineWeb3 font contains the glyph.
  • When X is 4, the TibetanMachineWeb4 font contains the glyph.
  • When X is 5, the TibetanMachineWeb5 font contains the glyph.
  • When X is 6, the TibetanMachineWeb6 font contains the glyph.
  • When X is 7, the TibetanMachineWeb7 font contains the glyph.
  • When X is 8, the TibetanMachineWeb8 font contains the glyph.
  • When X is 9, the TibetanMachineWeb9 font contains the glyph.

Upon finding a \tmwXYYY sequence in your output, you should consult the documentation for the specific TMW font named.  Find the glyph (by its YYY value) and decide how to proceed.  If you find a glyph that you believe should have been converted into Extended Wylie by the tool, please report this as a bug through the SourceForge website or via e-mail.
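To locate such sequences without reading the whole output by eye, something like the following sketch may help.  It is not part of the converter; the class and method names are hypothetical.  It assumes you are scanning the plain text of the output (for example, text copied out of a word processor), since the raw RTF source may escape the backslash differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TmwEscapes {
    // A backslash, "tmw", one font digit X, then three glyph digits YYY.
    private static final Pattern ESCAPE = Pattern.compile("\\\\tmw(\\d)(\\d{3})");

    /** Maps X to the font name, following the list above. */
    static String fontName(int x) {
        return x == 0 ? "TibetanMachineWeb" : "TibetanMachineWeb" + x;
    }

    /** Describes every \tmwXYYY sequence found in the given text, in order. */
    static List<String> describe(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ESCAPE.matcher(text);
        while (m.find()) {
            int font = Integer.parseInt(m.group(1));
            int glyph = Integer.parseInt(m.group(2)); // "091" parses as 91
            found.add(fontName(font) + ", glyph " + glyph);
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints one description per untransliterated glyph.
        for (String s : describe("bkra shis \\tmw7091 bde legs \\tmw0033")) {
            System.out.println(s);
        }
    }
}
```

Each description names the font whose documentation you should consult and the decimal glyph number to look up there.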

Other Conversions

The other conversions are all-or-nothing.  That is, if you run into any trouble whatsoever, the result will be a file containing just the problematic glyphs, each preceded by achen (i.e., U+0F68, the letter whose THDL Extended Wylie representation is 'a').  These glyphs will be bracketed on the left by U+0F3C (for which the THDL Extended Wylie is '(') and on the right by U+0F3D (for which the THDL Extended Wylie is ')').  If your result is as long as your input, then the conversion went flawlessly.

There is one TMW glyph (TibetanMachineWeb7, glyph 91 [\tmw7091]) that has no Tibetan Machine equivalent.  This glyph is the only TMW glyph that can cause a TMW->TM conversion to fail.  It is fairly common, though, especially if you've used Jskad to prepare your document.  It might be appropriate to change the document to use TibetanMachineWeb7, glyph 90 [\tmw7090], a similar glyph that does have a TM equivalent.

You might consider using Jskad to convert documents that give errors, as it has better error reporting and can tell you just what's wrong.

If you ever encounter problems in a TM->TMW conversion, please send us mail with the error report (and the document that resulted from the problem input document) so that we can improve our tools.

Invoking the Converter

First add Jskad.jar to your CLASSPATH.  You can do this by setting an environment variable CLASSPATH to contain the absolute path of the Jskad.jar file and then running this command:

java org.thdl.tib.input.TibetanConverter

Alternatively, you can use the following, where you put in the appropriate path to Jskad.jar:

java -cp "c:\my tibetan documents\Jskad.jar" org.thdl.tib.input.TibetanConverter

You will see usage information appear if you do this correctly.  If you've not correctly told Java where to find Jskad.jar, you'll see a message like java.lang.NoClassDefFoundError: org/thdl/tib/input/TibetanConverter; Exception in thread "main".
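If you are unsure whether your CLASSPATH is set up correctly, a small check like the one below can tell you before you attempt a conversion.  This is a hedged sketch rather than anything shipped with Jskad; the class name ClasspathCheck is hypothetical.  It uses only the standard Class.forName call and looks for the converter class named above.

```java
class ClasspathCheck {
    /** Returns true if the named class can be located on the current classpath. */
    static boolean isLoadable(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String converter = "org.thdl.tib.input.TibetanConverter";
        if (isLoadable(converter)) {
            System.out.println(converter + " found");
        } else {
            System.out.println(converter + " not found; check CLASSPATH or the -cp option");
        }
    }
}
```

Run it with the same CLASSPATH or -cp setting you intend to use for the converter, adding the directory that holds the compiled ClasspathCheck class.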

Known Bugs

All known bugs are listed in this section.  They're more likely to be fixed if users complain, so complain away.

First, if the TMW given is not syntactically legal, then the Wylie that results will not necessarily yield, if imported into Jskad, the same Tibetan with which the converter started.  The glyphs corresponding to the Wylie 'jaskadaskeda' have this problem, for example.

Please e-mail us your comments about this page.

The THDL Tools project is generously hosted by SourceForge.