Tibetan Machine Web Converter

In the same JAR file as Jskad, power users will find a command-line utility that converts a Tibetan Machine Web-encoded (TMW-encoded) Rich Text Format (RTF) file to any of these three output formats:

  • RTF files in Unicode
  • RTF files with the appropriate THDL Extended Wylie (Wylie) used instead of TMW
  • RTF files in Tibetan Machine (used in legacy systems)

In addition, this converter can convert Tibetan Machine RTF files to Tibetan Machine Web RTF files, and it takes precautions to ensure that only a 100% perfect conversion is done in either direction (TM->TMW and TMW->TM).  One such precaution is that two independent teams (Garrett and Garson, Chandler) turned the Tibetan Machine Web documentation into TM<->TMW tables.  These tables were compared, giving full confidence that the tables are as accurate as the documentation (which has a few flaws itself).  That documentation has not been extensively verified against the actual fonts, however.  Another precaution is that any unknown character causes the conversion to fail, and the result is a document containing only the unknown characters.  (There are some known, illegal glyphs created by Tibet Doc; the converter handles the ones it knows of and treats the rest as unknown.)

This converter is smart enough to solve the "curly-brace problem", wherein Tahoma '{', '}', and '\' characters appear instead of the TMW stacks they are supposed to represent.  This problem originates with certain versions of Microsoft Word's Rich Text Format writing capabilities.

Further, this converter gives a polite error message when a given .rtf file simply cannot be read by the version of Java used.

Perhaps most importantly, the converter has a --find-some-non-tmw mode of operation that gives you, the user, confidence that RTF reading and writing idiosyncrasies are not going to interfere with a flawless conversion.  It does so by printing, for each character that appears in a non-TMW font, the location of its first occurrence.  Here is some example output:

java -cp Jskad.jar \
     org.thdl.tib.input.TibetanConverter \
        --find-some-non-tmw \
        "Dalai Lama Fifth History 01.rtf"
non-TMW character newline in the font Tahoma appears first at location 39
non-TMW character ' ' in the font TimesNewRoman appears first at location 45
non-TMW character '}' in the font Tahoma appears first at location 66
non-TMW character '{' in the font Tahoma appears first at location 219
non-TMW character '\' in the font Tahoma appears first at location 1237
non-TMW character newline in the font Times New Roman appears first at location 9754

Given the above output, you can be sure that a flawless conversion (barring the appearance of known bugs) will result when you run java -cp Jskad.jar org.thdl.tib.input.TibetanConverter --to-wylie "Dalai Lama Fifth History 01.rtf" > "Dalai Lama Fifth History 01 in THDL Extended Wylie.rtf".  This is because the only text in the input file besides Tibetan is whitespace and the Tahoma characters '{', '}', and '\'. These Tahoma characters are understood by the tool; they are symptoms of the "curly-brace problem".
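The check described above can be automated.  What follows is a minimal sketch, assuming every report line has exactly the shape shown in the sample output; the class NonTmwReportCheck and its method are hypothetical and are not part of Jskad, and the 'Arial' line in main is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonTmwReportCheck {
    // Assumed line shape, copied from the sample --find-some-non-tmw output.
    private static final Pattern LINE = Pattern.compile(
        "^non-TMW character (newline|'(.)') in the font (.+) appears first at location (\\d+)$");

    /** Returns the report lines that are neither whitespace nor a Tahoma '{', '}', or '\'. */
    public static List<String> suspiciousLines(List<String> reportLines) {
        List<String> suspicious = new ArrayList<>();
        for (String line : reportLines) {
            if (line.trim().isEmpty()) {
                continue;                      // ignore blank lines
            }
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                suspicious.add(line);          // unrecognized shape: flag it rather than guess
                continue;
            }
            String ch = m.group(2);            // null when the token is "newline"
            String font = m.group(3);
            boolean whitespace = ch == null || Character.isWhitespace(ch.charAt(0));
            boolean curlyBrace = ch != null && font.equals("Tahoma") && "{}\\".contains(ch);
            if (!whitespace && !curlyBrace) {
                suspicious.add(line);
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        List<String> report = Arrays.asList(
            "non-TMW character newline in the font Tahoma appears first at location 39",
            "non-TMW character '}' in the font Tahoma appears first at location 66",
            // Hypothetical line, invented for this sketch; not real converter output.
            "non-TMW character 'a' in the font Arial appears first at location 70");
        for (String line : suspiciousLines(report)) {
            System.out.println("needs a closer look: " + line);
        }
    }
}
```

An empty result corresponds to the situation described above: only whitespace and the Tahoma curly-brace characters were found outside the TMW fonts.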

Failed Conversions

In this section, you'll learn how to tell whether a conversion has succeeded in full, run into minor problems, or failed altogether.

TMW to Wylie

Note that some TMW glyphs have no transliteration in Extended Wylie.  When the converter encounters such a glyph, you'll find a message like the following in your RTF output:

<<[[JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot convert DuffCode <duffcode font=TibetanMachineWeb8 charNum=101 character=e/> to THDL Extended Wylie.  Please see the documentation for the TMW font and transcribe this yourself.]]>>

Upon seeing this, you should consult the documentation for the specific TMW font named.  Find the glyph (by its charNum) and decide how to proceed.  If you find a glyph that you believe should have been converted into Extended Wylie by the tool, please report this as a bug.
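To locate every such message in a long document, the converted text can be scanned for the error marker.  This is a minimal sketch, assuming the message appears in the text exactly as printed above (an RTF writer may escape or split it, in which case the pattern would need adjusting); the class WylieErrorScan is hypothetical and is not part of Jskad.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WylieErrorScan {
    // Matches the marker and the start of the <duffcode .../> element as shown in the message.
    private static final Pattern ERROR = Pattern.compile(
        "JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot convert DuffCode "
        + "<duffcode font=(\\S+) charNum=(\\d+)");

    /** Returns one "font/charNum" entry per error message found in the text. */
    public static List<String> unconvertedGlyphs(String convertedText) {
        List<String> glyphs = new ArrayList<>();
        Matcher m = ERROR.matcher(convertedText);
        while (m.find()) {
            glyphs.add(m.group(1) + "/" + m.group(2));
        }
        return glyphs;
    }

    public static void main(String[] args) {
        String text = "<<[[JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot convert DuffCode "
            + "<duffcode font=TibetanMachineWeb8 charNum=101 character=e/> "
            + "to THDL Extended Wylie.]]>>";
        for (String glyph : unconvertedGlyphs(text)) {
            System.out.println("look up in the font documentation: " + glyph);
        }
    }
}
```

Each entry names the font and the charNum to look up in that font's documentation.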

Other Conversions

The other conversions are all-or-nothing.  That is, if you run into any trouble whatsoever, the result will be a file containing just the problematic glyphs.  If your result is as long as your input, then the conversion went flawlessly.
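The all-or-nothing behavior can be pictured with a toy table lookup.  This is a sketch under stated assumptions, not the converter's actual code: real TM and TMW glyphs are identified by font and character number, whereas here each glyph is a single char, and the two-entry table is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class AllOrNothingSketch {
    /**
     * Converts every glyph through the table.  If any glyph has no entry,
     * the result contains only the problematic glyphs and nothing else.
     */
    public static String convert(String input, Map<Character, Character> table) {
        StringBuilder converted = new StringBuilder();
        StringBuilder unknown = new StringBuilder();
        for (char glyph : input.toCharArray()) {
            Character mapped = table.get(glyph);
            if (mapped == null) {
                unknown.append(glyph);     // remember the glyph that blocks the conversion
            } else {
                converted.append(mapped);
            }
        }
        return unknown.length() == 0 ? converted.toString() : unknown.toString();
    }

    public static void main(String[] args) {
        // Toy table, invented for this sketch; it is not a real TM<->TMW table.
        Map<Character, Character> table = new HashMap<>();
        table.put('a', 'A');
        table.put('b', 'B');
        System.out.println(convert("abba", table));   // every glyph is known
        System.out.println(convert("abxay", table));  // 'x' and 'y' are unknown
    }
}
```

In this sketch a successful result always has the same length as the input, while a failed one is shorter, which mirrors the length check described above.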

There is one TMW glyph (TibetanMachineWeb7, glyph 91) that has no Tibetan Machine equivalent.  This glyph is the only TMW glyph that can cause a TMW->TM conversion to fail.

You might consider using Jskad to convert documents that give errors, as it has better error reporting and can tell you just what's wrong.

If you ever encounter problems in a TM->TMW conversion, please send us mail with the error report (and the document that resulted from the problematic input) so that we can improve our tools.

Invoking the Converter

First add Jskad.jar to your CLASSPATH.  Now run the command java org.thdl.tib.input.TibetanConverter from a command prompt.  You will see usage information appear.
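For scripted use, the converter can also be launched from a small Java program.  This is a minimal sketch, assuming java is on the PATH; it passes the JAR with -cp, as in the example output earlier on this page, and uses the --to-wylie option mentioned there.  The class ConverterRunner is hypothetical and is not part of Jskad, and the meaning of the converter's exit code is not documented here, so the sketch only reports it.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class ConverterRunner {
    /** Builds the command line shown on this page for a TMW-to-Wylie conversion. */
    public static List<String> wylieCommand(String jarPath, String inputRtf) {
        return Arrays.asList("java", "-cp", jarPath,
            "org.thdl.tib.input.TibetanConverter", "--to-wylie", inputRtf);
    }

    /** Runs the converter, writing its standard output to outputRtf; returns the exit code. */
    public static int run(String jarPath, String inputRtf, File outputRtf)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(wylieCommand(jarPath, inputRtf));
        pb.redirectOutput(outputRtf);                       // same effect as the shell's '>'
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);  // let error messages reach the console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: java ConverterRunner <Jskad.jar> <input.rtf> <output.rtf>");
            return;
        }
        int exitCode = run(args[0], args[1], new File(args[2]));
        System.out.println("converter exited with code " + exitCode);
    }
}
```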

Known Bugs

If the TMW input is not syntactically legal, the resulting Wylie, when imported into Jskad, will not necessarily yield the same Tibetan with which the converter started.  The glyphs corresponding to the Wylie 'jaskadaskeda' have this problem, for example.

