Tibetan Machine Web Converter

In the same JAR file as Jskad, power users will find a command-line utility that converts a Tibetan Machine Web-encoded (TMW-encoded) Rich Text Format (RTF) file to either of these two output formats:

  • RTF files with the appropriate THDL Extended Wylie (Wylie) used instead of TMW
  • RTF files in Tibetan Machine (used in legacy systems)

This converter is smart enough to solve the "curly-brace problem", which originates with the Rich Text Format output written by certain versions of Microsoft Word.

Further, this converter gives a polite error message when a given .rtf file simply cannot be read by the version of Java used.

Perhaps most importantly, the converter has a --find-some-non-tmw mode of operation that gives you, the user, confidence that RTF reading and writing idiosyncrasies are not going to interfere with a flawless conversion.  For each character that appears in a non-TMW font, it prints the location of that character's first occurrence.  Here is an example invocation followed by its output:

java -cp Jskad.jar \
     org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE \
        --find-some-non-tmw \
        "Dalai Lama Fifth History 01.rtf"
non-TMW character newline in the font Tahoma appears first at location 39
non-TMW character ' ' in the font TimesNewRoman appears first at location 45
non-TMW character '}' in the font Tahoma appears first at location 66
non-TMW character '{' in the font Tahoma appears first at location 219
non-TMW character '\' in the font Tahoma appears first at location 1237
non-TMW character newline in the font Times New Roman appears first at location 9754

Given the above output, you can be sure that a flawless conversion (barring the appearance of known bugs) will result when you run the following command:

java -cp Jskad.jar \
     org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE \
        --to-wylie \
        "Dalai Lama Fifth History 01.rtf" \
        > "Dalai Lama Fifth History 01 in THDL Extended Wylie.rtf"

This is because the only text in the input file besides Tibetan is whitespace and the Tahoma characters '{', '}', and '\'.  These Tahoma characters are understood by the tool; they are symptoms of the "curly-brace problem".
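The check described above (only whitespace and the Tahoma characters '{', '}', and '\' outside TMW) can be automated for large batches of files.  The following is a minimal Python sketch, not part of the converter.  The line pattern is inferred solely from the sample output shown earlier, so it is an assumption about the report format, and the names parse_report and unexpected_findings are hypothetical helpers introduced here for illustration.

```python
import re

# Line format inferred from the sample output above (an assumption; the
# converter may print other forms that this pattern does not cover).
LINE_RE = re.compile(
    r"^non-TMW character (newline|'.') in the font (.+) "
    r"appears first at location (\d+)$"
)

# Tahoma characters that the text attributes to the "curly-brace problem".
CURLY_BRACE_CHARS = {"{", "}", "\\"}


def parse_report(report):
    """Split a report into (char, font, location) findings and unparsed lines."""
    findings, unparsed = [], []
    for line in report.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        match = LINE_RE.match(line.rstrip())
        if match is None:
            unparsed.append(line)  # keep for manual review, never drop silently
            continue
        token, font, location = match.groups()
        # "newline" is spelled out in the report; other characters are quoted.
        char = "\n" if token == "newline" else token[1]
        findings.append((char, font, int(location)))
    return findings, unparsed


def unexpected_findings(findings):
    """Return findings that are neither whitespace nor Tahoma brace characters."""
    return [
        (char, font, location)
        for char, font, location in findings
        if not char.isspace()
        and not (font == "Tahoma" and char in CURLY_BRACE_CHARS)
    ]
```

Pass parse_report the captured text of a --find-some-non-tmw run.  If both the unparsed list and the result of unexpected_findings are empty, the file matches the situation described above; anything else deserves a look before you convert.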

Note that some TMW glyphs have no transliteration in THDL Extended Wylie.  When the converter encounters such a glyph, you'll find a message like the following in your RTF output:

<<[[JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot convert DuffCode <duffcode font=TibetanMachineWeb8 charNum=101 character=e/> to THDL Extended Wylie.  Please see the documentation for the TMW font and transcribe this yourself.]]>>

Upon seeing this, you should consult the documentation for the specific TMW font named.  Find the glyph (by its charNum) and decide how to proceed.  If you find a glyph that you believe should have been converted into Extended Wylie by the tool, please report this as a bug.
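In a long document it can help to list every glyph that triggered this message before opening the font documentation.  The following Python sketch is one way to do that, under the assumption that each message appears in the output as contiguous text exactly as in the example above (RTF escaping or control words inside the message would defeat it).  The function name unconverted_glyphs is hypothetical.

```python
import re
from collections import Counter

# Matches the start of the error message shown above and captures the
# font name and charNum.  Assumes the message is not split by RTF markup.
MARKER_RE = re.compile(
    r"JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot convert DuffCode "
    r"<duffcode font=(\w+) charNum=(\d+)"
)


def unconverted_glyphs(rtf_text):
    """Count how often each (font, charNum) pair appears in error messages."""
    return Counter(
        (font, int(char_num)) for font, char_num in MARKER_RE.findall(rtf_text)
    )
```

Read the converted file into a string and pass it in; each key of the returned counter names a font and charNum to look up in that font's documentation.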

Note also that there is one TMW glyph (TibetanMachineWeb7, glyph 91) that has no Tibetan Machine equivalent. A 72-point copy of the alphabet and the Tibetan numbers will be inserted (in TMW) in place of this glyph.

Invoking the Converter

First add Jskad.jar to your CLASSPATH.  Then run the command java org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE from a command prompt, and usage information will appear.  Forgive the class name, which mentions only Wylie; this converter's scope widened after its creation.

Known Bugs

If the input TMW is not syntactically legal, the resulting Wylie will not necessarily yield the original Tibetan when imported back into Jskad.

The THDL Tools project is generously hosted by SourceForge.