Tibetan Machine Web Converter

In recent versions of Jskad, the 'Tools' menu has an option 'Launch Converter...'. That option opens a first-class Tibetan-to-Tibetan and Tibetan-to-Wylie converter with a user-friendly GUI that tells you when things go wrong (even things as subtle as your having selected the wrong conversion). If you need a command-line interface to that converter, however, read on.

In the same JAR file as Jskad, power users will find a command-line utility that converts Tibetan documents from one digital representation to another. The converter embodies the same technology as Jskad itself, but often works even when Jskad fails due to Java's presently poor support for viewing RTF documents. This command-line utility converts a Tibetan Machine Web-encoded (TMW-encoded) Rich Text Format (RTF) file to any of these three output formats:
In addition, this converter can convert Tibetan Machine RTF files to Tibetan Machine Web RTF files, and takes precautions to ensure that only a 100% perfect conversion is done in both directions (TM->TMW and TMW->TM).

One such precaution is that two independent teams (Garrett and Garson, Chandler) turned the Tibetan Machine Web documentation into TM<->TMW tables. These tables were compared, giving full confidence that the tables are as accurate as the documentation (which has a few flaws itself). That documentation has not been extensively verified against the actual fonts, however.

Another precaution is that any unknown characters cause the conversion to fail, and the result is a document containing merely the unknown characters. (There are some known, illegal glyphs created by Tibet Doc, and the converter handles the ones it knows of and treats the rest as unknown.)

This converter is smart enough to solve the "curly-brace problem", wherein Tahoma '{', '}', and '\' characters appear instead of the TMW stacks they are supposed to represent. This problem originates with certain versions of Microsoft Word's Rich Text Format writing capabilities. Further, this converter gives a polite error message when a given .rtf file simply cannot be read by the version of Java used.

Perhaps most importantly, the converter has a --find-some-non-tmw mode of operation that gives you, the user, confidence that RTF reading and writing idiosyncrasies are not going to interfere with a flawless conversion. It does so by printing out the first occurrence of a given character in a non-TMW font.
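As a rough illustration of how such a report could be checked automatically, the Python sketch below parses report lines of the form "Non-TMW character C [decimal N] in the font F appears first at location L" and tells you whether everything found is either whitespace or one of the Tahoma characters '{', '}', and '\'. The line pattern is inferred from sample output and may not match every version of the converter. The names parse_report and only_harmless are hypothetical, and treating every whitespace character as harmless is an assumption.

```python
import re

# Pattern inferred from sample report lines; the exact wording of the
# converter's output is an assumption and may differ between versions.
LINE_RE = re.compile(
    r"Non-TMW character (?P<char>.+?) \[decimal (?P<code>\d+)\] "
    r"in the font (?P<font>.+?) appears first at location (?P<loc>\d+)"
)

# Decimal codes of '\', '{', and '}', the curly-brace-problem characters.
CURLY_BRACE_CODES = {92, 123, 125}


def parse_report(text):
    """Return a (code, font, location) tuple for each report entry."""
    return [
        (int(m.group("code")), m.group("font"), int(m.group("loc")))
        for m in LINE_RE.finditer(text)
    ]


def only_harmless(entries):
    """True if every entry is whitespace or a Tahoma curly-brace character."""
    for code, font, _loc in entries:
        if chr(code).isspace():  # assumption: any whitespace is harmless
            continue
        if font == "Tahoma" and code in CURLY_BRACE_CODES:
            continue
        return False
    return True
```

You could save the report to a file, read it back as text, and call only_harmless(parse_report(text)). A False result means the input contains non-TMW text that deserves a closer look before you convert.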
Here is an example invocation and its output:

    java -cp "c:\my thdl tools\Jskad.jar" \
         org.thdl.tib.input.TibetanConverter \
         --find-some-non-tmw \
         "Dalai Lama Fifth History 01.rtf"

    Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39
    Non-TMW character ' ' [decimal 32] in the font TimesNewRoman appears first at location 45
    Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66
    Non-TMW character '{' [decimal 123] in the font Tahoma appears first at location 219
    Non-TMW character '\' [decimal 92] in the font Tahoma appears first at location 1237
    Non-TMW character newline [decimal 10] in the font Times New Roman appears first at location 9754

Given the above output, you can be sure that a flawless conversion (barring the appearance of known bugs) will result when you run

    java -cp "c:\my thdl tools\Jskad.jar" org.thdl.tib.input.TibetanConverter --to-wylie "Dalai Lama Fifth History 01.rtf" > "Dalai Lama Fifth History 01 in THDL Extended Wylie.rtf"

(Note that the '>' causes the output to be directed to the file named thereafter; this is quite handy.) This is because the only text in the input file besides Tibetan is whitespace and the Tahoma characters '{', '}', and '\'. These Tahoma characters are understood by the tool; they are symptoms of the "curly-brace problem".

Failed Conversions

In this section, you'll learn how to tell whether a conversion succeeded in full, ran into minor problems, or failed altogether.

TMW to Wylie
This section is too up-to-date -- this is documenting plans for the
future. At present, an error message like
Note that some TMW glyphs have no transliteration in Extended Wylie. When you encounter such a glyph, you'll find \tmwXYYY in your output, where X tells you which TMW font the troublesome glyph comes from and YYY is the decimal number of the glyph in that font (a number between 000 and 255 inclusive, usually between 33 and 126). The following are values corresponding to X:
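To locate such sequences in a converted file, a short script can help. The Python sketch below is only an illustration: it assumes that X is a single decimal digit (as in \tmw7091) and that you have already read the converter's output into a string. The name find_untransliterated is hypothetical and is not part of the THDL tools.

```python
import re

# "\tmw", then one digit X (font selector), then three digits YYY
# (decimal glyph number). A single-digit X is an assumption.
TMW_ESCAPE_RE = re.compile(r"\\tmw(\d)(\d{3})")


def find_untransliterated(text):
    """Return (offset, X, YYY) for every \\tmwXYYY sequence in text."""
    hits = []
    for m in TMW_ESCAPE_RE.finditer(text):
        glyph = int(m.group(2))
        if glyph <= 255:  # YYY lies between 000 and 255 inclusive
            hits.append((m.start(), int(m.group(1)), glyph))
    return hits
```

Each tuple gives the character offset of the sequence, the font selector X, and the glyph number YYY, which is what you need in order to look the glyph up in the font's documentation.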
Upon finding a \tmwXYYY sequence in your output, you should consult the documentation for the specific TMW font named. Find the glyph (by its YYY value) and decide how to proceed. If you find a glyph that you believe should have been converted into Extended Wylie by the tool, please report this as a bug through the SourceForge website or via e-mail.

Other Conversions

The other conversions are all-or-nothing. That is, if you run into any trouble whatsoever, the result will be a file containing just the problematic glyphs, each preceded by achen (i.e., U+0F68, the letter whose THDL Extended Wylie representation is 'a'). These glyphs will be bracketed on the left by U+0F3C (for which the THDL Extended Wylie is '(') and on the right by U+0F3D (for which the THDL Extended Wylie is ')'). If your result is as long as your input, then the conversion went flawlessly.

There is one TMW glyph (TibetanMachineWeb7, glyph 91 [\tmw7091]) that has no Tibetan Machine equivalent. This glyph is the only TMW glyph that can cause a TMW->TM conversion to fail. It is fairly common, though, especially if you've used Jskad to prepare your document. It might be appropriate to change the document to use TibetanMachineWeb7, glyph 90 [\tmw7090], a similar glyph that does have a TM equivalent.

You might consider using Jskad to convert documents that give errors, as it has better error reporting and can tell you just what's wrong. If you ever encounter problems in a TM->TMW conversion, please send us mail with the error report (along with the problem input document and the document that resulted from it) so that we can improve our tools.

Invoking the Converter
First add Jskad.jar to your CLASSPATH. You can do this by
setting an environment variable CLASSPATH to contain the absolute
path of the Jskad.jar file and then running the command java
org.thdl.tib.input.TibetanConverter. Alternatively, you
can use

Known Bugs

All known bugs are listed in this section. They're more likely to be fixed if users complain, so complain away. There are no known bugs at present.