<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<!-- @author David Chandler -->
<!-- @date-created May 18, 2003 -->
<!-- @editor Emacs, baby! -->

<head>
<title>Tibetan Machine Web Converter</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<script type="text/javascript" src="http://iris.lib.virginia.edu/tibet/scripts/thdl_scripts.js"></script>
<link rel="stylesheet" type="text/css" href="http://iris.lib.virginia.edu/tibet/style/thdl-styles.css"/>
</head>

<body>

<div id="banner">
<a id="logo" href="http://iris.lib.virginia.edu/tibet/index.html"><img id="test" alt="THDL Logo" src="http://iris.lib.virginia.edu/tibet/images/logo.png"/></a>
<h1>The Tibetan &amp; Himalayan Digital Library</h1>

<div id="menubar">
<script type='text/javascript'>function Go(){return}</script>
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/thdl_menu_config.js'></script>
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/menu_new.js'></script>
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/menu9_com.js'></script>
<noscript><p>Your browser does not support JavaScript.</p></noscript>
<div id='MenuPos' >Menu Loading... </div>
</div><!--END menubar-->

</div><!--END banner-->

<div id="sub_banner">
<div id="search">
<form method="get" action="http://www.google.com/u/thdl">
<p>
<input type="text" name="q" id="q" size="15" maxlength="255" value="" />
<input type="submit" name="sa" id="sa" value="Search"/>
<input type="hidden" name="hq" id="hq" value="inurl:iris.lib.virginia.edu"/>
</p>
</form>
</div>
<div id="breadcrumbs">
<a href="http://iris.lib.virginia.edu/tibet/index.html">Home</a> &gt; <a href="index.html">Tools</a> &gt; <a href="http://iris.lib.virginia.edu/tibet/tools/software.html">Software</a> &gt; Tibetan Machine Web Converter
</div>
</div><!--END sub_banner-->

<div id="main">

<h2>Tibetan Machine Web Converter</h2>

<p>
In recent versions of Jskad, the 'Tools' menu has an option 'Launch
Converter...'. That option opens a first-class Tibetan-to-Tibetan and
Tibetan-to-Wylie converter with a user-friendly graphical interface,
one that tells you when things go wrong (even something as subtle as
your having selected the wrong conversion). If you need a
command-line interface to that converter, however, read on.
</p>

<p>
In the same JAR file as Jskad, power users will find a command-line
utility that converts Tibetan documents from one digital
representation to another. The converter embodies the same
technology as Jskad itself, but often works even when Jskad fails
due to Java's presently poor support for viewing RTF
documents. This command-line utility converts a Tibetan
Machine Web-encoded (TMW-encoded) Rich Text Format (RTF) file to
any of these three output formats:
</p>

<ul>
<li>RTF files in Unicode</li>
<li>RTF files in THDL Extended Wylie (Wylie) instead of TMW</li>
<li>RTF files in Tibetan Machine (TM, used in legacy systems)</li>
</ul>

<p>
In addition, this converter can convert Tibetan Machine RTF files to
Tibetan Machine Web RTF files, and it takes precautions so that a
conversion in either direction (TM->TMW and TMW->TM) succeeds only
if it is 100% perfect. One such precaution is that two
independent teams (Garrett and Garson; Chandler) turned the Tibetan
Machine Web <a
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
documentation</a> into TM&lt;-&gt;TMW tables. These tables
were compared, giving full confidence that the tables are as
accurate as the documentation (which itself has a <a
href="http://sourceforge.net/tracker/index.php?func=detail&amp;aid=746871&amp;group_id=61934&amp;atid=502515">
few flaws</a>). That documentation has not been
extensively verified against the actual fonts, however.
Another precaution is that any unknown character causes the
conversion to fail, and the result is then a document containing
only the unknown characters. (Tibet Doc creates some known, illegal
glyphs; the converter handles the ones it knows of and treats the
rest as unknown.)
</p>

<p>
This converter is smart enough to solve the "curly-brace
problem", wherein Tahoma '{', '}', and '\' characters appear
instead of the TMW stacks they are supposed to represent. This
problem originates in the RTF output of certain versions of
Microsoft Word.
</p>

<p>
Further, this converter gives a polite error message when a given
.rtf file simply cannot be read by the version of Java used.
</p>

<p>
Perhaps most importantly, the converter has a
<tt>--find-some-non-tmw</tt> mode of operation that gives you, the
user, confidence that RTF reading and writing idiosyncrasies are not
going to interfere with a flawless conversion. It does so by
reporting the location of the first occurrence of each distinct
character in each non-TMW font. Here is some example output:
</p>

<pre>
java -cp "c:\my thdl tools\Jskad.jar" ^
     org.thdl.tib.input.TibetanConverter ^
     --find-some-non-tmw ^
     "Dalai Lama Fifth History 01.rtf"

Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39
Non-TMW character ' ' [decimal 32] in the font TimesNewRoman appears first at location 45
Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66
Non-TMW character '{' [decimal 123] in the font Tahoma appears first at location 219
Non-TMW character '\' [decimal 92] in the font Tahoma appears first at location 1237
Non-TMW character newline [decimal 10] in the font Times New Roman appears first at location 9754
</pre>

<p>
Given the above output, you can be sure that a flawless conversion
(barring the appearance of <a href="#knownbugs">known bugs</a>) will
result when you run <tt>java -cp "c:\my thdl tools\Jskad.jar"
org.thdl.tib.input.TibetanConverter --to-wylie "Dalai Lama Fifth
History 01.rtf" > "Dalai Lama Fifth History 01 in THDL Extended
Wylie.rtf"</tt>. This is because the only text in the input file
besides Tibetan is whitespace and the Tahoma characters
<tt>'{'</tt>, <tt>'}'</tt>, and <tt>'\'</tt>, which the tool
understands as symptoms of the "curly-brace problem". (Note that the
'>' sends the converter's output to the file named after it; this is
quite handy.)
</p>
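
<p>
The reasoning above can also be checked mechanically. What follows is
a minimal sketch, not part of Jskad: a hypothetical Java class,
<tt>NonTmwReportCheck</tt>, that reads a saved
<tt>--find-some-non-tmw</tt> report and tells you whether every listed
character is either whitespace or one of the Tahoma '{', '}', and '\'
characters of the "curly-brace problem". It assumes the report lines
are worded exactly like the sample output above; other versions of the
tool may word them differently.
</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jskad. Assumes report lines are formatted
// exactly like the sample --find-some-non-tmw output shown on this page.
class NonTmwReportCheck {
    // e.g. "Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66"
    private static final Pattern LINE = Pattern.compile(
            "^Non-TMW character .* \\[decimal (\\d+)\\] in the font (.+) appears first at location (\\d+)$");

    /** True if one report line names whitespace, or a curly-brace-problem character in Tahoma. */
    static boolean isBenign(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return false; // unrecognized line: treat it as a problem, to be safe
        }
        int code = Integer.parseInt(m.group(1));
        String font = m.group(2);
        if (Character.isWhitespace(code)) {
            return true; // whitespace in any font, e.g. newline or space
        }
        // Symptoms of the curly-brace problem: '{', '}' and '\' in Tahoma only
        return font.equals("Tahoma") && (code == '{' || code == '}' || code == '\\');
    }

    /** True if every non-blank line of the report is benign. */
    static boolean allBenign(List<String> reportLines) {
        for (String line : reportLines) {
            if (!line.trim().isEmpty() && !isBenign(line)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Read the saved report from standard input, one line at a time
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        System.out.println(allBenign(lines)
                ? "Only whitespace and curly-brace-problem characters were reported."
                : "Other non-TMW text was reported; inspect it before converting.");
    }
}
```

<p>
For example, after compiling the sketch, <tt>java NonTmwReportCheck
&lt; report.txt</tt> would print its verdict on a saved report. Lines
it does not recognize count as problems, which errs on the side of
caution.
</p>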

<h3>Failed Conversions</h3>

<p>
In this section, you'll learn how to tell whether a conversion has
succeeded in full, run into minor problems, or failed altogether.
</p>

<h4>TMW to Wylie</h4>

<p>
<font color="red">
This section documents planned behavior that is not yet
implemented. At present, an error message like
<code>&lt;&lt;[[JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot
convert DuffCode &lt;duffcode font=TibetanMachineWeb7 charNum=72
character=H/&gt; to THDL Extended Wylie. Please see the
documentation for the TMW font and transcribe this
yourself.]]&gt;&gt;</code> appears.
</font>
</p>

<p>
Note that some TMW glyphs have no transliteration in Extended
Wylie. When you encounter such a glyph, you'll find
<tt>\tmwXYYY</tt> in your output, where X tells you which TMW font
the troublesome glyph comes from and YYY is the decimal number of
the glyph in that font (a number between 000 and 255 inclusive,
usually between 33 and 126). The values of X correspond to the TMW
fonts as follows:
</p>

<ul>
<li>When X is 0, the TibetanMachineWeb font contains the glyph.</li>
<li>When X is 1, the TibetanMachineWeb1 font contains the glyph.</li>
<li>When X is 2, the TibetanMachineWeb2 font contains the glyph.</li>
<li>When X is 3, the TibetanMachineWeb3 font contains the glyph.</li>
<li>When X is 4, the TibetanMachineWeb4 font contains the glyph.</li>
<li>When X is 5, the TibetanMachineWeb5 font contains the glyph.</li>
<li>When X is 6, the TibetanMachineWeb6 font contains the glyph.</li>
<li>When X is 7, the TibetanMachineWeb7 font contains the glyph.</li>
<li>When X is 8, the TibetanMachineWeb8 font contains the glyph.</li>
<li>When X is 9, the TibetanMachineWeb9 font contains the glyph.</li>
</ul>

<p>
Upon finding a <tt>\tmwXYYY</tt> sequence in your output, you should
consult the <a
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
documentation</a> for the specific TMW font named. Find the
glyph (by its YYY value) and decide how to proceed. If you
find a glyph that you believe should have been converted into
Extended Wylie by the tool, please report this as a bug through the
SourceForge website or via e-mail.
</p>
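
<p>
In a long document, collecting every <tt>\tmwXYYY</tt> sequence by
hand is tedious. Here is a minimal, hypothetical Java sketch (the class
<tt>TmwEscapeScanner</tt> is invented for illustration and is not part
of Jskad) that finds each such sequence in a piece of text and names
the TMW font and decimal glyph number to look up, using the X and YYY
rules given above. It assumes the sequences appear literally in the
text you pass it.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jskad: lists every \tmwXYYY sequence in some
// text together with the TMW font (from X) and decimal glyph number (from YYY).
class TmwEscapeScanner {
    // A backslash, "tmw", one font digit X, then three glyph digits YYY
    private static final Pattern ESCAPE = Pattern.compile("\\\\tmw(\\d)(\\d{3})");

    /** X = 0 means TibetanMachineWeb; X = 1..9 means TibetanMachineWeb1..9. */
    static String fontName(int x) {
        return x == 0 ? "TibetanMachineWeb" : "TibetanMachineWeb" + x;
    }

    /** Describes each \tmwXYYY occurrence, in order of appearance. */
    static List<String> describe(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ESCAPE.matcher(text);
        while (m.find()) {
            int font = Integer.parseInt(m.group(1));
            int glyph = Integer.parseInt(m.group(2)); // decimal glyph number, 000-255
            found.add(m.group() + " -> " + fontName(font) + ", glyph " + glyph);
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "bkra shis \\tmw7091 bde legs"; // text containing one escape
        for (String entry : describe(sample)) {
            System.out.println(entry); // prints: \tmw7091 -> TibetanMachineWeb7, glyph 91
        }
    }
}
```

<p>
Each line it prints names the font whose documentation to consult and
the glyph number to look for there.
</p>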

<h4>Other Conversions</h4>

<p>
The other conversions are all-or-nothing. That is, if the converter
runs into any trouble whatsoever, the result will be a file containing
just the problematic glyphs, each preceded by achen (i.e., U+0F68,
the letter whose THDL Extended Wylie representation is 'a').
These glyphs will be bracketed on the left by U+0F3C (for which the
THDL Extended Wylie is '(') and on the right by U+0F3D (for which
the THDL Extended Wylie is ')'). If your result is as long as
your input, then the conversion went flawlessly.
</p>

<p>
There is one TMW glyph (TibetanMachineWeb7, glyph 91 [\tmw7091])
that has no Tibetan Machine equivalent. This glyph is the only
TMW glyph that can cause a TMW->TM conversion to fail. It
is fairly common, though, especially if you've used Jskad to prepare
your document. It might be appropriate to change the document
to use TibetanMachineWeb7, glyph 90 [\tmw7090], a similar glyph that
does have a TM equivalent.
</p>

<p>
You might consider using Jskad to convert documents that give
errors, as it has better error reporting and can tell you just
what's wrong.
</p>

<p>
If you ever encounter problems in a TM->TMW conversion, please
send us e-mail with the error report (along with the problematic
input document and the resulting output document) so that we can
improve our tools.
</p>

<h3>Invoking the Converter</h3>

<p>
First, tell Java where to find Jskad.jar. One way is to set the
environment variable CLASSPATH to contain the absolute path of the
Jskad.jar file and then run the command <tt>java
org.thdl.tib.input.TibetanConverter</tt>. Alternatively, you
can use <code>java -cp "c:\my tibetan documents\Jskad.jar"
org.thdl.tib.input.TibetanConverter</code>, substituting the
appropriate path to Jskad.jar. If you do this correctly, you will see
usage information. If you've not correctly told Java where to find
Jskad.jar, you'll instead see a message like <code>Exception in
thread "main" java.lang.NoClassDefFoundError:
org/thdl/tib/input/TibetanConverter</code> (newer versions of Java
print <code>Error: Could not find or load main class
org.thdl.tib.input.TibetanConverter</code> instead).
</p>
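
<p>
If you want to drive the converter from your own Java program rather
than from a command prompt, a minimal sketch follows. It is
hypothetical (the class <tt>ConverterLauncher</tt> and its methods are
invented for illustration): it simply starts a separate <tt>java</tt>
process with the same <tt>-cp</tt> argument, class name, and options
shown on this page, and sends the converter's standard output to a
file, as the '>' above does. It assumes <tt>java</tt> is on your PATH,
and it does not interpret the converter's exit status.
</p>

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper, not part of Jskad: runs TibetanConverter in a child
// process, just as the command lines on this page do.
class ConverterLauncher {
    /** Builds: java -cp JAR org.thdl.tib.input.TibetanConverter MODE INPUT */
    static List<String> buildCommand(String jarPath, String mode, String inputRtf) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(jarPath); // each argument is passed separately, so spaces need no quoting
        cmd.add("org.thdl.tib.input.TibetanConverter");
        cmd.add(mode);    // an option from this page, e.g. "--to-wylie"
        cmd.add(inputRtf);
        return cmd;
    }

    /** Runs the converter, writing its standard output to outputFile; returns the exit code. */
    static int run(String jarPath, String mode, String inputRtf, File outputFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(jarPath, mode, inputRtf));
        pb.redirectOutput(outputFile);                     // like "> outputFile" at a prompt
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // anything on stderr shows on the console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exitCode = run("c:\\my thdl tools\\Jskad.jar", "--to-wylie",
                "Dalai Lama Fifth History 01.rtf",
                new File("Dalai Lama Fifth History 01 in THDL Extended Wylie.rtf"));
        System.out.println("Converter exited with code " + exitCode);
    }
}
```

<p>
Passing the arguments as a list rather than as one command string
avoids the quoting trouble that paths such as <tt>c:\my thdl
tools\Jskad.jar</tt> otherwise cause.
</p>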

<h3><a name="knownbugs"></a>Known Bugs</h3>

<p>
All known bugs are listed in this section. They're more likely
to be fixed if users complain, so complain away.
</p>

<p>
There are no known bugs at present.
</p>

<p>
Please
<a href="mailto:thdltools-devel@lists.sourceforge.net">e-mail us</a>
your comments about this page.
</p>

<p>
The
<a href="http://www.sourceforge.net/projects/thdltools">THDL Tools</a>
project is generously hosted by:
<!--

DO NOT DELETE THE SF.NET LOGO.

We have a choice of colors and sizes for this logo (see
"https://sourceforge.net/docman/display_doc.php?docid=790&group_id=1"),
but we do not have the option of removing it. SourceForge requests
that we put it on each web page for our project, and to give us
incentive to do so, they will not track the number of hits for our
project web pages unless we put this link in. To track hits, see
"http://sourceforge.net/project/stats/index.php?report=months&group_id=61934".

-->
<a href="http://sourceforge.net/">
<img src="http://sourceforge.net/sflogo.php?group_id=61934&amp;type=1"
width="88" height="31" alt="SourceForge Logo" />
</a>
<!-- AGAIN, DO NOT DELETE THE SF.NET LOGO. -->
</p>

</div>

</body>
</html>